Applying a Hybrid Targeted Estimation of Distribution Algorithm to Feature Selection Problems

Geoffrey Neumann, David Cairns

2013

Abstract

This paper presents the results of applying the hybrid Targeted Estimation of Distribution Algorithm (TEDA) to feature selection problems with 500 to 20,000 features. TEDA uses parent fitness and features to set a target for the number of features required for classification, and it can quickly drive down the size of the selected feature set even when the initial feature set is relatively large. TEDA is a hybrid algorithm that transitions from the selection and crossover approaches of a Genetic Algorithm (GA) to those of an Estimation of Distribution Algorithm (EDA), depending on the reliability of the estimated probability distribution. Targeting the number of features in this way has two key benefits. Firstly, it enables TEDA to efficiently find good solutions in cases with very low signal-to-noise ratios, where the majority of available features are not associated with the given classification task. Secondly, because TEDA tends to select the smallest and most promising initial feature sets, it builds compact classifiers, so populations can be evaluated more quickly than with other approaches.
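The targeting idea described above can be illustrated with a small sketch. This is not the authors' TEDA implementation: the fitness-weighted mean used for the target size, the frequency-based (UMDA-style) marginal probabilities, and the function names `target_size`, `marginal_probabilities` and `sample_child` are hypothetical choices made here for illustration, and the GA-to-EDA transition is not modelled at all. Feature subsets are represented as Python sets of feature indices, each paired with a positive fitness value.

```python
import random


def target_size(parents):
    """Hypothetical target rule: fitness-weighted mean of the parents' subset sizes.

    `parents` is a list of (feature_set, fitness) pairs with fitness > 0.
    """
    total = sum(fit for _, fit in parents)
    weighted = sum(fit * len(feats) for feats, fit in parents) / total
    # Always ask for at least one feature.
    return max(1, round(weighted))


def marginal_probabilities(parents, n_features):
    """Fraction of parents that include each feature (an assumption, not TEDA's exact model)."""
    counts = [0] * n_features
    for feats, _ in parents:
        for f in feats:
            counts[f] += 1
    return [c / len(parents) for c in counts]


def sample_child(parents, n_features, rng):
    """Draw exactly the target number of distinct features, weighted by marginal probability."""
    k = target_size(parents)
    probs = marginal_probabilities(parents, n_features)
    # Only features seen in at least one parent can be chosen.
    candidates = [f for f in range(n_features) if probs[f] > 0]
    child = set()
    while len(child) < k and candidates:
        weights = [probs[f] for f in candidates]
        f = rng.choices(candidates, weights=weights, k=1)[0]
        child.add(f)
        candidates.remove(f)  # sample without replacement
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parents: (selected feature indices, fitness).
    parents = [({0, 3, 7}, 0.9), ({0, 3}, 0.8), ({0, 2, 3, 5, 9}, 0.4)]
    print("target size:", target_size(parents))
    print("child:", sorted(sample_child(parents, n_features=10, rng=rng)))
```

In this sketch the child always contains exactly the target number of features, so when the fitter parents use fewer features the next generation is pulled toward smaller subsets; the real algorithm's target and transition rules may differ.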



Paper Citation


in Harvard Style

Neumann G. and Cairns D. (2013). Applying a Hybrid Targeted Estimation of Distribution Algorithm to Feature Selection Problems. In Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2013) ISBN 978-989-8565-77-8, pages 136-143. DOI: 10.5220/0004553301360143


in Bibtex Style

@conference{ecta13,
author={Geoffrey Neumann and David Cairns},
title={Applying a Hybrid Targeted Estimation of Distribution Algorithm to Feature Selection Problems},
booktitle={Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2013)},
year={2013},
pages={136-143},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004553301360143},
isbn={978-989-8565-77-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2013)
TI  - Applying a Hybrid Targeted Estimation of Distribution Algorithm to Feature Selection Problems
SN  - 978-989-8565-77-8
AU  - Neumann G.
AU  - Cairns D.
PY  - 2013
SP  - 136
EP  - 143
DO  - 10.5220/0004553301360143