Feature Selection by Rank Aggregation and Genetic Algorithms

Waad Bouaguel, Afef Ben Brahim, Mohamed Limam

2013

Abstract

Feature selection consists of selecting relevant features in order to focus the learning search. A simple and efficient approach to feature selection is to rank the features by their relevance. When several rankers are applied to the same data set, their outputs often differ. Combining the preference lists from those individual rankers into a single, better ranking is known as rank aggregation. In this study, we develop a method that combines a set of ordered feature lists using an optimization function and a genetic algorithm. We compare the performance of the proposed approach to that of well-known methods. Experiments show that our algorithm improves prediction accuracy compared to single feature selection algorithms and traditional rank aggregation techniques.
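
To make the idea of rank aggregation by genetic search concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state which optimization function the paper uses, so this sketch assumes the objective is the total Spearman footrule distance between a candidate ranking and the input rankings. The operators (order crossover, swap mutation, elitist selection), the parameter values, and the feature names f1 to f5 are likewise illustrative assumptions.

```python
import random


def footrule(a, b):
    """Spearman footrule distance between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


def cost(candidate, rankings):
    """Assumed objective: total distance from the candidate to every input ranking."""
    return sum(footrule(candidate, r) for r in rankings)


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the other slots in p2's order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    middle = p1[lo:hi]
    kept = set(middle)
    rest = [x for x in p2 if x not in kept]
    return rest[:lo] + middle + rest[lo:]


def swap_mutation(perm, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen positions."""
    perm = list(perm)
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def aggregate(rankings, pop_size=30, generations=100, seed=0):
    """Search for a consensus ranking with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    items = list(rankings[0])
    # Seed the population with the input rankings, then pad with random permutations.
    population = [list(r) for r in rankings]
    while len(population) < pop_size:
        population.append(rng.sample(items, len(items)))
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda p: cost(p, rankings))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = elite + children
    return min(population, key=lambda p: cost(p, rankings))


# Hypothetical output of three feature rankers on the same data set.
rankings = [
    ["f1", "f2", "f3", "f4", "f5"],
    ["f2", "f1", "f3", "f5", "f4"],
    ["f1", "f3", "f2", "f4", "f5"],
]
consensus = aggregate(rankings)
print(consensus, cost(consensus, rankings))
```

Because the input rankings are placed in the initial population and the best individuals are always carried over, the returned ranking is never worse, under the assumed objective, than the best individual ranker's list. A different distance (for example Kendall's tau) could be substituted in `footrule` without changing the rest of the sketch.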



Paper Citation


in Harvard Style

Bouaguel W., Ben Brahim A. and Limam M. (2013). Feature Selection by Rank Aggregation and Genetic Algorithms. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 74-81. DOI: 10.5220/0004518700740081


in Bibtex Style

@conference{kdir13,
author={Waad Bouaguel and Afef Ben Brahim and Mohamed Limam},
title={Feature Selection by Rank Aggregation and Genetic Algorithms},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={74-81},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004518700740081},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Feature Selection by Rank Aggregation and Genetic Algorithms
SN - 978-989-8565-75-4
AU - Bouaguel W.
AU - Ben Brahim A.
AU - Limam M.
PY - 2013
SP - 74
EP - 81
DO - 10.5220/0004518700740081