Unsupervised Consensus Functions Applied to Ensemble Biclustering

Blaise Hanczar, Mohamed Nadif

2014

Abstract

Ensemble methods are very popular and can significantly improve the performance of classification and clustering algorithms. Their principle is to generate a set of different models and then aggregate them into a single one. Recent works have shown that this approach can also be useful for biclustering problems. The crucial step of this approach is the consensus function, which computes the aggregation of the biclusters. We identify the main consensus functions commonly used in clustering ensembles and show how to extend them to the biclustering context. We evaluate and analyze the performance of these consensus functions in several experiments on both artificial and real data.
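The aggregation step can be illustrated with the simplest kind of consensus function, a majority vote. The sketch below is a hypothetical, simplified illustration, not the authors' method: it assumes each ensemble member returns a single bicluster, given as a set of row indices and a set of column indices, and keeps every row and column selected by more than a fraction `threshold` of the members. The function name `majority_vote_consensus`, the `Bicluster` type and the default threshold of 0.5 are all assumptions made for the example.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a bicluster is a pair (row indices, column indices).
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def majority_vote_consensus(ensemble: List[Bicluster], threshold: float = 0.5) -> Bicluster:
    """Keep each row/column chosen by more than `threshold` of the ensemble members."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one bicluster")
    k = len(ensemble)
    # Count, for every row and every column, how many members selected it.
    row_votes = Counter(r for rows, _ in ensemble for r in rows)
    col_votes = Counter(c for _, cols in ensemble for c in cols)
    # Retain only the rows and columns whose vote fraction exceeds the threshold.
    rows = frozenset(r for r, v in row_votes.items() if v / k > threshold)
    cols = frozenset(c for c, v in col_votes.items() if v / k > threshold)
    return rows, cols


# Toy ensemble of three biclusters, e.g. obtained on different bootstrap samples.
ensemble = [
    (frozenset({0, 1, 2}), frozenset({0, 1})),
    (frozenset({0, 1, 3}), frozenset({0, 1, 2})),
    (frozenset({0, 1, 2}), frozenset({1})),
]
rows, cols = majority_vote_consensus(ensemble)
print(sorted(rows), sorted(cols))  # prints "[0, 1, 2] [0, 1]"
```

Row 3 and column 2 are each supported by only one of the three members, so they are dropped. When members return several biclusters, the biclusters would first have to be matched across members before voting, much like the label-correspondence problem in clustering ensembles; this sketch omits that step.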



Paper Citation


in Harvard Style

Hanczar B. and Nadif M. (2014). Unsupervised Consensus Functions Applied to Ensemble Biclustering. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 30-39. DOI: 10.5220/0004789800300039


in BibTeX Style

@conference{icpram14,
author={Blaise Hanczar and Mohamed Nadif},
title={Unsupervised Consensus Functions Applied to Ensemble Biclustering},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={30-39},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004789800300039},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Unsupervised Consensus Functions Applied to Ensemble Biclustering
SN - 978-989-758-018-5
AU - Hanczar B.
AU - Nadif M.
PY - 2014
SP - 30
EP - 39
DO - 10.5220/0004789800300039