Probabilistic Evidence Accumulation for Clustering Ensembles

André Lourenço, Samuel Rota Bulò, Nicola Rebagliati, Ana Fred, Mário Figueiredo, Marcello Pelillo

2013

Abstract

Ensemble clustering methods derive a consensus partition of a set of objects from the results of a collection of base clustering algorithms, which together form the ensemble. Each partition in the ensemble provides a set of pairwise observations of whether two objects co-occur in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain a final consensus clustering. The advantage of this solution is that it avoids the label correspondence problem, which affects other ensemble clustering schemes. In this paper we derive a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix, parameterized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm that carries out the parameter estimation with guaranteed convergence to a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
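To make the co-occurrence statistics concrete, the following is a minimal sketch of how a co-association matrix can be accumulated from an ensemble. It assumes each partition is given as a list of integer cluster labels, one per object. The function name `co_association` and the toy ensemble are illustrative and do not come from the paper. The sketch covers only the counting step, not the probabilistic model or the estimation algorithm proposed here.

```python
def co_association(partitions):
    """Count, for every pair of objects, the number of partitions
    that place both objects in the same cluster."""
    n_objects = len(partitions[0])
    counts = [[0] * n_objects for _ in range(n_objects)]
    for labels in partitions:
        if len(labels) != n_objects:
            raise ValueError("all partitions must label the same objects")
        for i in range(n_objects):
            for j in range(n_objects):
                # Only equality of labels within one partition matters,
                # so label names never need to be matched across partitions.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


# Toy ensemble (hypothetical): three partitions of four objects.
# The first two are the same grouping with swapped label names.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = co_association(ensemble)
for row in C:
    print(row)
```

Because the first two partitions differ only in label names, they contribute identical counts, which illustrates why this representation sidesteps the label correspondence problem.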



Paper Citation


in Harvard Style

Lourenço A., Rota Bulò S., Rebagliati N., Fred A., Figueiredo M. and Pelillo M. (2013). Probabilistic Evidence Accumulation for Clustering Ensembles. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 58-67. DOI: 10.5220/0004267900580067


in Bibtex Style

@conference{icpram13,
author={André Lourenço and Samuel Rota Bulò and Nicola Rebagliati and Ana Fred and Mário Figueiredo and Marcello Pelillo},
title={Probabilistic Evidence Accumulation for Clustering Ensembles},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={58-67},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004267900580067},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Probabilistic Evidence Accumulation for Clustering Ensembles
SN - 978-989-8565-41-9
AU - Lourenço A.
AU - Rota Bulò S.
AU - Rebagliati N.
AU - Fred A.
AU - Figueiredo M.
AU - Pelillo M.
PY - 2013
SP - 58
EP - 67
DO - 10.5220/0004267900580067