Semi-supervised Clustering with Example Clusters

Celine Vens, Bart Verstrynge, Hendrik Blockeel

2013

Abstract

We consider the following problem: given a set of data and one or more examples of clusters, find a clustering of the whole data set that is consistent with the given clusters. This is essentially a semi-supervised clustering problem, but it differs from those studied so far. We argue that it occurs frequently in practice, yet none of the existing methods can handle it well. We present a new method that specifically targets this type of problem. We show that the method works better than standard methods and identify opportunities for further improvement.

Paper Citation


in Harvard Style

Vens C., Verstrynge B. and Blockeel H. (2013). Semi-supervised Clustering with Example Clusters. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 45-51. DOI: 10.5220/0004547300450051


in Bibtex Style

@conference{kdir13,
author={Celine Vens and Bart Verstrynge and Hendrik Blockeel},
title={Semi-supervised Clustering with Example Clusters},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={45-51},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004547300450051},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Semi-supervised Clustering with Example Clusters
SN - 978-989-8565-75-4
AU - Vens C.
AU - Verstrynge B.
AU - Blockeel H.
PY - 2013
SP - 45
EP - 51
DO - 10.5220/0004547300450051