The Area under the ROC Curve as a Criterion for Clustering Evaluation

Helena Aidos, Robert P. W. Duin, Ana Fred

2013

Abstract

The literature offers several criteria for validating a clustering partition. These criteria are external or internal, depending on whether they use prior information about the true class labels or only the data itself. All of them assume a fixed number of clusters k and measure the performance of a clustering algorithm for that k. Instead, we propose a measure of the robustness of an algorithm across several values of k: we construct a ROC curve and measure the area under it. We present ROC curves of a few clustering algorithms on several synthetic and real-world datasets and show which algorithms are less sensitive to the choice of the number of clusters, k. We also show that this measure can serve as a validation criterion in a semi-supervised context; empirical evidence shows that not all objects need to be labeled to validate a clustering partition.
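The abstract does not spell out how the ROC curve is built, so the following is only a minimal sketch under stated assumptions, not the authors' construction. It assumes a pair-counting convention: a pair of objects is "positive" when both share a true class and "predicted positive" when both share a cluster. Each value of k then gives one (false positive rate, true positive rate) point, and the area is taken with the trapezoidal rule. The function names `pairwise_rates` and `auc_over_k` and the toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_rates(labels_true, labels_pred):
    """Return (false_positive_rate, true_positive_rate) over all object pairs.

    Assumed convention (not taken from the paper): a pair is positive when
    both objects share a true class, and predicted positive when both
    objects fall in the same cluster.
    """
    tp = fp = pos = neg = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_class:
            pos += 1
            tp += int(same_cluster)
        else:
            neg += 1
            fp += int(same_cluster)
    fpr = fp / neg if neg else 0.0
    tpr = tp / pos if pos else 0.0
    return fpr, tpr


def auc_over_k(labels_true, partitions):
    """Area under the curve traced by one partition per value of k.

    The end points (0, 0) and (1, 1) are always included; they correspond
    to every object in its own cluster and to a single cluster.
    """
    points = {(0.0, 0.0), (1.0, 1.0)}
    points.update(pairwise_rates(labels_true, p) for p in partitions)
    pts = sorted(points)
    # Trapezoidal rule over consecutive points sorted by false positive rate.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data: six objects in two true classes.
labels_true = [0, 0, 0, 1, 1, 1]

# Partitions for k = 1, 2, 3 from a hypothetical algorithm that respects the classes.
good = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2]]
# Partitions from a hypothetical algorithm whose k = 2 result mixes the classes.
bad = [[0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1]]

auc_good = auc_over_k(labels_true, good)
auc_bad = auc_over_k(labels_true, bad)
print(auc_good, auc_bad)
```

Under this convention a larger area indicates partitions that stay consistent with the class labels as k changes. For the semi-supervised setting mentioned in the abstract, the same functions could be applied to the labeled subset of objects only.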



Paper Citation


in Harvard Style

Aidos, H., Duin, R. P. W. and Fred, A. (2013). The Area under the ROC Curve as a Criterion for Clustering Evaluation. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 276-280. DOI: 10.5220/0004265502760280


in Bibtex Style

@conference{icpram13,
author={Helena Aidos and Robert P. W. Duin and Ana Fred},
title={The Area under the ROC Curve as a Criterion for Clustering Evaluation},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={276-280},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004265502760280},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - The Area under the ROC Curve as a Criterion for Clustering Evaluation
SN - 978-989-8565-41-9
AU - Aidos H.
AU - Duin R. P. W.
AU - Fred A.
PY - 2013
SP - 276
EP - 280
DO - 10.5220/0004265502760280