Authors:
Helena Aidos (1), Robert P. W. Duin (2) and Ana Fred (1)
Affiliations:
(1) Instituto Superior Técnico, Portugal
(2) Delft University of Technology, Netherlands
Keyword(s):
Clustering Validity, Robustness, ROC Curve, Area under Curve, Semi-supervised.
Related Ontology Subjects/Areas/Topics:
Clustering; Pattern Recognition; Theory and Methods
Abstract:
In the literature, there are several criteria for validating a clustering partition. These criteria can be external or internal, depending on whether they use prior information about the true class labels or only the data itself. All of these criteria assume a fixed number of clusters, k, and measure the performance of a clustering algorithm for that k. Instead, we propose a measure of an algorithm's robustness over several values of k: it constructs a ROC curve and computes the area under that curve. We present ROC curves of a few clustering algorithms on several synthetic and real-world datasets and show which clustering algorithms are less sensitive to the choice of the number of clusters, k. We also show that this measure can be used as a validation criterion in a semi-supervised context, and empirical evidence shows that not all objects always need to be labeled to validate the clustering partition.
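The abstract does not say how each ROC point is obtained. The sketch below assumes a pair-counting construction, which is one plausible reading and not necessarily the authors' exact definition: for each partition (one per value of k), the true positive rate is the fraction of same-class object pairs placed in the same cluster, and the false positive rate is the fraction of different-class pairs placed in the same cluster. Under that assumption, k = 1 (one cluster) gives the point (1, 1) and k = n (all singletons) gives (0, 0), so sweeping k traces a curve whose area is computed with the trapezoidal rule. The function names `pair_rates` and `clustering_auc` are hypothetical.

```python
from itertools import combinations


def pair_rates(true_labels, cluster_labels):
    """Return (TPR, FPR) over all object pairs (assumed pair-counting definition).

    TPR: fraction of same-class pairs placed in the same cluster.
    FPR: fraction of different-class pairs placed in the same cluster.
    """
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            tp += 1
        elif same_class:
            fn += 1
        elif same_cluster:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def clustering_auc(true_labels, partitions):
    """Area under the ROC curve traced by partitions obtained for several k.

    `partitions` is a list of cluster-label lists, one per value of k (hypothetical input format).
    """
    # Endpoints: k = n (all singletons) -> (0, 0); k = 1 (single cluster) -> (1, 1).
    points = [(0.0, 0.0), (1.0, 1.0)]
    # Store each partition's point as (FPR, TPR) so sorting orders it along the x-axis.
    points += [pair_rates(true_labels, p)[::-1] for p in partitions]
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return area
```

For example, with true labels `[0, 0, 1, 1]`, a perfect two-cluster partition `[0, 0, 1, 1]` yields an area of 1.0, while the partition `[0, 1, 0, 1]` yields 0.25. For a semi-supervised setting, one could (again as an assumption) call `clustering_auc` on only the labeled subset of objects and their cluster assignments.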