Empirical Evaluation of Distance Measures for Nearest Point with Indexing Ratio Clustering Algorithm

Raneem Qaddoura, Hossam Faris, Ibrahim Aljarah, J. Merelo, Pedro Castillo

Abstract

Selecting a suitable distance measure is a challenge for most clustering algorithms. Common distance measures include Manhattan (city-block), Euclidean, Minkowski, and Chebyshev. Nearest Point with Indexing Ratio (NPIR) is a recent clustering algorithm that tries to overcome the limitations of other algorithms by identifying clusters of arbitrary shape, non-spherical distributions of points, and clusters of different densities. It does so by iteratively applying nearest-neighbor search to find the clusters. The current implementation of the algorithm uses the Euclidean distance measure, which was also used for the experiments in the original NPIR paper. This paper investigates the impact of the four common distance measures on the NPIR clustering algorithm. The performance of NPIR is evaluated in terms of purity and entropy on nine data sets. The comparative study shows that, for the high-dimensional data sets studied, NPIR produces better purity and entropy results with the Manhattan distance measure than with the other distance measures.
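
For orientation, the four distance measures and the two evaluation measures named in the abstract can be sketched in plain Python as follows. This is an illustrative sketch only, not the NPIR implementation: the function names are hypothetical, and the entropy shown is one common definition (the size-weighted average of per-cluster class entropy, in bits), which may differ from the exact normalization used in the paper.

```python
import math
from collections import Counter


def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minkowski(a, b, p):
    """Generalization with order p: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Largest absolute coordinate difference (the limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))


def _class_counts_per_cluster(true_labels, cluster_labels):
    """Map each cluster id to a Counter of the true classes it contains."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, Counter())[t] += 1
    return clusters


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority class of their cluster.

    Higher is better; 1.0 means every cluster contains a single class.
    """
    clusters = _class_counts_per_cluster(true_labels, cluster_labels)
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(true_labels)


def entropy(true_labels, cluster_labels):
    """Size-weighted average of the class entropy inside each cluster (log base 2).

    Lower is better; 0.0 means every cluster contains a single class.
    ASSUMPTION: the paper may normalize this quantity differently.
    """
    clusters = _class_counts_per_cluster(true_labels, cluster_labels)
    n = len(true_labels)
    total = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total
```

Swapping the distance function passed to a nearest-neighbor search is the kind of change the study evaluates; purity and entropy then compare the resulting cluster assignments against known class labels.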
