Doaa S. Ali
Cairo University, Egypt
Multiobjective Data Clustering, Categorical Datasets, K-modes Clustering Algorithm, Entropy.
Computing and Telecommunications in Cardiology
Data Mining and Business Analytics
Decision Support Systems
Decision Support Systems, Remote Data Analysis
Health Engineering and Technology Applications
Knowledge Discovery and Information Retrieval
Methodologies and Technologies
Data clustering is an important unsupervised technique in data mining which aims to extract the natural
partitions in a dataset without a priori class information. Unfortunately, center-based clustering models are very
sensitive to the set of randomly initialized centers, since these initial centers directly influence the formation
of final clusters. Thus, determining the initial cluster centers is an important issue in clustering models.
Previous work has shown that using multiple clustering validity indices in a multiobjective clustering model
(e.g., the MODEK-Modes model) yields more accurate results than using a single validity index. In this study,
we enhance the performance of the MODEK-Modes model by introducing two new initialization methods: the
K-Modes initialization method and the entropy initialization method. Both methods are tested on ten
real-life benchmark datasets obtained from the UCI Machine
Learning Repository. Experimental results show that the two initialization methods achieve significant
improvement in the clustering performance compared to other existing initialization methods.
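
To illustrate why the initial centers matter, the following is a minimal sketch of the standard K-modes loop for categorical records, together with the Shannon entropy of a single attribute. The abstract does not describe how the two proposed initialization methods are computed, so this is not the authors' method. All function names (`matching_dissimilarity`, `column_entropy`, `k_modes`) and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def matching_dissimilarity(a, b):
    """Simple-matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_entropy(records, j):
    """Shannon entropy (in bits) of categorical attribute j over all records."""
    counts = Counter(r[j] for r in records)
    n = len(records)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def assign(records, centers):
    """Label each record with the index of its nearest center."""
    return [
        min(range(len(centers)),
            key=lambda k: matching_dissimilarity(r, centers[k]))
        for r in records
    ]


def update_modes(records, labels, centers):
    """Recompute each center as the attribute-wise mode of its cluster."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [r for r, lab in zip(records, labels) if lab == k]
        if not members:
            # Keep the old center when a cluster becomes empty.
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        )
    return new_centers


def k_modes(records, initial_centers, max_iter=20):
    """Plain K-modes: the result depends on the supplied initial centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(records, centers)
        new_centers = update_modes(records, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return assign(records, centers), centers


if __name__ == "__main__":
    # Hypothetical toy data: two categorical attributes per record.
    data = [("a", "x"), ("a", "x"), ("a", "y"),
            ("b", "y"), ("b", "z"), ("b", "z")]
    print(column_entropy(data, 0))
    print(k_modes(data, [("a", "x"), ("b", "z")]))
```

Because `k_modes` takes the initial centers as an argument, different initialization strategies can be compared by passing their outputs to the same loop and scoring the resulting partitions.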