Authors:
Doaa S. Ali; Ayman Ghoneim and Mohamed Saleh
Affiliation:
Cairo University, Egypt
Keyword(s):
Multiobjective Data Clustering, Categorical Datasets, K-modes Clustering Algorithm, Entropy.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Analytics; Cardiovascular Technologies; Computing and Telecommunications in Cardiology; Data Engineering; Data Mining and Business Analytics; Decision Support Systems; Decision Support Systems, Remote Data Analysis; Health Engineering and Technology Applications; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mathematical Modeling; Methodologies and Technologies; Operational Research; Optimization; Symbolic Systems
Abstract:
Data clustering is an important unsupervised technique in data mining that aims to extract the natural
partitions in a dataset without a priori class information. Unfortunately, clustering models are very
sensitive to the set of randomly initialized centers, since these initial centers directly influence the formation
of the final clusters. Thus, determining the initial cluster centers is an important issue in clustering models.
Previous work has shown that using multiple clustering validity indices in a multiobjective clustering model
(e.g., the MODEK-Modes model) yields more accurate results than using a single validity index. In this study,
we enhance the performance of the MODEK-Modes model by introducing two new initialization methods: the
K-Modes initialization method and the entropy initialization method. Both methods are tested on ten
real-life benchmark datasets obtained from the UCI Machine Learning Repository. Experimental results show
that the two initialization methods achieve a significant improvement in clustering performance compared to
other existing initialization methods.
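As an illustration of why initial centers matter for categorical clustering, the following is a minimal sketch of a plain K-modes loop using the simple matching dissimilarity, together with a per-attribute Shannon entropy helper. It is not the MODEK-Modes model or either of the initialization methods proposed in the paper, whose details the abstract does not give. All function names (`matching_dissimilarity`, `column_mode`, `attribute_entropy`, `k_modes`) and the toy records are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_mode(records):
    """Return the most frequent value of each attribute (the cluster 'mode')."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def attribute_entropy(records):
    """Shannon entropy, in bits, of each attribute's value distribution."""
    n = len(records)
    entropies = []
    for col in zip(*records):
        counts = Counter(col)
        entropies.append(-sum((c / n) * log2(c / n) for c in counts.values()))
    return entropies


def k_modes(records, initial_modes, max_iter=20):
    """Plain K-modes: assign to the nearest mode, then recompute the modes."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_mode(members)
    return labels, modes


# Toy data (hypothetical): two attributes, five records.
records = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]

# Two different sets of initial modes lead to two different partitions.
labels_1, modes_1 = k_modes(records, [("a", "x"), ("b", "z")])
labels_2, modes_2 = k_modes(records, [("a", "x"), ("a", "y")])
```

On this toy data the two runs converge to different partitions, which is the sensitivity to initialization that the abstract describes. The entropy helper only shows how attribute-level entropy can be computed for categorical data; how the paper's entropy initialization method actually uses entropy is not stated in the abstract.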