Table 3: Mean ± standard deviation of the best solution over 100 independent runs for DE-Simple Matching, DE-IOF, DE-Eskin, DE-Scaling, and DE-MSM.
 
| Dataset | DE-Simple Matching | DE-IOF | DE-Eskin | DE-Scaling | DE-MSM | T-test |
|---|---|---|---|---|---|---|
| Breast Cancer | 0.823201 ± 00013254 | 0.7901874 ± 0.000231 | 0.805437 ± 0.006119 | 0.82289 ± 000245 | 0.8472614 ± 0.07811E-05 | Significant |
| Zoo | 0.90132 ± 0.0002621 | 0.884791 ± 0.6119E-04 | 0.899645 ± 0.00332 | 0.908892 ± 0.002583 | 0.9435833 ± 2.52812E-06 | Significant |
| Hepatitis | 0.798517 ± 0.003213 | 0.769026 ± 0.00371 | 0.734618 ± 1.842E-04 | 0.797582 ± 0.0007739 | 0.83306326 ± 7.2235E-05 | Significant |
| Heart Diseases | 0.762825 ± 0.000765 | 0.7356806 ± 2.5723E-05 | 0.6571352 ± 0.00422 | 0.774329 ± 0.000113 | 0.82840165 ± 3.77392E-05 | Significant |
| Dermatology | 0.85060403 ± 0.000113 | 0.7285605 ± 0.00117 | 0.705437 ± 0.0005632 | 0.8505721 ± 0.00017 | 0.86351823 ± 1.4426E-04 | Significant |
| Credit | 0.9392598 ± 0.0006234 | 0.88369739 ± 0.000921 | 0.7401278 ± 3.48192E-04 | 0.940456 ± 0.000253 | 0.91358951 ± 0.000218 | Significant |
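
As a rough illustration of how entries of this kind can be produced, the sketch below summarizes the per-run scores of two methods as mean and sample standard deviation, and computes a two-sample t statistic. It is a minimal sketch under stated assumptions: the paper does not say which t-test variant was used, so Welch's unequal-variance form is assumed here, and the function names and score lists are hypothetical placeholders, not values from the experiments.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of the per-run best scores."""
    return mean(scores), stdev(scores)


def welch_t_statistic(a, b):
    """Welch's two-sample t statistic (assumed variant, not confirmed by the paper).

    Requires at least two runs per method and non-zero variance in at least
    one of the two samples.
    """
    mean_a, std_a = summarize(a)
    mean_b, std_b = summarize(b)
    standard_error = sqrt(std_a ** 2 / len(a) + std_b ** 2 / len(b))
    return (mean_a - mean_b) / standard_error


# Hypothetical per-run accuracies for two methods (illustrative only).
runs_a = [0.84, 0.85, 0.86, 0.85]
runs_b = [0.82, 0.81, 0.83, 0.82]

m, s = summarize(runs_a)
print(f"{m:.4f} ± {s:.4f}")
print(welch_t_statistic(runs_a, runs_b))
```

A positive statistic indicates that the first method has the higher mean; deciding significance additionally requires comparing the statistic against the t distribution at the chosen level.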
 
The experimental results showed that the MSM method achieved statistically significantly better accuracy on 80% of the tested datasets. We then moved to an evolutionary setting using DE, in which the similarity measures were used to compute distances and update cluster centers during the search process. DE improved clustering performance compared to the non-evolutionary setting, and DE-MSM achieved statistically significantly better accuracy on 90% of the tested datasets compared to DE-Simple Matching, DE-IOF, DE-Eskin, and DE-Scaling. The time and space complexity of the proposed method was analyzed, and the comparison with the other methods confirms its effectiveness.
For future work, the proposed MSM and/or DE-MSM methods can be used in a multiobjective data clustering framework designed specifically for mixed datasets. Furthermore, the current work can be extended to data clustering models with uncertainty.
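
To make the role of a similarity measure in the assignment step concrete, the sketch below uses the simple matching measure, the baseline named above, to compute the distance between a categorical record and each cluster center and to pick the nearest center. It is a minimal sketch, not the implementation used in the experiments: the function names and toy records are hypothetical, and the MSM measure itself is not reproduced here.

```python
from typing import Sequence


def simple_matching_distance(x: Sequence[str], y: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical records disagree."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


def assign_to_nearest_center(record: Sequence[str],
                             centers: Sequence[Sequence[str]]) -> int:
    """Return the index of the center with the smallest distance to the record."""
    distances = [simple_matching_distance(record, c) for c in centers]
    return min(range(len(centers)), key=distances.__getitem__)


# Hypothetical toy data: two cluster centers and one record to assign.
centers = [("red", "small", "round"), ("blue", "large", "square")]
record = ("red", "large", "round")

print(assign_to_nearest_center(record, centers))
```

In an evolutionary setting such as DE, each candidate solution would encode a set of centers, and a function like this would be called for every record when evaluating the candidate's fitness.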