Initializing k-means Clustering

Christian Borgelt, Olha Yarikova

2020

Abstract

The quality of clustering results obtained with the k-means algorithm depends heavily on the initialization of the cluster centers. Simply sampling centers uniformly at random from the data points usually yields fairly poor and unstable results. Hence several alternatives have been suggested in the past, among which Maximin (Hathaway et al., 2006) and k-means++ (Arthur and Vassilvitskii, 2007) are the best known and most widely used. In this paper we explore modifications of these methods that deal with cases in which the original methods still yield suboptimal choices of the initial cluster centers. Furthermore, we present efficient implementations of our new methods.
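
For orientation, the three baseline initializations named in the abstract can be sketched as follows. This is a minimal illustration of the standard textbook formulations only; it does not reproduce the modified methods or the efficient implementations proposed in the paper. The function names (`init_uniform`, `init_maximin`, `init_kmeanspp`), the representation of points as tuples, and the use of squared Euclidean distance are assumptions made for this sketch, and the data set is assumed to contain at least k distinct points.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_uniform(points, k, rng):
    """Baseline: pick k data points uniformly at random, without replacement."""
    return rng.sample(points, k)


def init_maximin(points, k, rng):
    """Maximin: first center at random, then repeatedly take the point that
    is farthest from its nearest already chosen center."""
    centers = [rng.choice(points)]
    # d2[i] = squared distance of point i to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=d2.__getitem__)
        centers.append(points[i])
        d2 = [min(d, sq_dist(p, points[i])) for d, p in zip(d2, points)]
    return centers


def init_kmeanspp(points, k, rng):
    """k-means++: first center at random, then sample each further center
    with probability proportional to its squared distance to the nearest
    already chosen center."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[i])
        d2 = [min(d, sq_dist(p, points[i])) for d, p in zip(d2, points)]
    return centers


# Hypothetical toy data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
rng = random.Random(0)
maximin_centers = init_maximin(points, 3, rng)
kmeanspp_centers = init_kmeanspp(points, 3, rng)
```

In this sketch Maximin is deterministic after the first center, whereas k-means++ stays randomized: distant points are merely more likely to be chosen, not guaranteed. Both variants cost O(nk) distance computations because the nearest-center distances are updated incrementally rather than recomputed.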

Paper Citation


in Harvard Style

Borgelt C. and Yarikova O. (2020). Initializing k-means Clustering. In Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-440-4, pages 260-267. DOI: 10.5220/0009872702600267


in Bibtex Style

@conference{data20,
author={Christian Borgelt and Olha Yarikova},
title={Initializing k-means Clustering},
booktitle={Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2020},
pages={260-267},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009872702600267},
isbn={978-989-758-440-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Initializing k-means Clustering
SN - 978-989-758-440-4
AU - Borgelt C.
AU - Yarikova O.
PY - 2020
SP - 260
EP - 267
DO - 10.5220/0009872702600267