centroids and selecting the data point with the maximum distance as the new initial centroid. The experiment involved eight benchmark datasets, evaluated with five validity measures and execution time. The results showed that the Pillar algorithm can optimize the selection of initial centroids and improve the precision of K-Means on all datasets and for most of the validity measures (Barakbah and Kiyoki, 2009a).
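As a rough sketch (not the authors' implementation, and omitting the outlier detection step of the original Pillar algorithm), the accumulated-distance selection described above can be expressed in Python as follows; here the first centroid is assumed to be the data point farthest from the grand mean:

import numpy as np

def pillar_init(X, k):
    # Simplified Pillar-style initialization: accumulate each point's
    # distances to the centroids chosen so far and pick the point with
    # the maximum accumulated distance as the next initial centroid.
    grand_mean = X.mean(axis=0)
    # First centroid: the point farthest from the grand mean (assumption).
    idx = int(np.argmax(np.linalg.norm(X - grand_mean, axis=1)))
    chosen = [idx]
    acc = np.zeros(len(X))
    for _ in range(k - 1):
        acc += np.linalg.norm(X - X[chosen[-1]], axis=1)
        acc[chosen] = -np.inf  # never reselect an existing centroid
        idx = int(np.argmax(acc))
        chosen.append(idx)
    return X[chosen]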
The ability of the Pillar technique to determine initial centroids, whose random selection is one of the most common problems in K-Means, has attracted the attention of researchers applying it to unsupervised learning tasks. Because new experiments do not necessarily reproduce the success reported by previous researchers, the authors are interested in addressing this initialization problem by using the Pillar technique as the proposed method. In this paper, we aim to improve the performance of the K-Means algorithm on the datasets to be processed (Retno et al., 2020; Putra et al., 2017).
2 LITERATURE REVIEW
2.1 Unsupervised Learning
A machine learning process whose data are unlabeled and whose learning is unsupervised, and which therefore has no easily categorized data formation patterns, is called unsupervised learning. Unsupervised learning aims to discover hidden patterns in data and is commonly used to solve clustering problems (Wahyono, 2020). It is the category of machine learning in which the processed dataset is unlabeled or has no predetermined output. Unsupervised learning can be analogized to a teacher grading a student's answers without an answer key: the correctness of each answer depends on how the teacher understands the question. Thus, unsupervised learning is considered more subjective than supervised learning. It is particularly useful when the dataset is unlabeled and its implicit relationships need to be discovered; the task of grouping data by these implicit relationships is called clustering. Unsupervised learning algorithms include K-Means, Hierarchical Clustering, DBSCAN, Fuzzy C-Means, Self-Organizing Map, and others (Primartha, 2021).
2.2 Clustering
Clustering is an important research tool for solving problems in fields such as archaeology, psychiatry, engineering, and medicine. A cluster consists of points that are similar to each other but different from the points in other clusters (Abo-Elnaga and Nasr, 2022). Clustering is also commonly used in social network analysis, crime detection, and software engineering, since it helps to identify patterns when searching for and grouping data that share characteristics with one another (Putra et al., 2017). Clustering is widely applicable in pattern recognition, machine learning, image processing, and statistics, and its purpose is to partition a data set into groups whose members share similar patterns (Wang and Bai, 2016). Cluster analysis is one of the most important problems in data processing, and identifying similarity groups within data sets has been applied in many applications. The general approach for determining clusters from a data set is to minimize an objective function after the number of clusters has been fixed a priori (Kume and Walker, 2021).
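For K-Means, for example, this objective is typically the within-cluster sum of squared distances (a standard formulation, not quoted from the cited work):

J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2 ,

where C_j denotes the j-th cluster and \mu_j its centroid.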
2.3 K-Means Algorithm
Clustering with K-Means is an approach that divides data into similar groups to form clusters. The advantage of the K-Means algorithm is that it can cluster massive data quickly and efficiently. Its initial step is to determine the initial centroids randomly (Abo-Elnaga and Nasr, 2022). The K-Means algorithm then identifies clusters by minimizing the clustering error (Wang and Bai, 2016). Owing to its simplicity, practicality, and efficiency, K-Means has been successfully applied in many fields and applications, including document clustering, market segmentation, image segmentation, and feature learning. In general, clustering algorithms fall into two categories: hierarchical clustering and partitional clustering (Liu et al., 2020).
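The basic procedure can be summarized by the following Python sketch (a generic illustration, not the implementation of any cited work); the random initialization step is the one that a method such as the Pillar technique would replace:

import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points,
        # keeping the old centroid if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels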
The K-Means algorithm is capable of clustering large data, including multi-view data drawn from different tables or datasets, by clustering the data of each table simultaneously. As a centroid-based technique, it clusters large amounts of data by specifying the number of clusters and then obtaining good groups once the final centroid values become stable (Retno et al., 2020).
The clustering process in the K-Means algorithm