Analysis of Centroid Cluster in X-Means Clustering in Data Classification: Power Absorb Oxygen

Sardo Pardingotan Sipayung, Poltak Sihombing and Sutarman

Department of Computer Science, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia

Department of Information Technology, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara

Keywords: Oxygen, Cluster, Centroid, X-Means.

Abstract: The city gardens of Medan contain many different types of trees. Every tree has an oxygen absorption capacity and releases oxygen every day. This study groups the data on oxygen-releasing trees using the X-Means clustering algorithm. The centroid is the central data point in the grouping process, so the centroid must be analyzed to determine the initial value assigned at the start of clustering. The resulting data are then used as the cluster center points in the X-Means clustering process.

1 INTRODUCTION

In K-means clustering, the cluster centroids are selected at random for a given number of clusters K. The algorithm divides the data into K clusters, each with its own membership, and assigns every data point to the nearest center of mass. It then recomputes the centroids from the current cluster assignments and, if the grouping has not converged, repeats the process several times. X-means clustering is a variation of K-means clustering that handles cluster allocation by attempting partitions repeatedly and keeping the best splits until some criterion is reached. X-means finds the intrinsic groupings in an unlabeled data set. It provides a fast and efficient way to group unstructured data, and it uses concurrency to speed up model construction.
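The assign-and-update loop described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name `kmeans`, the `seed` parameter, and the toy data are assumptions introduced only for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points at random as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
data = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]])
centroids, labels = kmeans(data, k=2)
```

On data this cleanly separated the random start does not matter; on less clear-cut data, different values of `seed` can lead to different final clusters.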

The cluster center point, or centroid, is the starting point for grouping in the K-Means algorithm. Data are grouped by calculating the shortest distance to the initial cluster centers, which serve as the central reference of each group or cluster. In practice, however, determining the initial cluster centers is a weakness of the K-Means algorithm, because no approach is used to choose them: the centers are selected arbitrarily or at random from the data set. As a result, K-Means clustering results are often suboptimal and do not reach the best result in every experiment conducted. The quality of the clustering result therefore depends heavily on the initial centroids (Baswade, 2013).

Some researchers have examined the problems of k-means clustering, and many approaches have been taken to accelerate k-means. Several methods have been introduced to improve scalability and reduce the time complexity of the k-means algorithm. Pelleg (2000) proposed a method called X-means, which splits selected centroids in two to better fit the data. The X-means algorithm has proven to be more efficient than k-means. One drawback is that, because the method relies on the BIC (Bayesian Information Criterion), it can split too many centroids when the data are not completely spherical.

2 RESEARCH METHODS

2.1 Clustering

Clustering is a method for classifying or partitioning the data in a dataset. Essentially, clustering is a method for finding and grouping data whose characteristics are similar to those of other data (Bhusare, 2014). A cluster is a group of data objects that are similar to one another within the cluster and dissimilar to objects in other clusters. Objects are grouped into one or more clusters so that objects located in the same cluster are highly similar to one another. The objects are grouped on the principle of maximizing the similarity of objects within a cluster and maximizing the dissimilarity between different clusters.

Object similarity is usually obtained from the attribute values that describe the data object, and data objects are usually represented as points in a multidimensional space. The characteristics of each cluster are not determined beforehand; they emerge from the similarity of the data grouped inside it.

Sipayung, S., Sihombing, P. and Sutarman. Analysis of Centroid Cluster in X-Means Clustering in Data Classification: Power Absorb Oxygen. In Proceedings of the International Conference on Natural Resources and Technology (ICONART 2019), pages 135-137. DOI: 10.5220/0008547601350137. ISBN: 978-989-758-404-6.

2.2 X-Means Clustering

X-means clustering is used to overcome another main weakness of K-means clustering, namely the need for prior knowledge of the number of clusters (K). In this method, the actual value of K is estimated in an unsupervised way, based only on the data set itself.

Figure 1: General Steps in X-Means Clustering.

K_max and K_min serve as the upper and lower limits for the possible values of K. In the first step of X-means clustering, with K = K_min, K-means finds the initial structure and centroids. In the next step, every cluster in the estimated structure is treated as a parent cluster, which can be split into two child clusters. Based on a scoring criterion (the BIC in the original method), the parent and child structures are scored. The score helps decide whether the parent is a good representation of the sample data or whether the children give a more accurate distribution of the sample. As a result, either the parent is replaced by its child centroids, or the algorithm keeps the parent centroid and discards the children. The new structure is then built or updated based on the selection of parent or children. This procedure continues for all clusters in the initial structure until the estimated number of clusters becomes larger than K_max.

The algorithm then converges to the best structure. It can be slow, because K-means must be rerun for every cluster split. To address this problem, a kd-tree is built over the data set, which naturally reduces the number of nearest-neighbor queries needed by K-means (Pelleg, 2000).
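The parent-versus-children decision can be sketched as below. This is a hedged illustration only: the BIC here is one common formulation that assumes identical spherical Gaussian clusters and may differ in detail from Pelleg (2000). The helper names (`nearest`, `two_means`, `bic`, `should_split`), the farthest-point initialization of the children, and the synthetic data are all assumptions made for the example. The loop from K_min to K_max and the kd-tree acceleration are left out.

```python
import numpy as np

def nearest(points, centroids):
    """Index of the closest centroid (Euclidean) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def two_means(points, iters=20):
    """Split one parent cluster into two children with a local 2-means run."""
    # Simple deterministic start (an assumption, not the original method):
    # the point farthest from the mean, then the point farthest from that one.
    center = points.mean(axis=0)
    first = points[np.linalg.norm(points - center, axis=1).argmax()]
    second = points[np.linalg.norm(points - first, axis=1).argmax()]
    centroids = np.array([first, second], dtype=float)
    for _ in range(iters):
        labels = nearest(points, centroids)
        for j in range(2):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, nearest(points, centroids)

def bic(points, centroids, labels):
    """BIC score (higher is better) under a spherical Gaussian assumption."""
    r, m = points.shape              # number of points, number of dimensions
    k = len(centroids)
    sq_err = float(np.sum((points - centroids[labels]) ** 2))
    # Pooled per-dimension variance, guarded against zero.
    variance = max(sq_err / (m * max(r - k, 1)), 1e-12)
    sizes = np.bincount(labels, minlength=k)
    sizes = sizes[sizes > 0]
    log_lik = (np.sum(sizes * np.log(sizes / r))
               - r * m / 2.0 * np.log(2.0 * np.pi * variance)
               - sq_err / (2.0 * variance))
    n_params = (k - 1) + m * k + 1   # mixing weights, centroid coordinates, variance
    return log_lik - n_params / 2.0 * np.log(r)

def should_split(points):
    """True if the two children score a higher BIC than the single parent."""
    parent = points.mean(axis=0, keepdims=True)
    parent_labels = np.zeros(len(points), dtype=int)
    children, child_labels = two_means(points)
    return bool(bic(points, children, child_labels) > bic(points, parent, parent_labels))

# Hypothetical data: one Gaussian blob, and the same blob duplicated far away.
rng = np.random.default_rng(1)
blob = rng.normal(0.0, 1.0, size=(200, 2))
two_blobs = np.vstack([blob, blob + 10.0])
print(should_split(blob), should_split(two_blobs))
```

With this setup the single blob keeps its parent centroid, while the duplicated blob is replaced by two children, since the likelihood gain there outweighs the penalty for the extra parameters.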

3 PROBLEM IDENTIFICATION

The centroid is the central data point in the grouping process, so the centroid must be analyzed to determine the initial value assigned at the start of clustering. The result is then used as the cluster center point in the X-Means clustering algorithm.

4 RESULTS AND DISCUSSION

The purpose of this study is to determine the cluster center point, or centroid; to measure the performance of the X-Means algorithm over a range of cluster parameters; and to compare the accuracy of the X-Means algorithm with that of the k-means algorithm by measuring the distance between centroids. The aim is a fast and efficient way to group unstructured data, to speed up model construction, and to split selected centroids in two to better fit the data.

The result of applying the X-Means clustering method to the tree oxygen absorption capacity dataset is the following cluster model:

Cluster 0: 104 items

Cluster 1: 3 items

Total number of items: 107

The results of the centroid analysis are shown in the following table:

Table 1: Centroid Analysis.

Attribute       Cluster_0   Cluster_1
Name of Tree    0.0
kg / year       -0.16       5.7
ton / year      -0.17       5.7

Table 2: Performance Vector Analysis.

Measure                        Value
centroid distance              -0.104
centroid distance cluster 0    -0.013
centroid distance cluster 1    -3.237
Davies Bouldin                 -0.206
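For reference, the standard Davies-Bouldin index can be computed as in the sketch below. The function name `davies_bouldin` and the four-point toy data are assumptions for illustration; the study's dataset is not reproduced here. The standard index is non-negative, so the negative sign in Table 2 presumably reflects a reporting convention of the tool used, which is also an assumption.

```python
import numpy as np

def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index: lower means tighter, better separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in ids])
    # Scatter: average distance from each member to its own centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    k = len(ids)
    worst = []
    for i in range(k):
        # For cluster i, keep the worst (largest) similarity ratio to any other cluster.
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(k) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Hypothetical toy data: two clusters of two points, centroids 10 units apart.
pts = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
lbl = np.array([0, 0, 1, 1])
value = davies_bouldin(pts, lbl)  # (1 + 1) / 10 = 0.2
```

The function expects at least two clusters with distinct centroids; otherwise the ratio is undefined.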


Figure 2: Cluster Graph.

5 CONCLUSIONS

Based on the trials and analysis carried out, the researchers draw the following conclusions:

The X-means accuracy measurements show that the clustering structure obtained is moderate.

The cluster performance measurements show that the centroid distances of clusters 0 and 1 differ.

The larger the dataset used, the larger the values obtained, but the number of clusters produced does not change.

X-Means showed good accuracy compared with K-means in classifying tree types by their oxygen absorption capacity.

REFERENCES

Baswade, A. M., Nalwade, P. S. 2013. Selection of Initial Centroids for K-Means Algorithm. International Journal of Computer Science and Mobile Computing (IJCSM) 2 (7): 161-164.

Bhusare, B. B., Bansode, S. M. 2014. Centroids initialization for K-means clustering using improved pillar algorithm. Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 3(4), 1317-1322.

Pelleg, D., Moore, A. 2000. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In International Conference on Machine Learning, Palo Alto, CA. 1, 727-734.

Krawczyk, B., Woźniak, M. 2015. Pruning Ensembles of One-Class Classifiers with X-means Clustering. In Asian Conference on Intelligent Information and Database Systems. 484-493.

Maimon, O., Last, M. 2001. Knowledge Discovery and Data Mining. Springer US. United States, 1st edition.

Poteras, C. M., Mihăescu, M. C., Mocanu, M. 2014. An optimized version of the K-Means clustering algorithm. In Federated Conference on Computer Science and Information Systems. 695-699.

Rose, J. D. 2016. An efficient association rule based hierarchical algorithm for text clustering. International Journal of Advanced Engineering Technology 7 (4): 751-753.

Turban, E., Aronson, J. E., Liang, T.-P. 2005. Decision Support System and Intelligent System. Andi Offset.
