A Hierarchical Clustering based Heuristic for Automatic Clustering
François LaPlante¹, Nabil Belacel² and Mustapha Kardouchi¹
¹Department of Computer Sciences, Université de Moncton, E1A 3E9, Moncton, NB, Canada
²National Research Council - Information and Communications Technologies, E1A 7R9, Moncton, NB, Canada
Keywords:
Data-mining, Automatic Clustering, Unsupervised Learning.
Abstract:
Determining an optimal number of clusters and producing reliable results are two challenging and critical
tasks in cluster analysis. We propose a clustering method which produces valid results while automatically
determining an optimal number of clusters. Our method achieves these results without user input pertaining
directly to a number of clusters. The method consists of two main components: splitting and merging. In
the splitting phase, a divisive hierarchical clustering method (based on the DIANA algorithm) is executed and
interrupted by a heuristic function once the partial result is considered to be “adequate”. This partial result,
which is likely to have too many clusters, is then fed into the merging method which merges clusters until
the final optimal result is reached. Our method’s effectiveness in clustering various data sets is demonstrated,
including its ability to produce valid results on data sets presenting nested or interlocking shapes. The method
is compared, using cluster validity analysis, to other methods that are provided a known optimal number of clusters and to other automatic clustering methods. Depending on the particularities of the data set used, our
method produces results which are roughly equivalent to or better than those of the compared methods.
1 INTRODUCTION
Data clustering, also known as cluster analysis, segmentation analysis, or taxonomy analysis (Gan, 2011), is a form of unsupervised classification of data points into groups called clusters. Data points in the same cluster should be as similar to each other as possible and data points in different clusters should be as dissimilar as possible (Jain et al., 1999).
One common problem across many clustering
methods is determining the correct (optimal) number
of clusters. One prevalent method to determine an op-
timal number of clusters involves the use of validity
indices. Cluster validity indices are a value computed
based on a clustering result and represent a relative
quality of this clustering. Often, a clustering method
will be applied to the target data set a number of times
with a different number of clusters and a validity in-
dex will be computed for each resulting clustering.
The result which leads to the best index value is taken as the optimal one. Given n data points, the number of clusters to try can be a sequence (often from 2 to √n), all possible values (1 to n), or a selection of specific values or ranges based on prior knowledge of the data set.
Even with the use of cluster validity indices, it is
still required to cluster the data many times and com-
pare the results to determine the optimal clustering.
There is a group of clustering algorithms, called auto-
matic clustering algorithms, which determine an opti-
mal number of clusters automatically. These methods,
although generally more complex and time consum-
ing, do not need to be run more than once. Some of
these algorithms, such as Y-means (Guan et al., 2003), still require an initial number of clusters from which to start. Others, such as the method proposed by Mok et al. (Mok et al., 2012), hereafter referred to as RAC, require no user input at all regarding the number of clusters. Our goal is to develop an automatic clustering algorithm which requires minimal user input and, more specifically, does not need to be provided with a target number of clusters or an initial number of clusters from which to start.
2 RELATED WORKS
2.1 Types of Clustering
Clustering methods can be categorized in many ways
such as hard or fuzzy, hierarchical or partitional, and
as combinations of these types.
2.1.1 Hard vs. Fuzzy Clustering
Hard clustering, also called crisp clustering, is a type
of clustering where every datum belongs to one and
only one cluster. In contrast, fuzzy clustering is a
form of clustering where data belong to multiple clus-
ters according to a membership function (Gan, 2011).
Hard clustering is generally simpler to implement and
has lower time complexity. Hard clustering performs
well with linearly separable data but often does not perform very well with non-linearly separable data, outliers, or noise. Fuzzy clustering typically has a larger memory footprint as it requires a c×n matrix to store memberships, where c is the number of clusters and n is the number of data points. Fuzzy clustering handles non-linearly separable data, outliers, and noise better than hard clustering.
2.1.2 Hierarchical vs. Partitional Clustering
A hierarchical clustering method yields a dendrogram
representing the nested grouping of patterns and sim-
ilarity levels at which groupings change (Jain et al.,
1999). A partitional clustering method yields a single
partition of the data instead of a clustering structure,
such as the dendrogram produced by a hierarchical
method (Gan, 2011).
2.1.3 Automatic Clustering
Automatic clustering is a form of clustering where
the number of clusters c is unknown and determining
its optimal value is left up to the clustering method.
Some automatic clustering methods may require an
initial number of clusters, from which clusters will
be split and merged until a pseudo-optimal number of
clusters is achieved. Other methods require no initial
value or additional information regarding the num-
ber of clusters and will determine a pseudo-optimal
value without any user input. Other parameters, such
as a fuzzy constant (for fuzzy clustering algorithms)
or thresholds, may still be required, but are generally
kept to minimum or are optional with good default
values.
2.2 Validation Methods
As clustering is by definition an unsupervised
method, there is generally no training data with
known output values with which to compare results.
As such, it requires a different approach to evaluat-
ing its results. The quality of clustering is evaluated
using a validity index, which is a relative measure of
clustering quality based on a number of parameters.
There are many clustering validity indices, but the ap-
proach to using them generally remains the same and
is as follows:
1. Use fixed values for all parameters other than c
the number of clusters.
2. Iteratively cluster the data set with the clustering method being evaluated, with varying values of c (often from 2 to √n).
3. Calculate the validity index for every clustering
generated by 2.
4. The clustering for which the validity index
presents the best value is considered to be “op-
timal”.
A good index must consider compactness (high intra-
cluster density), separation (high inter-cluster dis-
tance or dissimilarity) and the geometric structure of
data (Wu and Yang, 2005).
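As an illustration of this procedure, the following sketch selects c by clustering repeatedly and keeping the result with the best index value. It uses scikit-learn's KMeans as a stand-in clustering method and the silhouette index (Section 2.2.6) as the validity index; the function name, the √n upper bound and the defaults are illustrative assumptions, not part of the proposed method.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def select_num_clusters(X, c_max=None):
    # Try c = 2 .. c_max (sqrt(n) by default) and keep the c whose clustering
    # maximizes the silhouette index.  Any clustering method / validity index
    # pair can be substituted for KMeans / silhouette_score.
    n = len(X)
    c_max = c_max or max(2, int(np.sqrt(n)))
    best_c, best_score = None, -np.inf
    for c in range(2, c_max + 1):
        labels = KMeans(n_clusters=c, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_c, best_score = c, score
    return best_c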
2.2.1 Xie and Beni Index
Xie and Beni have proposed a validity index which
relies on two properties, compactness and separation
(Xie and Beni, 1991), which was later modified by
Pal and Bezdek (Pal and Bezdek, 1995). This index is
defined by
$$V_{XB} = \frac{\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\,\|x_k - v_i\|^2}{n\,\min_{i,j \le c,\ i \ne j}\|v_i - v_j\|^2} \qquad (1)$$
where $u$ is an $n \times c$ matrix such that $u_{ik}$ is the membership of object $k$ to cluster $i$, $m$ is a fuzzy constant, the $x_k$ are data points and the $v_i$ are clusters (represented by their centroids).
The numerator of the equation, which is equiva-
lent to the least squared error, is an indicator of com-
pactness of the fuzzy partition, while the denominator
is an indicator of the strength of the separation be-
tween the clusters. A more optimal partition should
produce a smaller value for the compactness and well
separated clusters should produce a higher value for
the separation. An optimal number of clusters c is
generally found by solving min
2cn1
V
XB
(c).
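A minimal NumPy sketch of Eq. (1), assuming data X (n×p), centroids V (c×p) and a fuzzy membership matrix U (n×c); names and defaults are illustrative, not the authors' code.

import numpy as np

def xie_beni(X, V, U, m=2.0):
    n = X.shape[0]
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # ||x_k - v_i||^2, shape (n, c)
    compactness = (U ** m * d2).sum()                      # numerator of Eq. (1)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # squared centroid distances
    np.fill_diagonal(sep, np.inf)                          # exclude i == j
    return compactness / (n * sep.min())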
2.2.2 Fukuyama and Sugeno Index
Fukuyama and Sugeno also proposed a validity index
based on compactness and separation (Fukuyama and
Sugeno, 1989) defined by:
$$V_{FS} = J_m - K_m = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\,\|x_k - v_i\|^2 - \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\,\|v_i - \bar{v}\|^2 \qquad (2)$$
where $J_m$ represents a measure of compactness, $K_m$ represents a measure of separation between clusters
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
202
and $\bar{v}$ is the mean of all cluster centroids. An optimal number of clusters $c$ is generally found by solving $\min_{2 \le c \le n-1} V_{FS}(c)$.
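A corresponding sketch of Eq. (2), with the same assumed inputs as above (data X, centroids V, memberships U); again illustrative only.

import numpy as np

def fukuyama_sugeno(X, V, U, m=2.0):
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # ||x_k - v_i||^2
    v_bar = V.mean(axis=0)                                 # mean of the cluster centroids
    s2 = ((V - v_bar) ** 2).sum(-1)                        # ||v_i - v_bar||^2, shape (c,)
    W = U ** m
    return (W * d2).sum() - (W * s2[None, :]).sum()        # J_m - K_m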
2.2.3 Kwon Index
Kwon extends Xie and Beni's validity index to eliminate its tendency to monotonically decrease as the number of clusters approaches the number of data points. To achieve this, a penalty term is added to the numerator of the original index. The resulting index
was defined as
$$V_{K} = \frac{\sum_{j=1}^{n}\sum_{i=1}^{c} u_{ij}^{m}\,\|x_j - v_i\|^2 + \frac{1}{c}\sum_{i=1}^{c}\|v_i - \bar{v}\|^2}{\min_{i,k \le c,\ i \ne k}\|v_i - v_k\|^2} \qquad (3)$$
An optimal number of clusters $c$ is generally found by solving $\min_{2 \le c \le n-1} V_{K}(c)$.
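A sketch of Eq. (3) under the same assumptions as the previous snippets; the penalty term uses the mean of the cluster centroids as $\bar{v}$.

import numpy as np

def kwon(X, V, U, m=2.0):
    c = V.shape[0]
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)    # ||x_j - v_i||^2
    penalty = ((V - V.mean(axis=0)) ** 2).sum() / c        # (1/c) * sum_i ||v_i - v_bar||^2
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)   # squared centroid distances
    np.fill_diagonal(sep, np.inf)                          # exclude i == k
    return ((U ** m * d2).sum() + penalty) / sep.min()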
2.2.4 PBM Index
Pakhira and Bandyopadhyay (Pakhira et al., 2004)
proposed the PBM index, which was developed for
both hard and fuzzy clustering. The hard clustering
version of the PBM index is defined by
$$V_{PBM} = \left(\frac{1}{c} \cdot \frac{E_1}{E_c} \cdot D_c\right)^{2} \qquad (4)$$
where
$$E_c = \sum_{k=1}^{c} E_k \qquad (5)$$
and
$$E_k = \sum_{j=1}^{n} \|x_j - v_k\| \qquad (6)$$
with $v_k$ being the centroid of the data set and
$$D_c = \max_{i,j \le c} \|v_i - v_j\| \qquad (7)$$
An optimal number of clusters $c$ is generally found by solving $\max_{2 < c < n-1} V_{PBM}(c)$.
2.2.5 Compose within and between Scattering
The CWB index proposed by Rezaee et al. (Rezaee et al., 1998) focuses on both the density of clusters and their separation. Although meant to evaluate fuzzy clustering results, it can be used to evaluate hard clustering by generating a partition matrix u in which memberships have values of 1 or 0 (is a member or is not a member).
Given a fuzzy $c$-partition of the data set $X = \{x_1, x_2, \dots, x_n \mid x_i \in \mathbb{R}^p\}$ with $c$ cluster centers $v_i$, the variance of the pattern set $X$ is called $\sigma(X) \in \mathbb{R}^p$, with the value of the $p$th dimension defined as
$$\sigma_x^{p} = \frac{1}{n}\sum_{k=1}^{n}\left(x_k^{p} - \bar{x}^{p}\right)^2 \qquad (8)$$
where $\bar{x}^{p}$ is the $p$th element of the mean $\bar{X} = \sum_{k=1}^{n} x_k / n$.
The fuzzy variation of cluster $i$ is called $\sigma(v_i) \in \mathbb{R}^p$, with the $p$th value defined as
$$\sigma_{v_i}^{p} = \frac{1}{n}\sum_{k=1}^{n} u_{ik}\left(x_k^{p} - v_i^{p}\right)^2 \qquad (9)$$
The average scattering for $c$ clusters is defined as
$$Scat(c) = \frac{1}{n}\sum_{i=1}^{c}\frac{\|\sigma(v_i)\|}{\|\sigma(X)\|} \qquad (10)$$
where $\|x\| = (x^{T} \cdot x)^{1/2}$.
A dissimilarity function $Dis(c)$ is defined as
$$Dis(c) = \frac{D_{max}}{D_{min}}\sum_{k=1}^{c}\left(\sum_{z=1}^{c}\|v_k - v_z\|\right)^{-1} \qquad (11)$$
where $D_{max} = \max_{i,j \in \{2,3,\dots,c\}}\{\|v_i - v_j\|\}$ is the maximum dissimilarity between the cluster prototypes. $D_{min}$ has the same definition as $D_{max}$, but for the minimum dissimilarity between the cluster prototypes.
The compose within and between scattering index is then defined by combining the last two equations:
$$V_{CWB} = \alpha\,Scat(c) + Dis(c) \qquad (12)$$
where $\alpha$ is a weighting factor. An optimal number of clusters $c$ is generally found by solving $\min_{2 < c < n-1} V_{CWB}(c)$.
2.2.6 Silhouettes Index
Rousseeuw introduced the concept of silhouettes
(Rousseeuw, 1987) which represent how well data lie
within their clusters. The silhouette value of a datum
is defined by
$$S(i) = \begin{cases} 1 - a(i)/b(i), & a(i) < b(i) \\ 0, & a(i) = b(i) \\ b(i)/a(i) - 1, & a(i) > b(i) \end{cases} \qquad (13)$$
which can also be written as
$$S(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (14)$$
AHierarchicalClusteringBasedHeuristicforAutomaticClustering
203
where $a(i)$ is the average dissimilarity between a point $i$ and all other points in its cluster, and $b(i)$ is the average dissimilarity between point $i$ and all points of the nearest cluster to which point $i$ is not assigned.
The silhouette index for a given cluster is the aver-
age silhouette for all points within that cluster and the
silhouette index of a clustering is the average of all
silhouettes in the data set:
$$V_{S} = \sum_{i=1}^{n} S(i)/n. \qquad (15)$$
An optimal number of clusters $c$ is generally found by solving $\max_{2 \le c \le n-1} V_{S}(c)$.
3 PROPOSED METHOD
The proposed method, Heuristic Divisive Analysis
(HDA), consists of two phases: splitting and merg-
ing. The first phase splits the data set into a number of clusters, often producing more clusters than optimal. The second phase merges (or links) clusters, leading to a more optimal clustering. The reason for this two-step approach is to address one of the larger drawbacks of hard clustering: poor performance when dealing with data which is not linearly separable. The two phases use different approaches to computing the dissimilarity between clusters, which allows for the creation of non-elliptical clusters which may be nested or interlocked.
3.1 Splitting
The splitting algorithm is a divisive hierarchical
method based on the DIANA clustering algorithm
(Kaufman and Rousseeuw, 1990). However, the pro-
posed method employs a heuristic function to inter-
rupt the hierarchical division of the data set once an
“adequate” clustering for this step has been reached.
3.1.1 DIANA
DIANA (DIvisive ANAlysis) is a divisive hierar-
chical clustering algorithm based on the idea of
MacNaughton-Smith et al. (MacNaughton-Smith,
1964). Given a data set $X = \{x_1, x_2, \dots, x_n\}$ consisting of
n records and beginning with all points being in one
cluster, the algorithm will alternate between separat-
ing the cluster in two and selecting the next cluster to
split until every point has become its own cluster. To
split a cluster in two, the algorithm must first find the
point with the greatest average dissimilarity to the rest
of the records. The average dissimilarity of a record $x_i$ with regards to $X$ is defined as
$$D_i = \frac{1}{n-1}\sum_{j=1,\ j \ne i}^{n} D(x_i, x_j) \qquad (16)$$
where $D(x,y)$ is a dissimilarity metric (in this case, we use Euclidean distance). Given $D_{max} = \max_{0 \le i \le n-1} D_i$, $x_{max}$ is the point with the greatest average dissimilarity, which is then split from the cluster. We then have two clusters: $C_1 = \{x_{max}\}$ and $C_2 = X \setminus C_1$. Next, the algorithm checks every point in $C_2$ to determine whether or not it should be moved to $C_1$. To accomplish this, the algorithm must compute the dissimilarity between $x$ and $C_1$ as well as the dissimilarity between $x$ and $C_2 \setminus \{x\}$. The dissimilarity between $x$ and $C_1$ is defined as
$$D_{C_1}(x) = \frac{1}{|C_1|}\sum_{y \in C_1} D(x,y), \quad x \in C_2 \qquad (17)$$
where $|C_1|$ denotes the number of records in $C_1$. The dissimilarity between $x$ and $C_2 \setminus \{x\}$ is defined as
$$D_{C_2}(x) = \frac{1}{|C_2| - 1}\sum_{y \in C_2,\ y \ne x} D(x,y), \quad x \in C_2 \qquad (18)$$
If $D_{C_1}(x) < D_{C_2}(x)$, then $x$ is moved from $C_2$ to $C_1$. This process is repeated until there are no more records in $C_2$ which should be moved to $C_1$.
To select the next cluster to separate, the algorithm will choose the cluster with the greatest diameter. The diameter of a cluster is defined as
$$Diam(C) = \max_{x,y \in C} D(x,y) \qquad (19)$$
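The splitting step described above can be sketched as follows, assuming a precomputed pairwise dissimilarity matrix D (e.g. Euclidean distances) and an index array identifying the cluster to split; function and variable names are illustrative, not the authors' implementation. The overall splitting phase repeatedly applies this step to the cluster with the greatest diameter, as in DIANA.

import numpy as np

def diana_split(D, members):
    members = np.asarray(members)
    sub = D[np.ix_(members, members)]                 # dissimilarities within the cluster
    m = len(members)
    avg_diss = sub.sum(axis=1) / (m - 1)              # Eq. (16): average dissimilarity
    c1 = [int(np.argmax(avg_diss))]                   # splinter group C1 seeded with x_max
    c2 = [i for i in range(m) if i != c1[0]]
    moved = True
    while moved and len(c2) > 1:
        moved = False
        for i in list(c2):
            rest = [j for j in c2 if j != i]
            if not rest:
                break
            d_c1 = sub[i, c1].mean()                  # Eq. (17): dissimilarity to C1
            d_c2 = sub[i, rest].mean()                # Eq. (18): dissimilarity to C2 \ {x}
            if d_c1 < d_c2:                           # closer to the splinter group: move it
                c2.remove(i)
                c1.append(i)
                moved = True
    return members[c1], members[c2]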
3.1.2 Heuristic Stopping Function
The first phase in our method consists of running
the DIANA algorithm with a heuristic function in or-
der to stop it once an “adequate” clustering has been
reached. This function consists of first calculating
the average intra-cluster dissimilarity (again, we use
Euclidean distance) of each cluster, defined as
$$AvgIntraClusterDistance(C) = \frac{\sum_{x \in C} D(x, \bar{x})}{|C|} \qquad (20)$$
where $\bar{x}$ denotes the mean of all points in cluster $C$.
The heuristic index for this clustering is the average
of all the average intra-cluster dissimilarities. If the
heuristic index for this clustering is lower than that
of the previous clustering, the current clustering is
considered the most optimal to date. Otherwise, we
have reached our “adequate” clustering at the previ-
ous step, but we will continue running the DIANA
algorithm for a set number of iterations as a preventa-
tive measure against falling into a local optimum. We
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
204
chose this rather simple heuristic instead of one of the
many known validity indices because it allowed us to
decrease the complexity (as it uses values which our
implementation had already calculated) and still pro-
duced good results.
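A sketch of the heuristic index of Eq. (20) and the stopping test, assuming each cluster is given as an array of its points; the patience counter stands in for the fixed number of extra iterations mentioned above and its value is an illustrative assumption.

import numpy as np

def heuristic_index(clusters):
    # Average over clusters of the mean distance from each point to its centroid (Eq. (20)).
    per_cluster = [np.linalg.norm(C - C.mean(axis=0), axis=1).mean() for C in clusters]
    return float(np.mean(per_cluster))

def should_stop(history, patience=3):
    # Stop once the index has not improved on its best value for `patience` further splits.
    best = min(history)
    return len(history) - 1 - history.index(best) >= patience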
3.2 Merging
The splitting phase’s result can be non-optimal. This
is especially likely when data sets contain clusters
which are not linearly separable or have irregular
shapes. In these cases, the “adequate” clustering will
usually contain instances where what should be one
single cluster is divided into many. These many clus-
ters will be very close to each other in relation to the
other clusters and it is the goal of this merging phase
to collect them into optimal clusters.
For each pair of clusters, we calculate the average
nearest neighbor dissimilarity, defined as
$$AvgNearestNeighbor(C) = \frac{\sum_{x \in C}\ \min_{y \in C,\ y \ne x} D(x,y)}{|C|} \qquad (21)$$
for both clusters and keep the greater of the two values as our merging dissimilarity threshold $M_T$. We then
. We then
go through each pair of objects with one object from
each cluster and if we find a pair where the dissimi-
larity between the two objects is less than the merg-
ing dissimilarity threshold (multiplied by a constant),
then the two clusters are merged. We express the test
for merging as
$$CanMerge(C_1, C_2) = \begin{cases} \text{true}, & \exists\, x \in C_1,\ y \in C_2 \mid D(x,y) < M_T \cdot K \\ \text{false}, & \text{otherwise} \end{cases} \qquad (22)$$
where $K$ is a merging constant.
Once all merges are completed, we are left with
the final clustering. The value of the merging con-
stant can be adjusted depending on the data set and
we have found experimentally that a value of 2 gener-
ally produces good results.
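A sketch of the merging test, Eqs. (21)-(22), over a precomputed dissimilarity matrix D with clusters given as index arrays; K = 2 reflects the default reported above, and the names are illustrative.

import numpy as np

def avg_nearest_neighbour(C, D):
    sub = D[np.ix_(C, C)].astype(float)           # within-cluster dissimilarities
    np.fill_diagonal(sub, np.inf)                 # exclude self-distances
    return sub.min(axis=1).mean()                 # Eq. (21)

def can_merge(C1, C2, D, K=2.0):
    m_t = max(avg_nearest_neighbour(C1, D),       # merging threshold M_T
              avg_nearest_neighbour(C2, D))
    return bool((D[np.ix_(C1, C2)] < m_t * K).any())   # Eq. (22)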
We have also tested an alternative merging method based on the Y-means approach to merging. Because the Y-means algorithm uses dissimilarities between cluster centroids, merging clusters would relocate the centroids in a way that is detrimental to our method. To avoid this drawback, we link clusters by assigning them labels instead of merging them immediately; once all pairs have been considered, we merge all linked clusters. We express the test for linking as
$$CanLink(C_1, C_2) = \begin{cases} \text{true}, & D(C_1, C_2) \le (\sigma_{C_1} \cdot \sigma_{C_2}) \cdot L \\ \text{false}, & \text{otherwise} \end{cases} \qquad (23)$$
where $\sigma_{C_i}$ is the standard deviation of the dissimilarities between the objects in a cluster $C_i$ and the centroid of that cluster, and $L$ is a linking constant. The value
that cluster and L is a linking constant. The value
of the linking constant can be adjusted depending on
the data set and we have found that a value of 0.5
generally produces good results with our method.
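A sketch of the linking test, Eq. (23), assuming the inter-cluster dissimilarity D(C1, C2) is taken between cluster centroids (consistent with the Y-means discussion above) and that clusters are given as index arrays into the data X; L = 0.5 reflects the default reported above, and names are illustrative.

import numpy as np

def can_link(C1, C2, X, L=0.5):
    v1, v2 = X[C1].mean(axis=0), X[C2].mean(axis=0)        # cluster centroids
    s1 = np.linalg.norm(X[C1] - v1, axis=1).std()          # sigma_{C1}: std of point-to-centroid distances
    s2 = np.linalg.norm(X[C2] - v2, axis=1).std()          # sigma_{C2}
    return bool(np.linalg.norm(v1 - v2) <= (s1 * s2) * L)  # Eq. (23)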
4 RESULTS
The proposed method was tested with five data sets.
The results were compared to the Y-means, fuzzy c-
means (Bezdek et al., 1984) and RAC algorithms us-
ing the Xie & Beni, Fukuyama & Sugeno, Kwon,
CWB, PBM and Silhouette validation indices.
4.1 Data Sets
The first data set was the Iris data set (Fisher, 1936),
composed of 150 elements in four dimensions belong-
ing to three categories of 50 elements each; however,
two of the three categories of the data set are so close
as to generally be clustered together.
The second data set, or “nested circles” data set, is
composed of 600 elements in two dimensions belong-
ing to two groups. The first group, of 100 elements,
is a full circular shape in the center of the plane. The
second group, of 500 elements, is a circular shell sur-
rounding the first group. As the centroids of both
clusters are approximately identical, it is difficult for
clustering methods which use cluster centroids (such
as Y-means and fuzzy c-means) to produce an appro-
priate clustering.
The third data set, or “nested crescents” data set,
is composed of 500 elements in two dimensions be-
longing to two groups of 250 elements each. The two
groups form opposing semi-circles which are offset
and inset in such a way that one tip of each semi-circle
is nested within the other semi-circle.
The fourth data set, or “five groups” data set, is
composed of 1500 elements in two dimensions be-
longing to five groups of 300 elements each. Each group is roughly circular with an approximately Gaussian distribution. The groups are spread in such a way as to have two pairs of tightly adjacent clusters.
The fifth data set, or “Aggregation” data set, is a testing data set proposed by Gionis et al. (Gionis et al., 2007). This data set presents seven roughly elliptical groupings, one of which has a concave indentation. Two pairs of these groups are linked by narrow lines of data points.
AHierarchicalClusteringBasedHeuristicforAutomaticClustering
205
4.2 Clustering Results
We have compared our method to the Y-means al-
gorithm, another hard automatic clustering method
based on the well-known k-means algorithm. Since Y-means requires an initial number of clusters, we provided it with the known optimal number of clusters or the best approximation thereof.
We have also compared our method to the fuzzy
c-means algorithm. Although this method belongs
to the category of fuzzy clustering, we compared our
method to it as our method should be able to correctly
treat non-linearly separable data and comparison with
a fuzzy method could prove interesting.
As well as the previous two methods, we have
compared our method to the RAC method. This
method makes use of the fuzzy c-means algorithm as
well as graph partitioning concepts to arrive at a hard
partition. This automatic clustering method should
also be able to correctly treat non-linearly separable
data but has a greater time complexity.
Of the validity indices used, Xie & Beni, Fukuyama & Sugeno, Kwon, and CWB should be minimized (a lower value indicates a better clustering) while PBM and Silhouette should be maximized (a higher value indicates a better clustering).
4.2.1 Iris Data Set
Fig. 1 shows the result of clustering the Iris data
set with our method. We can discern four clusters,
two of which contain one and two members respec-
tively. These two clusters are considered as outliers
and the remaining two clusters then approximately
correspond to the expected results.
Table 1 shows the validation results of our method
and the compared methods for the Iris data set. The
results for the proposed method (HDA) were calcu-
lated after removing all outliers. We notice that for
the XB, Kwon, CWB, and PBM indices, although our
method does not produce the best validation result, its
results are very near the best. For the XB and Kwon
indices, our method outperformed the other hard clus-
tering methods. The small variations in results be-
tween our method and the others are partly due to the
data points eliminated when removing outliers.
4.2.2 Nested Circles Data Set
Fig. 2 shows the result of clustering the nested circles
data set with our method and Table 2 shows the vali-
dation results of our method and the compared meth-
ods for the nested circles data set.
We can observe that the two clusters are correctly
identified. However, the validation indices for our
Figure 1: Clustering result on Iris data set.
Figure 2: Clustering result on nested circles data set.
method and RAC (which produced the same cluster-
ing) are all much worse than those of Y-means and
fuzzy c-means which did not correctly identify the
clusters (see Fig. 3). This is in part due to the fact
that most of these indices use the centroids of clus-
ters to compute dissimilarities, which is also at least
in part the reason why Y-means and fuzzy c-means
did not produce good results.
4.2.3 Nested Crescents Data Set
Fig. 4 shows the result of clustering the nested cres-
cents data set with our method. We can see that the
two clusters are correctly identified.
Table 3 shows the validation results of our method
and the compared methods for the nested crescents
data set. The RAC method has no values for this data
set as it clustered the entire data set into a single clus-
ter.
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
206
Table 1: Iris data set validation results.
XB FS Kwon CWB PBM Silhouette
KMP (c=2) 0.0654087 592.227 10.0613 0.503325 23.8917 0.690417
KMP (c=3) 1.55946 789.946 239.56 2.7801 14.4217 0.561084
FCM (c=2) 0.0544162 530.501 8.41243 0.503334 17.1528 0.685031
FCM (c=3) 0.137036 509.939 21.966 1.34779 14.8009 0.558518
RAC 0.0654087 592.227 10.0613 0.503325 23.8917 0.690417
HDA 0.061941 568.303 9.35532 0.508643 24.2696 0.697063
Table 2: Nested circles data set validation results.
XB FS Kwon CWB PBM Silhouette
KMP (c=2) 0.45743 3367.37 274.708 0.422658 8.53994 0.340658
FCM (c=2) 0.318881 2173.72 191.578 0.427173 6.34011 0.338348
RAC 2600.64 0.908002 1.56039e6 25.7067 0.00151379 0.0477678
HDA 2600.64 0.908002 1.56039e6 25.7067 0.00151379 0.0477678
Table 3: Nested crescents data set validation results.
XB FS Kwon CWB PBM Silhouette
KMP (c=2) 0.318794 5495.04 159.647 0.302265 19.4784 0.342838
FCM (c=2) 0.142008 5212.3 71.2541 0.269979 22.4437 0.472577
RAC
HDA 0.304331 5071.23 152.415 0.312159 18.2753 0.377258
Figure 3: Y-means result on nested circles data set.
Again, Y-means and fuzzy c-means obtain better
values with validity indices while producing inferior
results (see Fig. 5).
4.2.4 Five Groups Data Set
Fig. 6 shows the result of clustering the five groups
data set with our method. We can observe seven clus-
ters, two of which contain one and two member points
respectively. These two clusters are treated as outliers
Figure 4: Clustering result on nested crescents data set.
Figure 5: Y-means result on nested crescents data set.
AHierarchicalClusteringBasedHeuristicforAutomaticClustering
207
Table 4: Five groups data set validation results.
XB FS Kwon CWB PBM Silhouette
KMP (c=5) 7.64716 540692 11494.1 0.608328 427.786 0.532795
FCM (c=5) 0.0506787 523890 78.7023 0.155432 2101.97 0.730427
RAC 0.05887 581968 91.0297 0.15704 3985.93 0.730427
HDA 0.0583695 581594 90.1025 0.156968 4012.27 0.73118
Table 5: Aggregation data set validation results.
XB FS Kwon CWB PBM Silhouette
KMP (c=5&7) 0.320621 86011.4 253.039 0.100997 98.4674 0.272249
FCM (c=5) 0.185912 78942.4 148.601 0.215287 117.716 0.500565
FCM (c=7) 0.26758 73294.5 214.788 0.34196 66.6775 0.467089
RAC
HDA (K=2.0) 1.98674 128690 1567.92 0.329051 142.117 0.241994
HDA (K=0.7) 0.489475 121270 387.897 0.295478 302.46 0.468173
HDA (K=0.8) 0.723013 138778 571.038 0.310896 284.03 0.455008
Figure 6: Clustering result on five groups data set.
and the remaining five clusters then correspond to the
expected result. Table 4 shows the validation results
of our method and the compared methods for the five
groups data set. The results for the proposed method
were calculated after removing all outliers. Similarly
to the Iris data set, our method outperforms the other
hard clustering methods in the XB and Kwon indices
as well as in the CWB index for this data set. Our
method also produced the best values for the PBM
and Silhouette indices.
4.2.5 Aggregation Data Set
Fig. 7 shows the result of clustering the Aggregation
data set with our method. We can observe that the
three clusters produced are not ideal. The top 3 clus-
ters have been grouped together yet should be sep-
arate. After adjusting the merging constant K from
its default value of 2.0 to 0.8, we obtain the clustering
seen in Fig. 8. This new clustering is better but still not
perfect as the upper-left and upper-center clusters are
still grouped together and some outliers are produced.
Figure 7: Clustering result on Aggregation data set.
Figure 8: Clustering result on Aggregation data set.
Reducing K to 0.7 produced the clustering seen in Fig. 9. Reducing K further produced no improvement as
the clusterings produced were under-merged and rep-
resented the data even more poorly.
Table 5 shows the validation results of our method
and the compared methods for the Aggregation data
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
208
Figure 9: Clustering result on Aggregation data set.
set. The results for the proposed method were cal-
culated after removing all outliers. With the excep-
tion of the FS index, our method performed best with
a merging constant of 0.7. With these values, our
method outperformed the Y-means method in all but
the CWB index. For the other indices, our method
performed similarly but slightly worse than fuzzy c-
means with 5 clusters with the exception of the PBM
index where our method performed significantly bet-
ter. The RAC method has no values for this data set
as it clustered the entire data set into a single cluster.
5 CONCLUSIONS
In this paper, an automatic clustering method based
on a heuristic divisive approach has been proposed
and implemented. The method is based on the
DIANA algorithm interrupted by a heuristic stopping
function. As this process alone generally produces
too many clusters, its result is then passed on to a
merging method. The advantage of this two-phase approach is that, because the splitting and merging phases use different criteria for determining whether data belong to the same cluster, the merged clusters can take non-elliptical shapes. This sets our method apart from the majority of hard clustering methods in that it can handle data which is not linearly separable fairly well.
Five data sets have been used to evaluate the
proposed clustering method. The proposed method
was also compared against an automatic hard clus-
tering method, a fuzzy clustering method (for which
a known number of clusters was provided), and an
automatic clustering method based on fuzzy c-means
using multiple cluster validity indices. The proposed
method was shown to be roughly equivalent in effectiveness to the compared methods when clustering linearly separable data sets, and equivalent or better when clustering non-linearly separable data sets, without ever needing to be provided a number of clusters.
There remains work to be done in finding more ap-
propriate validation methods to evaluate the proposed
method as the validity indices used fall victim to the
same pitfalls as most hard clustering methods when
the data set is not linearly separable. It also remains to further optimize the proposed method and to adapt it to specific applications.
In conclusion, the proposed clustering method not only determines an appropriate number of clusters, but also produces valid clustering results.
ACKNOWLEDGEMENTS
We gratefully acknowledge the support from NBIF’s
(RAI 2012-047) New Brunswick Innovation Funding
granted to Dr. Nabil Belacel.
REFERENCES
Bezdek, J. C., Ehrlich, R., and Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2–3):191–203.
Fisher, R. A. (1936). The use of multiple measurements in
taxonomic problems. Annals of Eugenics, 7(2):179–
188.
Fukuyama, Y. and Sugeno, M. (1989). A new method
of choosing the number of clusters for the fuzzy c-
means method. In Proceedings of Fifth Fuzzy Systems
Symposium, pages 247–250.
Gan, G. (2011). Data Clustering in C++: An Object-
Oriented Approach. Chapman and Hall/CRC.
Gionis, A., Mannila, H., and Tsaparas, P. (2007). Clustering
aggregation. ACM Trans. Knowl. Discov. Data, 1(1).
Guan, Y., Ghorbani, A., and Belacel, N. (2003). Y-
means: a clustering method for intrusion detection.
In Electrical and Computer Engineering, 2003. IEEE
CCECE 2003. Canadian Conference on, volume 2,
pages 1083–1086. IEEE.
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999). Data
clustering: a review. ACM Comput. Surv., 31(3):264–
323.
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
MacNaughton-Smith, P. (1964). Dissimilarity Analysis: a
new Technique of Hierarchical Sub-division. Nature,
202:1034–1035.
Mok, P., Huang, H., Kwok, Y., and Au, J. (2012). A
robust adaptive clustering analysis method for auto-
matic identification of clusters. Pattern Recognition,
45(8):3017–3033.
AHierarchicalClusteringBasedHeuristicforAutomaticClustering
209
Pakhira, M. K., Bandyopadhyay, S., and Maulik, U. (2004).
Validity index for crisp and fuzzy clusters. Pattern
Recognition, 37(3):487–501.
Pal, N. and Bezdek, J. (1995). On cluster validity for
the fuzzy c-means model. Fuzzy Systems, IEEE
Transactions on, 3(3):370–379.
Rezaee, M. R., Lelieveldt, B., and Reiber, J. (1998). A new cluster validity index for the fuzzy c-mean. Pattern Recognition Letters, 19(3–4):237–246.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65.
Wu, K.-L. and Yang, M.-S. (2005). A cluster validity in-
dex for fuzzy clustering. Pattern Recognition Letters,
26(9):1275–1291.
Xie, X. and Beni, G. (1991). A validity measure for fuzzy
clustering. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 13(8):841–847.
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
210