Multilinear Objective Function-based Clustering
Giovanni Rossi
Departments of Computer Science and Engineering - DISI, and Mathematics
University of Bologna, Mura Anteo Zamboni 7, 40126, Bologna, Italy
Keywords:
Fuzzy Clustering, Similarity Matrix, Pseudo-Boolean Function, Multilinear Extension, Local Search.
Abstract:
The input of most clustering algorithms is a symmetric matrix quantifying similarity within data pairs. Such
a matrix is here turned into a quadratic set function measuring cluster score, or similarity within data subsets
larger than pairs. In general, any set function reasonably assigning a cluster score to data subsets gives rise to an
objective function-based clustering problem. When considered in pseudo-Boolean form, cluster score enables
the evaluation of fuzzy clusters through its multilinear extension (MLE), while the global score of a fuzzy clustering
is simply the sum over constituent fuzzy clusters of their MLE scores. This is shown to be no greater than the
global score of hard clusterings or partitions of the data set, thereby expanding a known result on extremizers of
pseudo-Boolean functions. Yet, a multilinear objective function allows the search for optimality to take place in the interior
of the hypercube. The proposed method only requires a fuzzy clustering as initial candidate solution, for the
appropriate number of clusters is implicitly extracted from the given data set.
1 INTRODUCTION
Clustering means identifying groups within data, with
the intent to have similarity between elements in a
same group or cluster, and dissimilarity between el-
ements in different groups. It is important in a variety
of fields at the interface of computer science, artificial
intelligence and engineering, including pattern recog-
nition and learning, web mining and bioinformatics.
In hard clustering the sought group structure is a par-
tition of the data set, and clusters are blocks (Aigner,
1997); each data point is in precisely one cluster,
with full or unit membership. In fuzzy clustering data
points distribute [0,1]-ranged memberships over clus-
ters. This yields more flexibility, which is useful in
many applications (Kashef and Kamel, 2010; Valente
de Oliveira and Pedrycz, 2007).
In objective function-based clustering, the prevail-
ing clusters obtain by maximizing or minimizing an
objective function: any solution is mapped into a real
quantity, i.e. its efficiency or cost, obtained as the
sum over clusters of their own quality. Thus in hard
clustering this sum is over blocks, with a cluster score
set function taking real values over the $2^n - 1$ non-empty data subsets (Roubens, 1982), $n$ being the number of data points. How to specify cluster score is the
first issue addressed below.
When dealt with in pseudo-Boolean form, cluster score admits a unique polynomial multilinear extension (MLE) over the $n$-dimensional unit hypercube $[0,1]^n$. Such an MLE is a novel and seemingly appropriate measure of the score of fuzzy clusters, and it is the starting point for the clustering method proposed here. As
fuzzy clusterings are collections of fuzzy clusters,
attention is placed on those collections where the data distribute memberships adding up to 1, with optimality found where the sum over fuzzy clusters of their MLE scores is maximal. If global cluster score is evaluated via the MLE of cluster score, then its bounds are attained on hard clusterings. This expands a
known result in pseudo-Boolean optimization (Boros
and Hammer, 2002). Clustering is thus approached in
terms of set partitioning, with this latter combinatorial
optimization problem extended from a discrete to a
continuous domain (Pardalos et al., 2006) and solved
through a novel local search heuristic.
1.1 Related Work and Approach
Objective function-based fuzzy clustering mainly de-
velops from the fuzzy c-means FcM (Bezdek and Pal,
1992) and the possibilistic c-means PcM (Krishnapu-
ram and Keller, 1996) algorithms. Given n data points
in $\mathbb{R}^m$, with $m$ observed features, both FcM and PcM
iteratively act on c < n cluster centers or prototypes,
aiming to minimize a cost: the sum over clusters of all
distances between cluster elements and their center.
For any c centers as input, at each iteration the mem-
berships of data points to clusters are re-defined so as to minimize the sum of (fuzzy) distances from centers, and next the centers themselves are re-calculated so as to minimize the sum of distances from (fuzzy) members.
In FcM (but not in PcM) membership distributions
over the c clusters add up to 1. The iteration stops
when two consecutive fuzzy clusterings (specifying c
centers and c × n memberships) are sufficiently close
(or coincide). This converges to a local minimum,
and given non-convexity of the objective function, the
choice of suitable initial cluster centers is crucial. Ini-
tial cluster centers have to be chosen arbitrarily, and it is hard to assess whether c is a proper number of clusters for the given data set (Ménard and Eboueya, 2002).
Much effort is thus devoted to finding the optimal
number of clusters. One approach is to validate fuzzy
clusterings obtained at different values of c by means
of an index, and then to select the value of c whose output scored best on the index (Zahid et al.,
2001). This validation may be integrated into the FcM
iterations, yielding a validity-guided method (Bensaid
et al., 1996). Main cluster validity indices are: clas-
sification entropy, partition coefficient, uniform data
functional, compactness and separation criteria. They
may be analyzed comparatively in terms of membership
distributions over clusters (Pal and Bezdek, 1995). A
common idea is that clustering performance is higher
the more distributions are concentrated, as this for-
malizes a non-ambiguous classification (Rezaee et al.,
1998; Xie and Beni, 1991).
Recent clustering methods such as neural gas
(Cottrell et al., 2006), self organizing maps (Wu and
Chow, 2004), vector quantization (Lughofer, 2008;
Du, 2010) and kernel methods (Shawe-Taylor and
Cristianini, 2004) maintain special attention on find-
ing the optimal number of clusters for the given
data. In several classification tasks concerning protein
structures, text documents, surveys or biological sig-
nals, an explicit metric vector space (i.e. $\mathbb{R}^m$ above)
is not available. Then, clustering may rely on spec-
tral methods, where the $\binom{n}{2}$ similarities within data
pairs are the adjacency matrix of a weighted graph,
and the (non-zero) eigenvalues and eigenvectors of
the associated Laplacian are used for partitioning the
data (or vertex) set. Specifically, the sought partition
is to be such that lightest edges have endpoints in dif-
ferent blocks, whereas heaviest edges have both end-
points in a same block. Although spectral clustering
focuses on hard rather than fuzzy models, still it dis-
plays some analogy with the local search method de-
tailed below, as in both cases full membership of data
points in prevailing clusters is decided in a single step.
In fact, spectral methods mostly focus on some first
c < n eigenvalues (in increasing order) (von Luxburg
et al., 2008; Ng et al., 2002), thus constraining the
number of clusters. Here, such a constraint is possible as well, although with a suitable candidate solution the proposed local search autonomously finds an optimal (unrestricted) number of clusters.
Clustering is here approached by firstly quantify-
ing the cluster score of every non-empty data subset,
and secondly in terms of the associated set partition-
ing combinatorial optimization problem (Korte and
Vygen, 2002). Cluster score thus is a set function
or, geometrically, a point in $\mathbb{R}^{2^n - 1}$, and rather than
measuring a cost (or sum of distances) to be mini-
mized (see above), it measures a worth to be maxi-
mized. The idea is to quantify, for every data subset,
both internal similarity and dissimilarity with respect
to its complement. This resembles the “collective
notion of similarity” in information-based clustering
(Slonim et al., 2005). Objective function-based clus-
tering intrinsically relies on the assumption that every
data subset has an associated real-valued worth (or,
alternatively, a cost). A main novelty proposed below
is to deal with both hard and fuzzy clusters at once by
means of the pseudo-Boolean form of set functions.
In order to have the same input as in many clustering
algorithms, the basic cluster score function provided
in the next section obtains from a given similarity ma-
trix, and has polynomial MLE of degree 2 (Boros and
Hammer, 2002, pp. 157, 162). This also keeps the
computational burden at a seemingly reasonable level.
2 CLUSTER SCORE
Given data set $N = \{1, \ldots, n\}$, the input of most clustering algorithms (Xu and Wunsch, 2005) is a symmetric similarity matrix $S \in [0,1]^{n \times n}$, with $S_{ij}$ quantifying similarity within data pairs $\{i,j\} \subseteq N$. If data points belong to a Euclidean space, i.e. $N \subset \mathbb{R}^m$, then similarities $S_{ij} = 1 - d(i,j)$ may obtain through a normalized distance $d : N \times N \to [0,1]$. Not only pairs but also any data subset $A \subseteq N$ may have a measurable internal similarity (and, possibly, dissimilarity with respect to its complement $A^c = N \setminus A$), interpreted as its score $w(A)$ as a cluster. How to specify set function $w_S$ from given matrix $S$ is addressed hereafter.
For $2^N = \{A : A \subseteq N\}$, the collection $\{\zeta(A, \cdot) : A \in 2^N\}$ is a linear basis of the vector space $\mathbb{R}^{2^n}$ of real-valued functions $w$ on $2^N$, where $\zeta : 2^N \times 2^N \to \mathbb{R}$ is the element of the incidence algebra (Rota, 1964; Aigner, 1997) of the Boolean lattice $(2^N, \cap, \cup)$ defined by $\zeta(A,B) = 1$ if $B \supseteq A$ and $\zeta(A,B) = 0$ if $B \not\supseteq A$: the zeta function. Any $w$ corresponds to the linear combination $w(B) = \sum_{A \in 2^N} \mu^w(A)\,\zeta(A,B) = \sum_{A \subseteq B} \mu^w(A)$ for all $B \in 2^N$, with Möbius inversion $\mu^w : 2^N \to \mathbb{R}$ given by (note that $\subset$ denotes strict inclusion)
$$\mu^w(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, w(B) = w(A) - \sum_{B \subset A} \mu^w(B)$$
(a recursion, with $w(\emptyset) = 0$), the coefficients $(-1)^{|A \setminus B|}$ being the values of the Möbius function, i.e. the inverse of $\zeta$.
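As an aside, the recursion is immediate to implement by brute force; the following is a minimal sketch (dictionary-over-frozensets representation, exponential in $n$, hence for small toy instances only; helper names are ours):

```python
from itertools import combinations

def subsets(A):
    """All subsets of the collection A, as frozensets (empty set included)."""
    A = tuple(A)
    return [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]

def mobius_inversion(w, N):
    """mu_w(A) = sum over B subseteq A of (-1)^{|A\\B|} w(B), with w(empty) = 0.

    w maps frozensets to reals; missing entries default to 0."""
    return {A: sum((-1) ** (len(A) - len(B)) * w.get(B, 0.0) for B in subsets(A))
            for A in subsets(N)}
```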
This combinatorial “analog of the fundamental theorem of the calculus” (Rota, 1964) yields the unique MLE $f^w : [0,1]^n \to \mathbb{R}$ of $w$, with values
$$w(B) = f^w(\chi_B) = \sum_{A \in 2^N} \Big(\prod_{i \in A} \chi_B(i)\Big)\, \mu^w(A) = \sum_{A \subseteq B} \mu^w(A)$$
on vertices, and
$$f^w(q) = \sum_{A \in 2^N} \Big(\prod_{i \in A} q_i\Big)\, \mu^w(A) \qquad (1)$$
on any point $q = (q_1, \ldots, q_n) \in [0,1]^n$. Conventionally, $\prod_{i \in \emptyset} q_i := 1$ (Boros and Hammer, 2002, p. 157).
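The MLE (1) then evaluates directly from the Möbius coefficients; a minimal sketch under the same representation (math.prod over an empty collection returns 1, matching the stated convention):

```python
import math

def mle(mu, q):
    """f_w(q) = sum over A of (prod_{i in A} q_i) * mu_w(A), as in (1).

    mu maps frozensets of data points to Moebius coefficients;
    q maps each data point to a membership in [0, 1]."""
    return sum(coeff * math.prod(q[i] for i in A) for A, coeff in mu.items())
```

On a hypercube vertex $\chi_B$ this returns exactly $w(B)$, as per the display above.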
A quadratic MLE is a polynomial of degree 2, i.e. $\mu^w(A) = 0$ if $|A| > 2$. Geometrically, this means that $w$ is a point in a $\binom{n+1}{2}$-dimensional vector (sub)space, i.e. $w \in \mathbb{R}^{\binom{n+1}{2}}$, as all its $2^n - 1$ values are determined by the values taken by $\mu^w$ on the $n$ singletons and on the $\binom{n}{2}$ pairs (with $n + \binom{n}{2} = \binom{n+1}{2}$).
Similarity matrix $S$ factually has only $\binom{n}{2}$ valid entries (see below), and trying to exploit them beyond a quadratic form for cluster score $w$ seems clumsy. Also, $S$ is intended precisely to measure such a score $S_{ij}$ for all $\binom{n}{2}$ data pairs. The sought quadratic cluster score function $w$ is thus already defined on such pairs, i.e. $w(\{i,j\}) = S_{ij}$ for all $\{i,j\} \in 2^N$. How to assign scores $w(\{i\})$ to singletons $\{i\}, i \in N$ seems a more delicate matter. If such scores are set equal to the $n$ entries $S_{ii} = 1$ along the main diagonal, then Möbius inversion $\mu^w$ takes values $\mu^w(\{i\}) = 1$ on singletons and $\mu^w(\{i,j\}) = S_{ij} - S_{ii} - S_{jj} < 0$ on pairs (while $\mu^w(A) = 0$ for $1 \neq |A| \neq 2$). This is a sufficient (but not necessary) condition for sub-additivity, i.e. $w(A \cup B) - w(A) - w(B) \leq 0$ for all $A, B \in 2^N$ such that $A \cap B = \emptyset$. Then, the trivial partition where each data point is a singleton block is easily checked to be optimal. On the other hand, setting $w(\{i\}) = 0$ for all $i \in N$ yields a Möbius inversion with values $\mu^w(A) \geq 0$ for all $A \in 2^N$, and this is sufficient (but not necessary) for super-additivity, i.e. $w(A \cup B) - w(A) - w(B) \geq 0$ for all $A, B \in 2^N$ with $A \cap B = \emptyset$. Then, it becomes optimal (and again trivial) to place all $n$ data points into a unique grand cluster. An interesting alternative to these two unreasonable situations seems to be
$$w(\{i\}) = \frac{1}{2} - \frac{1}{2(n-1)} \sum_{l \in N \setminus i} S_{il} = \sum_{l \in N \setminus i} \frac{1 - S_{il}}{2(n-1)}.$$
In this way, $1 - S_{il}$ quantifies the diversity between $i \in N$ and $l \in N \setminus i$, which must be equally shared among them. The cluster score of singleton $\{i\}$ then is the average of the $n-1$ diversities $\frac{1 - S_{il}}{2}, l \in N \setminus i$, collected by $i$. Möbius inversion takes values $\mu^w(\{i\}) = w(\{i\})$ on singletons and, recursively,
$$\mu^w(\{i,j\}) = \frac{nS_{ij} - 1}{n - 1} - \sum_{l \in N \setminus \{i,j\}} \frac{2 - (S_{il} + S_{jl})}{2(n - 1)}$$
on pairs.
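For concreteness, here is a sketch assembling these singleton and pair coefficients from a given similarity matrix (numpy indexing assumed, data points $0, \ldots, n-1$; function names are ours):

```python
import numpy as np

def quadratic_mobius(S):
    """Moebius coefficients of the quadratic cluster score built from a
    symmetric similarity matrix S in [0,1]^{n x n}:
      mu({i})   = w({i}) = sum_{l != i} (1 - S[i,l]) / (2(n-1)),
      mu({i,j}) = S[i,j] - w({i}) - w({j}),  so that w({i,j}) = S[i,j]."""
    n = S.shape[0]
    mu = {frozenset({i}): sum(1.0 - S[i, l] for l in range(n) if l != i)
                          / (2 * (n - 1)) for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            mu[frozenset({i, j})] = (S[i, j]
                                     - mu[frozenset({i})] - mu[frozenset({j})])
    return mu

def score(A, mu):
    """Cluster score w(A): sum of mu over singletons and pairs inside A."""
    return sum(c for B, c in mu.items() if B <= frozenset(A))
```

By construction, score(frozenset({i, j}), mu) recovers S[i, j].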
Note that the cluster score $w(A)$ of any $A \in 2^N$ does not depend on those $\binom{n-a}{2}$ entries $S_{ll'}$ of matrix $S$ such that $l, l' \in A^c$, where $a = |A|$. In particular, if $S_{ij} = 1$ for $i \in A, j \in A \setminus i$ and $S_{il} = 0$ for $i \in A, l \in A^c$, then
$$w(A) = \frac{a(a^2 - 3a + n + 1)}{2(n - 1)} = a \cdot \frac{n - a}{2(n - 1)} + \binom{a}{2} \cdot \Big(1 - \frac{n - a}{n - 1}\Big),$$
with $w(A) = \frac{1}{2}, 1, \ldots, \frac{n^2 - 4n + 5}{2}, \binom{n}{2}$ for $a = 1, 2, \ldots, n - 1, n$. In terms of spectral clustering (see above), the corresponding adjacency matrix identifies a subgraph spanned by vertex subset $A$ which is a clique, and where all edges have maximum weight, i.e. 1, with remarkable implications for the eigenvalues of the normalized Laplacian; see (Schaeffer, 2007, p. 42).
This quadratic $w$, obtained from given similarity matrix $S$, is conceived as the main example of a cluster score function. Multilinear objective function-based clustering deals with generic set functions, possibly non-quadratic, but reasonably measuring a cluster score of data subsets. In fact, the MLE (1) of set functions $w$ is the starting point for the investigation proposed in the remainder of this work. In view of the rich literature on pseudo-Boolean methods (Crama and Hammer, 2011), the MLE of cluster score appears very suitable for evaluating fuzzy clusters, especially in that established definitions of neighborhood and derivative may be expanded to fit the broader setting formalized in the sequel. Specifically, while the $n$ variables of traditional pseudo-Boolean functions each range in $[0,1]$, the $n$ variables of the novel near-Boolean form (Rossi, 2015) considered here each range in a $2^{n-1} - 1$-dimensional unit simplex. The reason for this is the purpose to use the MLE of cluster score for evaluating not only fuzzy clusters, but also and most importantly fuzzy clusterings, which are collections $q^1, \ldots, q^m \in [0,1]^n$ of fuzzy clusters, so that global score is quantifiable as $\sum_{1 \leq k \leq m} f^w(q^k)$. In this respect, note that PcM algorithms allow for membership distributions adding up to quantities $< 1$ (see above) for handling outliers. These shall be placed each in a singleton block of the partition found here via gradient-based local search. Therefore, membership distributions are like in FcM methods, i.e. each ranging in a $2^{n-1} - 1$-dimensional unit simplex.
3 FUZZY CLUSTERING
A fuzzy clustering is an $m$-set $q^1, \ldots, q^m \in [0,1]^n$ of fuzzy clusters, with $(q_i^1, \ldots, q_i^m)$ being $i$'s membership distribution. The $2^{n-1}$-set of subsets containing each $i \in N$ is $2^N_i = \{A : i \in A \in 2^N\}$, and
$$\Delta_i = \Big\{\big(q_i^{A_1}, \ldots, q_i^{A_{2^{n-1}}}\big) \in \mathbb{R}^{2^{n-1}}_+ : \sum_{1 \leq k \leq 2^{n-1}} q_i^{A_k} = 1\Big\}$$
identifies the corresponding $2^{n-1} - 1$-dimensional unit simplex, where $\{A_1, \ldots, A_{2^{n-1}}\} = 2^N_i$ and $q_i \in \Delta_i$.
Definition 1. A fuzzy cover $q$ specifies, for each data point $i \in N$, a membership distribution over the $2^{n-1}$ data subsets $A \in 2^N_i$ containing it. Hence $q = (q_1, \ldots, q_n) \in \Delta_N$, where $\Delta_N = \times_{1 \leq i \leq n} \Delta_i$.
Equivalently, $q = \{q^A : \emptyset \neq A \in 2^N, q^A \in [0,1]^n\}$ is a $2^n - 1$-set whose elements $q^A = \big(q_1^A, \ldots, q_n^A\big)$ are $n$-vectors corresponding to non-empty data subsets $\emptyset \neq A \in 2^N$ and specifying a membership $q_i^A$ for each $i \in N$, with $q_i^A \in [0,1]$ if $i \in A$ while $q_j^A = 0$ if $j \in A^c$.
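In code, a fuzzy cover is naturally a nested mapping: for each data point, a distribution over the subsets containing it. A small sketch of this representation and its feasibility check (helper names are ours):

```python
from itertools import combinations

def subsets_containing(i, N):
    """2^N_i: the 2^{n-1} subsets of N containing data point i."""
    rest = [j for j in N if j != i]
    return [frozenset({i}) | frozenset(c)
            for r in range(len(rest) + 1) for c in combinations(rest, r)]

def is_fuzzy_cover(q, N, tol=1e-9):
    """q[i] maps subsets A in 2^N_i to memberships; each row must lie on
    the unit simplex: non-negative entries summing to 1."""
    return all(abs(sum(q[i].values()) - 1.0) <= tol
               and all(i in A and v >= 0.0 for A, v in q[i].items())
               for i in N)
```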
Fuzzy covers thus generalize traditional fuzzy clusterings, as these latter are commonly intended as collections $\{q^1, \ldots, q^m\}$ as above where, in addition, for every fuzzy cluster $q^k$ the associated data subset implicitly is $\{i : q_i^k > 0\}$. Conversely, fuzzy covers allow for situations where $0 < |\{i : q_i^A > 0\}| < |A|$ for some $\emptyset \neq A \in 2^N$, although an exactness condition introduced below shows that such cases may be ignored.
Fuzzy covers being collections of points in $[0,1]^n$, and the MLE $f^w$ of $w$ allowing precisely the evaluation of such points, the global score $W(q)$ of any $q \in \Delta_N$ is the sum over all its elements $q^A, A \in 2^N$ of their own scores as quantified by $f^w$ (see (1)). That is,
$$W(q) = \sum_{A \in 2^N} f^w(q^A) = \sum_{A \in 2^N} \Big[\sum_{B \subseteq A} \Big(\prod_{i \in B} q_i^A\Big)\, \mu^w(B)\Big],$$
or
$$W(q) = \sum_{A \in 2^N} \Big[\sum_{B \supseteq A} \Big(\prod_{i \in A} q_i^B\Big)\Big]\, \mu^w(A). \qquad (2)$$
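Formula (2) translates directly into code: each Möbius term $\mu^w(B)$ is collected once per superset $A \supseteq B$, weighted by the product of memberships $q_i^A, i \in B$. A brute-force sketch reusing the helpers above:

```python
import math
from itertools import combinations

def global_score(q, mu, N):
    """W(q) as in (2): sum over non-empty B of mu_w(B) times, for every
    superset A of B, the product of the memberships q_i^A with i in B.
    Memberships absent from q[i] default to 0."""
    def supersets(B):
        rest = [j for j in N if j not in B]
        return [frozenset(B) | frozenset(c)
                for r in range(len(rest) + 1) for c in combinations(rest, r)]
    return sum(coeff * math.prod(q[i].get(A, 0.0) for i in B)
               for B, coeff in mu.items() if B
               for A in supersets(B))
```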
Example 2. For $N = \{1, 2, 3\}$, consider cluster scores $w(\{1\}) = w(\{2\}) = w(\{3\}) = 0.2$, $w(\{1,2\}) = 0.8$, $w(\{1,3\}) = 0.3$, $w(\{2,3\}) = 0.6$, $w(N) = 0.7$. Membership distributions of data points $i = 1, 2, 3$ over subsets $2^N_i$ are denoted by $q_1 \in \Delta_1, q_2 \in \Delta_2, q_3 \in \Delta_3$, with
$$q_1 = \big(q_1^1, q_1^{12}, q_1^{13}, q_1^N\big), \quad q_2 = \big(q_2^2, q_2^{12}, q_2^{23}, q_2^N\big), \quad q_3 = \big(q_3^3, q_3^{13}, q_3^{23}, q_3^N\big).$$
If $\hat{q}_1^{12} = \hat{q}_2^{12} = 1$, then any membership $q_3 \in \Delta_3$ yields
$$w(\{1,2\}) + \big(q_3^3 + q_3^{13} + q_3^{23} + q_3^N\big)\, \mu^w(\{3\}) = w(\{1,2\}) + w(\{3\}) = 1.$$
This means that there is a continuum of fuzzy covers achieving maximum score: $W(\hat{q}_1, \hat{q}_2, q_3) = 1$ independently of $q_3$. In order to select the one $\hat{q} = (\hat{q}_1, \hat{q}_2, \hat{q}_3)$ where $\hat{q}_3^3 = 1$, attention must be placed only on exact covers, defined hereafter.
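The example can be checked numerically with the sketches above; whatever simplex point data point 3 chooses, the global score stays at 1:

```python
# Cluster scores of Example 2 on N = {1, 2, 3}
w = {frozenset({1}): 0.2, frozenset({2}): 0.2, frozenset({3}): 0.2,
     frozenset({1, 2}): 0.8, frozenset({1, 3}): 0.3,
     frozenset({2, 3}): 0.6, frozenset({1, 2, 3}): 0.7}
mu = mobius_inversion(w, {1, 2, 3})
# points 1 and 2 sit fully on {1,2}; point 3 spreads its mass arbitrarily
q = {1: {frozenset({1, 2}): 1.0},
     2: {frozenset({1, 2}): 1.0},
     3: {frozenset({3}): 0.4, frozenset({1, 3}): 0.3, frozenset({2, 3}): 0.3}}
print(global_score(q, mu, {1, 2, 3}))   # -> 1.0, independently of q_3
```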
For any two fuzzy covers $q = \{q^A : \emptyset \neq A \in 2^N\}$ and $\hat{q} = \{\hat{q}^A : \emptyset \neq A \in 2^N\}$, define $\hat{q}$ to be a shrinking of $q$ if there is a subset $A$, with $\sum_{i \in A} q_i^A > 0$, such that
$$\hat{q}_i^B = \begin{cases} q_i^B & \text{if } B \not\subseteq A \\ 0 & \text{if } B = A \end{cases} \quad \text{for all } B \in 2^N, i \in N,$$
$$\sum_{B \subset A} \hat{q}_i^B = q_i^A + \sum_{B \subset A} q_i^B \quad \text{for all } i \in A.$$
In words, a shrinking reallocates the whole membership mass $\sum_{i \in A} q_i^A > 0$ from $A \in 2^N$ to all proper subsets $B \subset A$, involving all and only those data points $i \in A$ with strictly positive membership $q_i^A > 0$.
Definition 3. Fuzzy cover $q \in \Delta_N$ is exact as long as $W(q) \neq W(\hat{q})$ for all shrinkings $\hat{q}$ of $q$.
Proposition 4. If $q$ is exact, then $|\{i \in A : q_i^A > 0\}| \in \{0, |A|\}$ for all $A \in 2^N$.
Proof. For $\emptyset \neq A^+(q) = \{i : q_i^A > 0\} \subset A \in 2^N$, with $\alpha = |A^+(q)| > 1$, notice that
$$f^w(q^A) = \sum_{B \subseteq A^+(q)} \Big(\prod_{i \in B} q_i^A\Big)\, \mu^w(B).$$
Let shrinking $\hat{q}$, with $\hat{q}^{B'} = q^{B'}$ if $B' \notin 2^{A^+(q)}$, satisfy conditions
$$\sum_{B \in 2^N_i \cap 2^{A^+(q)}} \hat{q}_i^B = q_i^A + \sum_{B \in 2^N_i \cap 2^{A^+(q)}} q_i^B \quad \text{for all } i \in A^+(q)$$
and
$$\prod_{i \in B} \hat{q}_i^B = \prod_{i \in B} q_i^B + \prod_{i \in B} q_i^A \quad \text{for all } B \in 2^{A^+(q)}, |B| > 1.$$
These are $2^\alpha - 1$ equations with $\sum_{1 \leq k \leq \alpha} k \binom{\alpha}{k} > 2^\alpha$ variables $\hat{q}_i^B, B \subseteq A^+(q)$. Thus there is a continuum of solutions, each providing precisely a shrinking $\hat{q}$ where
$$\sum_{B \in 2^{A^+(q)}} f^w(\hat{q}^B) = f^w(q^A) + \sum_{B \in 2^{A^+(q)}} f^w(q^B).$$
This entails that $q$ is not exact.
For given w, the global score of any fuzzy cover
also attains on many fuzzy clusterings. This justifies
the following (in line with standard terminology).
Definition 5. Fuzzy clusterings are exact covers.
The global score of any fuzzy clustering is shown
below to also attain on some hard clustering, thereby
expanding a result on extremizers of pseudo-Boolean
functions (Boros and Hammer, 2002, p. 163).
4 HARD CLUSTERING
Hard clusterings or partitions of $N$ (Aigner, 1997) are fuzzy clusterings where $q_i^A \in \{0,1\}$ for all $A \in 2^N$ and all $i \in A$. Among the maximizers of any objective function $W$ as above there always exist fuzzy clusterings $(q_1, \ldots, q_n) \in \Delta_N$ such that $q_i \in ex(\Delta_i)$ for all $i \in N$, where $ex(\Delta_i)$ denotes the $2^{n-1}$-set of extreme points of $\Delta_i$. For $q \in \Delta_N, i \in N$, let $q = q_i | q_{-i}$, with $q_i \in \Delta_i$ and $q_{-i} \in \Delta_{N \setminus i} = \times_{j \in N \setminus i} \Delta_j$. Then, for any $w$,
$$W(q) = \sum_{A \in 2^N_i} f^w(q^A) + \sum_{A' \in 2^N \setminus 2^N_i} f^w(q^{A'}) = \sum_{A \in 2^N_i} \sum_{B \subseteq A \setminus i} \Big(\prod_{j \in B} q_j^A\Big) \big[q_i^A\, \mu^w(B \cup i) + \mu^w(B)\big] + \sum_{A' \in 2^N \setminus 2^N_i} \sum_{B' \subseteq A'} \Big(\prod_{j' \in B'} q_{j'}^{A'}\Big)\, \mu^w(B')$$
at all $q \in \Delta_N$ and for all $i \in N$. Now define
$$W_i(q_i | q_{-i}) = \sum_{A \in 2^N_i} q_i^A \Big[\sum_{B \subseteq A \setminus i} \Big(\prod_{j \in B} q_j^A\Big)\, \mu^w(B \cup i)\Big],$$
$$W_{-i}(q_{-i}) = \sum_{A \in 2^N_i} \Big[\sum_{B \subseteq A \setminus i} \Big(\prod_{j \in B} q_j^A\Big)\, \mu^w(B)\Big] + \sum_{A' \in 2^N \setminus 2^N_i} \Big[\sum_{B' \subseteq A'} \Big(\prod_{j' \in B'} q_{j'}^{A'}\Big)\, \mu^w(B')\Big],$$
yielding
$$W(q) = W_i(q_i | q_{-i}) + W_{-i}(q_{-i}). \qquad (3)$$
Proposition 6. For all $q \in \Delta_N$, there are $\overline{q}, \underline{q} \in \Delta_N$ such that
(i) $W(\overline{q}) \geq W(q) \geq W(\underline{q})$ and
(ii) $\overline{q}_i, \underline{q}_i \in ex(\Delta_i)$ for all $i \in N$.
Proof. For all $i \in N$ and $q_{-i} \in \Delta_{N \setminus i}$, define $w_{q_{-i}} : 2^N_i \to \mathbb{R}$ by
$$w_{q_{-i}}(A) = \sum_{B \subseteq A \setminus i} \Big(\prod_{j \in B} q_j^A\Big)\, \mu^w(B \cup i). \qquad (4)$$
Let $\mathcal{A}^+_{q_{-i}} = \arg\max w_{q_{-i}}$ and $\mathcal{A}^-_{q_{-i}} = \arg\min w_{q_{-i}}$, noting that $\mathcal{A}^+_{q_{-i}} \neq \emptyset \neq \mathcal{A}^-_{q_{-i}}$ at all $q_{-i}$. Most importantly,
$$W_i(q_i | q_{-i}) = \sum_{A \in 2^N_i} q_i^A \cdot w_{q_{-i}}(A) = \langle q_i, w_{q_{-i}} \rangle, \qquad (5)$$
where $\langle \cdot, \cdot \rangle$ denotes the scalar product. Thus for given membership distributions of all $j \in N \setminus i$, global score is affected by $i$'s membership distribution through a scalar product. In order to maximize (or minimize) $W$ by suitably choosing $q_i$ for given $q_{-i}$, the whole of $i$'s membership mass must be placed over $\mathcal{A}^+_{q_{-i}}$ (or $\mathcal{A}^-_{q_{-i}}$), anyhow. Hence there are precisely $|\mathcal{A}^+_{q_{-i}}| > 0$ (or $|\mathcal{A}^-_{q_{-i}}| > 0$) available extreme points of $\Delta_i$. The following procedure selects (arbitrarily) one of them.
ROUNDUP$(w, q)$
Initialize: Set $t = 0$ and $q(0) = q$.
Loop: While there is an $i \in N$ with $q_i(t) \notin ex(\Delta_i)$, set $t = t + 1$ and:
(a) select some $A^* \in \mathcal{A}^+_{q_{-i}(t)}$,
(b) define, for all $j \in N, A \in 2^N$,
$$q_j^A(t) = \begin{cases} q_j^A(t-1) & \text{if } j \neq i \\ 1 & \text{if } j = i \text{ and } A = A^* \\ 0 & \text{otherwise} \end{cases}.$$
Output: Set $\overline{q} = q(t)$.
Every change $q_i^A(t-1) \neq q_i^A(t) = 1$ (for any $i \in N, A \in 2^N_i$) induces a non-decreasing variation $W(q(t)) - W(q(t-1)) \geq 0$. Hence, the sought $\overline{q}$ is provided in at most $n$ iterations. Analogously, replacing $\mathcal{A}^+_{q_{-i}}$ with $\mathcal{A}^-_{q_{-i}}$ yields the sought minimizer $\underline{q}$.
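Equations (4)-(5) and ROUNDUP admit a compact brute-force sketch, reusing the dictionary representation and subsets_containing from the previous sections (an illustration for small $n$, not a literal transcription of the procedure):

```python
import math

def w_partial(i, A, q, mu):
    """w_{q_{-i}}(A) of eq. (4): sum of mu_w(B) over B with i in B subseteq A,
    each weighted by the memberships on A of the other members of B."""
    return sum(coeff * math.prod(q[j].get(A, 0.0) for j in B - {i})
               for B, coeff in mu.items() if i in B and B <= A)

def roundup(q, mu, N):
    """Snap every non-extreme distribution q_i onto a vertex of Delta_i
    maximizing the scalar product (5); at most one pass per data point."""
    for i in N:
        if any(0.0 < v < 1.0 for v in q[i].values()):
            best = max(subsets_containing(i, N),
                       key=lambda A: w_partial(i, A, q, mu))
            q[i] = {best: 1.0}
    return q
```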
Remark 7. For $i \in N, A \in 2^N_i$, if all $j \in A \setminus i \neq \emptyset$ satisfy $q_j^A = 1$, then (4) yields $w_{q_{-i}}(A) = w(A) - w(A \setminus i)$, while $w_{q_{-i}}(\{i\}) = w(\{i\})$ regardless of $q_{-i}$. For the quadratic $w$ obtained above from similarity matrix $S$,
$$w_{q_{-i}}(A) = w(\{i\}) + \sum_{j \in A \setminus i} q_j^A\, \mu^w(\{i,j\}).$$
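In the quadratic case the marginal worth in (4) thus needs no subset enumeration; a short sketch:

```python
def w_partial_quadratic(i, A, q, mu):
    """Remark 7 for quadratic w: w_{q_{-i}}(A) = w({i}) +
    sum_{j in A\\i} q_j^A * mu_w({i,j}), linear in the others' memberships."""
    return mu[frozenset({i})] + sum(q[j].get(A, 0.0) * mu[frozenset({i, j})]
                                    for j in A if j != i)
```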
If the global score of fuzzy clusterings is quantified as the sum over constituent fuzzy clusters of their MLE scores, then for any $w$ there are hard clusterings among both the maximizers and the minimizers. This seems crucial because many applications may be modeled in terms of set partitioning, and in such a combinatorial optimization problem fuzzy clustering is not feasible. An important example is winner determination in combinatorial auctions (Sandholm, 2002), where a set $N$ of items to be sold must be partitioned into bundles towards revenue maximization. The maximum bid received for each bundle $\emptyset \neq A \subseteq N$ defines the input set function $w$. The above result entails that if the objective function is multilinearly extended over the continuous domain of fuzzy clusterings, then any found solution can be promptly adapted to the restricted domain of partitions, with no score loss. The problem can thus be approached from a geometrical perspective, allowing for novel search strategies. Partitions $P = \{A_1, \ldots, A_{|P|}\} \subset 2^N$ of $N$ are families of pairwise disjoint subsets whose union is $N$, i.e. $N = \cup_{1 \leq k \leq |P|} A_k$ and $A_k \cap A_l = \emptyset, 1 \leq k < l \leq |P|$. Any $P$ corresponds to the collection $\{\chi_A : A \in P\}$ of
those $|P|$ hypercube vertices identified by the characteristic functions of its blocks (see above). Partitions $P$ can also be seen as $p \in \Delta_N$ where $p_i^A = 1$ for all $A \in P, i \in A$. The above findings yield the following.
Corollary 8. For any $w$, some partition $P$ satisfies $W(p) \geq W(q)$ for all $q \in \Delta_N$, with $W(p) = \sum_{A \in P} w(A)$.
Proof. Follows from Propositions 4 and 6.
A further remark concerns cluster validity (Wang
and Zhang, 2007), with focus on those indices that
validate fuzzy clusterings by relying exclusively on
membership distributions. As already observed, a ba-
sic argument is that the more such distributions are
concentrated, the less ambiguous is the fuzzy classi-
fication. Evidently, hard clusterings provide n distri-
butions each concentrated on a unique extreme point
of the associated unit simplex. The above result indi-
cates that if global score is evaluated through MLE,
then validation may ignore membership distributions,
as the score of any optimal fuzzy clustering also ob-
tains by means of a hard one.
5 LOCAL SEARCH
Defining global maximizers is clearly immediate.
Definition 9. Fuzzy clustering $\hat{q} \in \Delta_N$ is a global maximizer if $W(\hat{q}) \geq W(q)$ for all $q \in \Delta_N$.
Concerning local maximizers, consider a vector $\omega = (\omega_1, \ldots, \omega_n) \in \mathbb{R}^n_{++}$ of strictly positive weights, with $\omega_N = \sum_{j \in N} \omega_j$, and focus on the equilibrium (Mas-Colell et al., 1995) of the game where data points are players who strategically choose their membership distribution $q_i \in \Delta_i$ while being rewarded with fraction $\frac{\omega_i}{\omega_N} W(q_1, \ldots, q_n)$ of the global score attained at any strategy profile $(q_1, \ldots, q_n)$.
Definition 10. Fuzzy clustering $\hat{q} \in \Delta_N$ is a local maximizer if $W_i(\hat{q}_i | \hat{q}_{-i}) \geq W_i(q_i | \hat{q}_{-i})$ for all $q_i \in \Delta_i$ and all $i \in N$ (see (3)).
This definition of local maximizer entails that the neighborhood $\mathcal{N}(q) \subset \Delta_N$ of any $q \in \Delta_N$ is
$$\mathcal{N}(q) = \bigcup_{i \in N} \big\{\tilde{q} : \tilde{q} = \tilde{q}_i | q_{-i},\ \tilde{q}_i \in \Delta_i\big\}.$$
Definition 11. The $(i,A)$-derivative of $W$ at $q \in \Delta_N$ is
$$\partial W(q)/\partial q_i^A = W(\overline{q}(i,A)) - W(\underline{q}(i,A)) = W_i\big(\overline{q}_i(i,A) \,|\, \overline{q}_{-i}(i,A)\big) - W_i\big(\underline{q}_i(i,A) \,|\, \underline{q}_{-i}(i,A)\big),$$
with $\overline{q}(i,A) = \big(\overline{q}_1(i,A), \ldots, \overline{q}_n(i,A)\big)$ given by
$$\overline{q}_j^B(i,A) = \begin{cases} q_j^B & \text{for all } j \in N \setminus i, B \in 2^N_j \\ 1 & \text{for } j = i, B = A \\ 0 & \text{for } j = i, B \neq A \end{cases},$$
and $\underline{q}(i,A) = \big(\underline{q}_1(i,A), \ldots, \underline{q}_n(i,A)\big)$ given by
$$\underline{q}_j^B(i,A) = \begin{cases} q_j^B & \text{for all } j \in N \setminus i, B \in 2^N_j \\ 0 & \text{for } j = i \text{ and all } B \in 2^N_i \end{cases},$$
thus $\nabla W(q) = \{\partial W(q)/\partial q_i^A : i \in N, A \in 2^N_i\} \in \mathbb{R}^{n 2^{n-1}}$ is the (full) gradient of $W$ at $q$. The $i$-gradient $\nabla_i W(q) \in \mathbb{R}^{2^{n-1}}$ of $W$ at $q = q_i | q_{-i}$ is the set function $\nabla_i W(q) : 2^N_i \to \mathbb{R}$ defined by $\nabla_i W(q)(A) = w_{q_{-i}}(A)$ for all $A \in 2^N_i$, where $w_{q_{-i}}$ is given by (4).
Remark 12. Membership distribution $\underline{q}_i(i,A)$ is the null one: its $2^{n-1}$ entries are all 0, hence $\underline{q}_i(i,A) \notin \Delta_i$.
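Definition 11 can be implemented literally: compare the global score with $i$'s mass snapped onto $A$ against the global score with $i$'s row nulled; by (5) the result coincides with $w_{q_{-i}}(A)$. A sketch:

```python
def derivative(i, A, q, mu, N):
    """(i,A)-derivative of W at q (Definition 11): W at the point where i
    sits fully on A, minus W at the point where i's distribution is null
    (a point outside Delta_i, cf. Remark 12)."""
    hi = {**q, i: {A: 1.0}}   # i's mass snapped onto A
    lo = {**q, i: {}}         # i's null membership distribution
    return global_score(hi, mu, N) - global_score(lo, mu, N)
```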
The setting obtained thus far allows searching for a local-maximizer hard clustering $q^*$ from a given fuzzy clustering $q$ as initial candidate solution, while maintaining the whole search within the continuum of fuzzy clusterings. This idea may be specified in alternative ways, yielding different local search methods. One possibility is the following.
LOCALSEARCH$(w, q)$
Initialize: Set $t = 0$ and $q(0) = q$, with requirement $|\{i : q_i^A > 0\}| \in \{0, |A|\}$ for all $A \in 2^N$.
Loop 1: While $0 < \sum_{i \in A} q_i^A(t) < |A|$ for some $A \in 2^N$, set $t = t + 1$ and
(a) select an $A^*(t) \in 2^N$ such that
$$\sum_{i \in A^*(t)} w_{q_{-i}(t-1)}(A^*(t)) \geq \sum_{j \in B} w_{q_{-j}(t-1)}(B)$$
for all $B \in 2^N$ such that $0 < \sum_{j \in B} q_j^B(t) < |B|$,
(b) for $i \in A^*(t)$ and $A \in 2^N_i$, define
$$q_i^A(t) = \begin{cases} 1 & \text{if } A = A^*(t) \\ 0 & \text{if } A \neq A^*(t) \end{cases},$$
(c) for $j \in N \setminus A^*(t)$ and $A \in 2^N_j$ with $A \cap A^*(t) = \emptyset$, define
$$q_j^A(t) = q_j^A(t-1) + w(A) \Big(\sum_{B \in 2^N_j : B \cap A^*(t) \neq \emptyset} q_j^B(t-1)\Big) \Big(\sum_{B' \in 2^N_j : B' \cap A^*(t) = \emptyset} w(B')\Big)^{-1},$$
(d) for $j \in N \setminus A^*(t)$ and $A \in 2^N_j$ with $A \cap A^*(t) \neq \emptyset$, define $q_j^A(t) = 0$.
Loop 2: While $q_i^A(t) = 1, |A| > 1$ for some $i \in N$ with $w(A) < w(\{i\}) + w(A \setminus i)$, set $t = t + 1$ and define:
$$q_i^{\hat{A}}(t) = \begin{cases} 1 & \text{if } |\hat{A}| = 1 \\ 0 & \text{otherwise} \end{cases} \quad \text{for all } \hat{A} \in 2^N_i,$$
$$q_j^B(t) = \begin{cases} 1 & \text{if } B = A \setminus i \\ 0 & \text{otherwise} \end{cases} \quad \text{for all } j \in A \setminus i, B \in 2^N_j,$$
$$q_{j'}^{\hat{B}}(t) = q_{j'}^{\hat{B}}(t-1) \quad \text{for all } j' \in A^c, \hat{B} \in 2^N_{j'}.$$
Output: Set $q^* = q(t)$.
Both ROUNDUP and LOCALSEARCH yield a sequence $q(0), \ldots, q(t^*) = q^*$ where $q_i^* \in ex(\Delta_i)$ for all $i \in N$. In the former, at the end of each iteration $t$ the novel $q(t) \in \mathcal{N}(q(t-1))$ is in the neighborhood of its predecessor. In the latter, $q(t) \notin \mathcal{N}(q(t-1))$ in general, as in $|P| \leq n$ iterations of Loop 1 a partition $\{A^*(1), \ldots, A^*(|P|)\} = P$ is generated. Selected clusters or blocks $A^*(t) \in 2^N, t = 1, \ldots, |P|$ are any of those where the sum over data points $i \in A^*(t)$ of the $(i, A^*(t))$-derivatives $\partial W(q(t-1))/\partial q_i^{A^*(t)}(t-1)$ is maximal. Once a block $A^*(t)$ is selected, lines (c) and (d) make all data points $j \in N \setminus A^*(t)$ redistribute the entire membership mass currently placed on subsets $A' \in 2^N_j$ with non-empty intersection $A' \cap A^*(t) \neq \emptyset$ over those remaining $A \in 2^N_j$ such that, conversely, $A \cap A^*(t) = \emptyset$. The redistribution is such that each of these latter gets a fraction $w(A) / \sum_{B \in 2^N_j : B \cap A^*(t) = \emptyset} w(B)$ of the newly freed membership mass $\sum_{A' \in 2^N_j : A' \cap A^*(t) \neq \emptyset} q_j^{A'}(t-1)$. The subsequent Loop 2 checks whether the partition generated by Loop 1 may be improved by extracting some outliers from existing blocks and placing them in singleton blocks of the final output. An outlier basically is a data point displaying very unusual features. In the limit, cluster score $w$ may be such that for some data points $i \in N$ global score decreases when $i$ joins any cluster $A \in 2^N_i, |A| > 1$, that is to say
$$w(A) - w(A \setminus i) - w(\{i\}) = \sum_{B \in 2^A \setminus 2^{A \setminus i} : |B| > 1} \mu^w(B) < 0.$$
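The following is a compact sketch of LOCALSEARCH built on the previous helpers. One simplifying assumption is made explicit: the redistribution of line (c) is restricted here to subsets of the still-unassigned points, so that Loop 1 visibly locks a partition in at most $n$ passes; wvals maps every non-empty subset to its score $w(A)$, assumed strictly positive.

```python
def local_search(q, mu, wvals, N):
    """Brute-force sketch of LOCALSEARCH(w, q) for small N."""
    free = set(N)
    # Loop 1: lock the block whose summed (i,A)-derivatives, eq. (4), are maximal
    while free:
        cand = [A for A in wvals if A <= free]
        Astar = max(cand, key=lambda A: sum(w_partial(i, A, q, mu) for i in A))
        for i in Astar:                                   # line (b)
            q[i] = {Astar: 1.0}
        free -= Astar
        for j in free:                                    # lines (c)-(d)
            kept = {B: v for B, v in q[j].items() if B <= free}
            freed = 1.0 - sum(kept.values())
            targets = subsets_containing(j, free)
            denom = sum(wvals[A] for A in targets)
            q[j] = {A: kept.get(A, 0.0) + freed * wvals[A] / denom
                    for A in targets}
    # Loop 2: split off outliers while some block A has w(A) < w({i}) + w(A\i)
    blocks = {next(iter(q[i])) for i in N}
    improved = True
    while improved:
        improved = False
        for A in [B for B in blocks if len(B) > 1]:
            for i in A:
                if wvals[A] < wvals[frozenset({i})] + wvals[A - {i}]:
                    q[i] = {frozenset({i}): 1.0}
                    for j in A - {i}:
                        q[j] = {A - {i}: 1.0}
                    blocks = (blocks - {A}) | {frozenset({i}), A - {i}}
                    improved = True
                    break
            if improved:
                break
    return q
```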
Proposition 13. Output $q^*$ of LOCALSEARCH$(w, q)$ is a local maximizer.
Proof. It is plain that the output corresponds to a partition $P$. With the notation of Corollary 8 in Section 4, $q^* = p$. Accordingly, any data point $i \in N$ is either in a singleton cluster $\{i\} \in P$ or else in a cluster $A \in P, i \in A$ such that $|A| > 1$. In the former case, any membership reallocation deviating from $p_i^{\{i\}} = 1$, given memberships $p_j, j \in N \setminus i$, yields a cover (fuzzy or hard) where global score is the same as at $p$, because $\prod_{j \in B \setminus i} p_j^B = 0$ for all $B \in 2^N_i, B \neq \{i\}$ (see Example 2 above). In the latter case, any membership reallocation $q_i$ deviating from $p_i^A = 1$ (given memberships $p_j, j \in N \setminus i$) yields a cover which is best seen by distinguishing between $2^N_i \setminus \{A\}$ and $A$. Also recall that $w(A) - w(A \setminus i) = \sum_{B \in 2^A \setminus 2^{A \setminus i}} \mu^w(B)$. Again, all membership mass $\sum_{B \in 2^N_i \setminus \{A\}} q_i^B > 0$ simply collapses on singleton $\{i\}$, because $\prod_{j \in B \setminus i} p_j^B = 0$ for all $B \in 2^N_i \setminus \{A\}$. Therefore,
$$W(p) - W(q_i | p_{-i}) = w(A) - \Big[w(\{i\}) + \sum_{B' \in 2^{A \setminus i}} \mu^w(B') + q_i^A \sum_{B \in 2^A \setminus 2^{A \setminus i} : |B| > 1} \mu^w(B)\Big] = \big(p_i^A - q_i^A\big) \sum_{B \in 2^A \setminus 2^{A \setminus i} : |B| > 1} \mu^w(B).$$
Now assume that $q^*$ is not a local maximizer, i.e. $W(p) - W(q_i | p_{-i}) < 0$. Since $p_i^A - q_i^A > 0$ (because $p_i^A = 1$ and $q_i \in \Delta_i$ is a deviation from $p_i$), then
$$\sum_{B \in 2^A \setminus 2^{A \setminus i} : |B| > 1} \mu^w(B) = w(A) - w(A \setminus i) - w(\{i\}) < 0.$$
Hence $q^*$ cannot be the output of Loop 2.
In local search methods, the chosen initial candidate solution determines what neighborhoods shall be visited. The range of the objective function in a neighborhood is a set of real values. In a neighborhood $\mathcal{N}(p)$ of a hard clustering $p$ or partition $P$, only those $\sum_{A \in P : |A| > 1} |A|$ data points $i \in A$ in non-singleton blocks $A \in P, |A| > 1$ can modify global score by reallocating their membership. In view of the above proof, the only admissible variations obtain by deviating from $p_i^A = 1$ with an alternative membership distribution $q_i$ such that $q_i^A \in [0, 1)$, with $W(q_i | p_{-i}) - W(p)$ equal to
$$\big(q_i^A - 1\big) \sum_{B \in 2^A \setminus 2^{A \setminus i}} \mu^w(B) + \big(1 - q_i^A\big)\, w(\{i\}).$$
Hence, choosing partitions as initial candidate solutions of LOCALSEARCH is evidently poor. A sensible choice should conversely allow the search to explore different neighborhoods where the objective function may range widely. A simplest example of such an initial candidate solution is the uniform distribution $q_i^A = 2^{1-n}$. On the other hand, the input of local search fuzzy clustering algorithms is commonly desired to be close to a global optimum, i.e. a maximizer in the present setting. This translates here into the idea of defining a suitable input by means of cluster score function $w$. Along this line, consider $q_i^A = w(A) / \sum_{B \in 2^N_i} w(B)$, yielding
$$\frac{q_i^A}{q_i^B} = \frac{w(A)}{w(B)} = \frac{q_j^A}{q_j^B} \quad \text{for all } A, B \in 2^N_i \cap 2^N_j \text{ and all } i, j \in N.$$
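Such a score-proportional input is immediate to build; a sketch reusing the helpers above:

```python
def proportional_input(wvals, N):
    """Initial fuzzy clustering with q_i^A proportional to w(A) over 2^N_i,
    so that membership ratios equal score ratios for all data points."""
    q = {}
    for i in N:
        cells = subsets_containing(i, N)
        total = sum(wvals[A] for A in cells)
        q[i] = {A: wvals[A] / total for A in cells}
    return q
```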
With a suitable initial candidate solution, the search may be restricted to explore only a maximum number of fuzzy clusterings, thereby containing (together with the quadratic MLE of cluster score $w$) the computational burden. In particular, if $q(0)$ is the finest partition $\{\{1\}, \ldots, \{n\}\}$, i.e. $q_i^{\{i\}}(0) = 1$ for all $i \in N$, then the search does not explore any neighborhood at all, and such an input coincides with the output. More reasonably, let $\mathcal{A}^{max}_q = \{A_1, \ldots, A_k\}$ denote the collection of maximal data subsets where input memberships are strictly positive. That is, $q_i^{A_{k'}} > 0$ for all $i \in A_{k'}, 1 \leq k' \leq k$, as well as $q_j^B = 0$ for all $B \in 2^N \setminus \big(2^{A_1} \cup \cdots \cup 2^{A_k}\big)$ and all $j \in B$. Then, the output shall be a partition $P$ each of whose blocks $A \in P$ satisfies $A \subseteq A_{k'}$ for some $1 \leq k' \leq k$. Hence, by suitably choosing the input $q$, LOCALSEARCH outputs a partition with no fewer than a desired number $k(q)$ of blocks.
6 CONCLUSIONS
This paper approaches objective function-based fuzzy
clustering by firstly eliciting a real-valued cluster
score function, quantifying the positive worth of data
subsets in the given classification problem. Cluster-
ing is next interpreted in terms of combinatorial opti-
mization via set partitioning. The proposed gradient-
based local search relies on a novel expansion of the
MLE of near-Boolean functions (Rossi, 2015) over the product of $n$ simplices, each of which is $2^{n-1} - 1$-dimensional, $n$ being the number of data points. The method does not need the input to specify a desired number of clusters, as this latter is determined autonomously through optimization, and it applies to any classification
problem, handling data sets not necessarily included
in a Euclidean space: proximities between data points
and within clusters may be quantified in any conceiv-
able way, including information theoretic measure-
ment (Pirró and Euzenat, 2010).
REFERENCES
Aigner, M. (1997). Combinatorial Theory. Springer.
Reprint of the 1979 Edition.
Bensaid, A., Hall, L., Bezdek, J., Clarke, L., Silbiger,
M., Arrington, J., and Murtagh, R. (1996). Validity-
guided (re)clustering with applications to image seg-
mentation. IEEE Trans. on Fuzzy Sys., 4(2):112–123.
Bezdek, J. and Pal, S. (1992). Fuzzy Models for Pattern
Recognition. IEEE Press.
Boros, E. and Hammer, P. (2002). Pseudo-Boolean opti-
mization. Discrete App. Math., 123:155–225.
Cottrell, M., Hammer, B., Hasenfuß, A., and Villmann, T.
(2006). Batch and median neural gas. Neural Net-
works, 19(6-7):762–771.
Crama, Y. and Hammer, P. L. (2011). Boolean Functions:
Theory, Algorithms, and Applications. Cambridge
University Press.
Du, K.-L. (2010). Clustering: a neural network approach.
Neural Networks, 23:89–107.
Kashef, R. and Kamel, M. S. (2010). Cooperative cluster-
ing. Pattern Recognition, 43(6):2315–2329.
Korte, B. and Vygen, J. (2002). Combinatorial Optimiza-
tion. Theory and Algorithms. Springer.
Krishnapuram, R. and Keller, J. (1996). The possibilis-
tic c-means algorithm: insights and recommendations.
IEEE Transactions on Fuzzy Systems, 4(3):148–158.
Lughofer, E. (2008). Extensions of vector quantization for
incremental clustering. Pattern Recognition, 41:995–
1011.
Mas-Colell, A., Whinston, M. D., and Green, J. R. (1995).
Microeconomic Theory. Oxford University Press.
Ménard, M. and Eboueya, M. (2002). Extreme physical
information and objective function in fuzzy clustering.
Fuzzy Sets and Systems, 128:285–303.
Ng, A. Y., Jordan, M. I., and Weiss, Y. (2002). On spectral
clustering: analysis and an algorithm. In Dietterich,
T. G., Becker, S., and Ghahramani, Z., editors, Ad-
vances in Neural Information Processing Systems 14,
volume 2, pages 849–856. MIT Press.
Pal, N. and Bezdek, J. (1995). On cluster validity for the
fuzzy c-means model. IEEE Transactions on Fuzzy
Systems, 3(3):370–379.
Pardalos, P., Prokopyev, O., and Busygin, S. (2006). Con-
tinuous approaches for solving discrete optimization
problems. In Appa, G., Pitsoulis, L., and Williams,
H., editors, Handbook on Modeling for Discrete Opti-
mization, pages 39–60. Springer.
Pirró, G. and Euzenat, J. (2010). A feature and information
theoretic framework for semantic similarity and relat-
edness. In Proceedings of The Semantic Web Confer-
ence ISWC 2010, pages 615–630. LNCS 6496.
Rezaee, M., Lelieveldt, B., and Reiber, J. (1998). A new
cluster validity index for the fuzzy c-means. Pattern
Recognition Letters, 19:237–246.
Rossi, G. (2015). Continuous set packing problems and
near-Boolean functions. arXiv 1509.07986v1. Sub-
mitted to ICPRAM 2016.
Rota, G.-C. (1964). On the foundations of combinatorial
theory I: theory of Möbius functions. Z. Wahrschein-
lichkeitsrechnung u. verw. Geb., 2:340–368.
Roubens, M. (1982). Fuzzy clustering algorithms and their
cluster validity. European Journal of Operational Re-
search, 10(3):294–301.
Sandholm, T. (2002). Algorithm for optimal winner deter-
mination in combinatorial auctions. Artificial Intelli-
gence, (135):1–54.
Schaeffer, S. E. (2007). Graph clustering. Computer Sci-
ence Review, 1(1):27–64.
Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Method
for Pattern Analysis. Cambridge University Press.
Slonim, N., Atwal, S. G., Tkačik, G., and Bialek, W. (2005).
Information-based clustering. PNAS, 102(51):18297–
18302.
Valente de Oliveira, J. and Pedrycz, W. (2007). Advances in
fuzzy clustering and its applications. Wiley.
von Luxburg, U., Belkin, M., and Bousquet, O. (2008).
Consistency of spectral clustering. The Annals of
Statistics, 36(2):555–586.
Wang, W. and Zhang, Y. (2007). On fuzzy cluster validity
indices. Fuzzy Sets and Systems, 158:2095–2117.
Wu, S. and Chow, T. W. S. (2004). Clustering of the self-
organizing map using a cluster validity index based on
inter-cluster and intra-cluster density. Pattern Recog-
nition, 37:175–188.
Xie, X. and Beni, G. (1991). Validity measure for fuzzy
clustering. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 13(8):841–847.
Xu, R. and Wunsch, D. (2005). Survey of clustering algo-
rithms. IEEE Trans. on Neural Net., 16(3):645–678.
Zahid, N., Abouelala, O., Limouri, M., and Essaid,
A. (2001). Fuzzy clustering based on k-nearest-
neighbours rule. Fuzzy Sets and Sys., 120:239–247.