CLASSIFICATION USING HIGH ORDER DISSIMILARITIES
IN NON-EUCLIDEAN SPACES
Helena Aidos¹, Ana Fred¹ and Robert P. W. Duin²
¹Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal
²Faculty of Electrical Engineering, Mathematics and Computer Sciences, Delft University of Technology, Delft, The Netherlands
Keywords:
Dissimilarity increments, Maximum a posteriori, Classification, Gaussian mixture, Non-Euclidean space.
Abstract:
This paper introduces a novel classification algorithm named MAP-DID. This algorithm combines a maximum
a posteriori (MAP) approach using the well-known Gaussian Mixture Model (GMM) method with a term that
forces the various Gaussian components within each class to have a common structure. That structure is
based on higher-order statistics of the data, through the use of the dissimilarity increments distribution (DID),
which contains information regarding the triplets of neighbor points in the data, as opposed to typical pairwise
measures, such as the Euclidean distance. We study the performance of MAP-DID on several synthetic and real
datasets and on various non-Euclidean spaces. The results show that MAP-DID outperforms other classifiers
and is therefore appropriate for classification of data on such spaces.
1 INTRODUCTION
Classification deals with algorithmic methodologies for assigning new input data to one of the known classes. There are numerous classifiers with different strategies, such as k-nearest neighbors, neural networks, support vector machines and Parzen windows (Duda et al., 2001; Hastie et al., 2009).
This paper introduces a new maximum a posteri-
ori (MAP) classifier based on the Gaussian Mixture
Model (GMM). This novel classifier (MAP-DID) in-
troduces an extra factor on the likelihood containing
information about higher-order statistics of the data,
through the use of the distribution of their dissimilar-
ity increments (Aidos and Fred, 2011).
2 DISSIMILARITY
REPRESENTATIONS
Sometimes it is useful to describe the objects using a
dissimilarity representation, a square matrix contain-
ing the dissimilarities between all pairs of objects. To
use the typical classifiers, we need to build a vector
space based on the relations given by the dissimilar-
ity matrix. In (Duin and Pekalska, 2008), two strate-
gies are considered to obtain vector spaces: pseudo-
Euclidean spaces and dissimilarity spaces.
2.1 Pseudo-Euclidean Spaces (PES)
The PES is given by the Cartesian product of two real spaces: $E = \mathbb{R}^p \times \mathbb{R}^q$. A vector $x \in E$ is represented as an ordered pair of two real vectors: $x = (x^+, x^-)$. This space is equipped with a pseudo-inner product, such that $\langle x, y \rangle_E = x^T J_{pq} y$, where $J_{pq} = [I_{p \times p}\ 0;\ 0\ -I_{q \times q}]$. Alternatively, if $x_i^+$ and $x_i^-$ represent the components of $x^+$ and $x^-$, then $\langle x, y \rangle_E = \sum_{i=1}^{p} x_i^+ y_i^+ - \sum_{i=1}^{q} x_i^- y_i^-$.
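To make the definition concrete, here is a tiny illustrative sketch (in Python; the helper name `pseudo_inner` is ours) that builds $J_{pq}$ and evaluates the pseudo-inner product:

```python
# Illustration only: the pseudo-inner product with signature (p, q).
import numpy as np

def pseudo_inner(x, y, p, q):
    """<x, y>_E = x^T J_pq y, with J_pq = diag(I_p, -I_q)."""
    J = np.diag(np.concatenate([np.ones(p), -np.ones(q)]))
    return x @ J @ y
```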
Although this pseudo-inner product is symmetric and linear in its first argument, it is not positive definite. Thus, if one constructs the Gram matrix $G$ from the data patterns $x_i$ as $G_{ij} = x_i^T x_j$, then $G$ may not be positive semidefinite in the PES (Pekalska, 2005). $G$ is symmetric in the PES, so it has an eigendecomposition $G = VDV^T$, but its eigenvalues can be negative. Note that a new dataset can be built up from $G$ through $X = V|D|^{1/2}$, where matrix $X$ contains the vector representations of the new patterns in the PES.
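A minimal sketch of this construction, assuming the (pseudo-)Gram matrix is obtained from the squared dissimilarities by double centering (the standard route in (Pekalska, 2005)); the function name is ours:

```python
# Sketch: eigendecomposition of the Gram matrix derived from a symmetric
# dissimilarity matrix D; in a pseudo-Euclidean setting some eigenvalues
# will be negative.
import numpy as np

def pseudo_euclidean_gram(D):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ (D ** 2) @ J              # Gram matrix (possibly indefinite)
    eigvals, eigvecs = np.linalg.eigh(G)     # G symmetric: G = V diag(d) V^T
    return eigvals, eigvecs

# Vector representation in the PES: X = V |D|^{1/2}
# eigvals, eigvecs = pseudo_euclidean_gram(D)
# X = eigvecs * np.sqrt(np.abs(eigvals))
```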
In (Duin et al., 2008; Duin and Pekalska, 2008),
several variants of PES are considered. In this paper,
we also consider the following spaces.
Pseudo-Euclidean Space (PES): This is a (p +
q)-dimensional PES defined by p + q eigenvec-
tors. One keeps the p largest positive eigenvalues
and the q negative eigenvalues that have the high-
est absolute value. Each direction is scaled by the
magnitude of the corresponding eigenvalue.
Positive Pseudo-Euclidean Space (PPES): This
p-dimensional space is defined as PES, but only
the p largest positive eigenvalues are kept.
Negative Pseudo-Euclidean Space (NPES): This
q-dimensional space is defined as PES, but only
the q largest negative eigenvalues (in magnitude)
are kept; no positive eigenvalues are used.
Corrected Euclidean Space (CES): In CES, a
constant is added to all the eigenvalues (positive
and negative) to ensure that they all become pos-
itive. This constant is given by 2|a|, where a is the negative eigenvalue with the largest absolute value (a construction sketch for the four variants is given below).
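The sketch below (continuing the previous one; the interface and names are ours) illustrates how the four variants can be read off the eigendecomposition of the Gram matrix:

```python
import numpy as np

def variant_space(eigvecs, eigvals, p, q, variant="PES"):
    """Build PES / PPES / NPES / CES representations from the Gram
    eigendecomposition; p (q) positive (negative) eigenvalues are kept."""
    pos = np.argsort(eigvals)[::-1][:p]      # p largest (positive) eigenvalues
    neg = np.argsort(eigvals)[:q]            # q negative eigenvalues, largest |.|
    if variant == "PES":                     # (p+q)-dimensional space
        idx = np.concatenate([pos, neg])
        return eigvecs[:, idx] * np.sqrt(np.abs(eigvals[idx]))
    if variant == "PPES":                    # positive part only
        return eigvecs[:, pos] * np.sqrt(np.abs(eigvals[pos]))
    if variant == "NPES":                    # negative part only
        return eigvecs[:, neg] * np.sqrt(np.abs(eigvals[neg]))
    if variant == "CES":                     # shift all eigenvalues by 2|a|
        shift = 2.0 * np.abs(eigvals.min())
        return eigvecs * np.sqrt(eigvals + shift)
    raise ValueError("unknown variant: %s" % variant)
```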
2.2 Dissimilarity Spaces (DS)
We consider four more spaces constructed in the fol-
lowing way: we compute the pairwise Euclidean dis-
tances between data points of one of the spaces de-
fined above. These distances are new feature representations of $x_i$. Note that the dimension of the feature space is equal to the number of points.
Since our classifier suffers from the curse of di-
mensionality, we must reduce the number of features;
there are several techniques for that (Hastie et al.,
2009). We chose k-means to find a number of pro-
totypes k < N. k is selected as a certain percent-
age of N/2, and the algorithm is initialized in a de-
terministic way as described in (Su and Dy, 2007).
After the k prototypes are found, the distances from each point $x_i$ to each of these prototypes are used as its new feature representation. This defines
four new spaces, which are named as Dissimilarity
Pseudo-Euclidean Space (DPES), Dissimilarity Posi-
tive Pseudo-Euclidean Space (DPPES), Dissimilarity
Negative Pseudo-Euclidean Space (DNPES) and Dis-
similarity Corrected Euclidean Space (DCES).
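A rough sketch of this construction, assuming `X` holds one of the vector representations built above; unlike the paper, plain scikit-learn k-means initialization is used here instead of the deterministic scheme of (Su and Dy, 2007):

```python
import numpy as np
from sklearn.cluster import KMeans

def dissimilarity_space(X, k):
    """Represent each point by its Euclidean distances to k prototypes."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    prototypes = km.cluster_centers_
    diff = X[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))   # shape (N, k)
```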
3 THE MAP-DID ALGORITHM
In this section, dissimilarities between patterns in the
eight previously defined spaces are computed as Eu-
clidean distances.
3.1 Dissimilarity Increments
Distribution (DID)
Let X be a set of patterns, and $(x_i, x_j, x_k)$ a triplet of nearest neighbors belonging to X, where $x_j$ is the nearest neighbor of $x_i$ and $x_k$ is the nearest neighbor of $x_j$, different from $x_i$. The dissimilarity increment (DI) (Fred and Leitão, 2003) between these patterns is defined as $d_{inc}(x_i, x_j, x_k) = \left| d(x_i, x_j) - d(x_j, x_k) \right|$.
This measure contains information different from a
distance: the latter is a pairwise measure, while the
former is a measure for a triplet of points, thus a mea-
sure of higher-order dissimilarity of the data.
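A minimal sketch of this computation, assuming the dissimilarity d is the Euclidean distance and enumerating one triplet (hence one increment) per point; the paper's enumeration of triplets may differ:

```python
import numpy as np
from scipy.spatial.distance import cdist

def dissimilarity_increments(X):
    """|d(x_i, x_j) - d(x_j, x_k)| for each x_i, with x_j the nearest
    neighbour of x_i and x_k the nearest neighbour of x_j other than x_i."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)
    inc = np.empty(len(X))
    for i in range(len(X)):
        j = int(np.argmin(D[i]))             # nearest neighbour of x_i
        row = D[j].copy()
        row[i] = np.inf                      # exclude x_i itself
        k = int(np.argmin(row))              # nearest neighbour of x_j, != x_i
        inc[i] = abs(D[i, j] - D[j, k])
    return inc
```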
In (Aidos and Fred, 2011) the DIs distribution
(DID) was derived under the hypothesis of Gaussian
distribution of the data and it was written as a function
of the mean value of the DIs, λ. Therefore, the DID
of a class is given by
$$p_{d_{inc}}(w;\lambda) = \frac{\pi\beta^2}{4\lambda^2}\, w \exp\!\left(-\frac{\pi\beta^2}{4\lambda^2} w^2\right) + \frac{\pi^2\beta^3}{8\sqrt{2}\,\lambda^3} \left(\frac{4\lambda^2}{\pi\beta^2} - w^2\right) \exp\!\left(-\frac{\pi\beta^2}{8\lambda^2} w^2\right) \operatorname{erfc}\!\left(\frac{\sqrt{\pi}\,\beta}{2\sqrt{2}\,\lambda}\, w\right), \qquad (1)$$

where $\operatorname{erfc}(\cdot)$ is the complementary error function, and $\beta = 2 - \sqrt{2}$.
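A direct transcription of equation (1), with λ estimated, as in the text, as the mean value of the increments of a class:

```python
import numpy as np
from scipy.special import erfc

BETA = 2.0 - np.sqrt(2.0)                    # beta = 2 - sqrt(2)

def did_pdf(w, lam, beta=BETA):
    """Density of the dissimilarity increments, equation (1)."""
    a = np.pi * beta ** 2 / (4.0 * lam ** 2)
    first = a * w * np.exp(-a * w ** 2)
    c = np.pi ** 2 * beta ** 3 / (8.0 * np.sqrt(2.0) * lam ** 3)
    second = (c * (4.0 * lam ** 2 / (np.pi * beta ** 2) - w ** 2)
              * np.exp(-a * w ** 2 / 2.0)    # exp(-pi beta^2 w^2 / (8 lam^2))
              * erfc(np.sqrt(np.pi) * beta * w / (2.0 * np.sqrt(2.0) * lam)))
    return first + second
```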
3.2 MAP-DID
Consider that $\{x_i, c_i, \mathrm{inc}_i\}_{i=1}^{N}$ is our dataset, where $x_i$ is a feature vector in $\mathbb{R}^d$, $c_i$ is the class label and $\mathrm{inc}_i$ is the set of increments yielded by all the triplets of points containing $x_i$. We assume that each class $c_j$ has a single statistical model for the increments, with an associated parameter $\lambda_j$. This DID, described above, can be seen as a high-order statistic of the data, since it carries information about a third-order dissimilarity of the data.
As an example, we generate a 2-dimensional Gaussian dataset with 1000 points, zero mean and identity covariance (figure 1, left). We also generate a 2-dimensional dataset with 1000 points, where 996 points lie at the origin and four off-center points are placed at coordinates (±a, 0) and (0, ±a), with a chosen such that the covariance is also the identity matrix (figure 1, right). We compute the DIs for each dataset and look at their histograms (figure 1).
Although the datasets have the same mean and covariance matrix, the two DI distributions are very different from each other. Therefore, the DIs can be seen as a measure of higher-order statistics: the two distributions under consideration have exactly the same mean and variance, but their DIDs are vastly different.
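For reference, a sketch that reproduces the two datasets (the value a = √(N/2) ≈ 22.4 is what makes the covariance of the second dataset the identity, up to the 1/N versus 1/(N−1) convention); feeding each into the `dissimilarity_increments` sketch above yields the very different histograms of figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

gaussian = rng.standard_normal((N, 2))           # zero mean, identity covariance

a = np.sqrt(N / 2.0)                             # ~22.4: gives unit variance per axis
spiked = np.zeros((N, 2))                        # 996 points at the origin ...
spiked[:4] = [[a, 0.0], [-a, 0.0], [0.0, a], [0.0, -a]]   # ... plus 4 off-centre
```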
So, we design a maximum a posteriori (MAP)
classifier that combines the Gaussian Mixture Model
(GMM) and the information given by the increments,
assuming that $x_i$ and $\mathrm{inc}_i$ are conditionally independent given $c_j$. We used a prior given by $p(c_j) = |c_j|/N$, with $|c_j|$ the number of points of class $j$, and the likelihood $p(x_i, \mathrm{inc}_i \mid c_j) = p(x_i \mid c_j)\, p(\mathrm{inc}_i \mid c_j)$.
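Written out, the resulting decision rule is the usual MAP rule with the factorized likelihood ($\hat{c}(x_i)$ denotes the predicted label, our notation):

$$\hat{c}(x_i) = \operatorname*{arg\,max}_{j}\; p(c_j)\, p(x_i \mid c_j)\, p(\mathrm{inc}_i \mid c_j).$$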
Figure 1: Two simple datasets with zero mean and a covari-
ance given by the identity matrix, but with vastly different
DIs. Left: Gaussian data. Right: dataset with 996 points
at the origin and four off-center points. Corresponding his-
tograms of the DIs. Note that in the right histogram there
are four non-zero increments and 996 zero increments.
The class-conditional density of the vector $x_i$ follows a GMM given by $p(x_i \mid c_j) = \sum_{l=1}^{K} \alpha_l\, p(x_i \mid \Sigma_l, \mu_l)$, with $K$ the number of Gaussian components determined for class $c_j$, $\alpha_l$ the weight of each Gaussian component and $p(x_i \mid \Sigma_l, \mu_l)$ the Gaussian distribution. We obtained the parameters $\alpha_l$, $\Sigma_l$ and $\mu_l$ using the GMM described in (Figueiredo and Jain, 2002).
The class-conditional density of the set of increments to which $x_i$ belongs is given by $p(\mathrm{inc}_i \mid c_j) = \frac{1}{M} \sum_{n=1}^{M} p(\mathrm{inc}_i^n \mid c_j)$, where $M$ is the number of increments in the set $\mathrm{inc}_i$, $\mathrm{inc}_i^n$ is the $n$-th increment of that set, and $p(\mathrm{inc}_i^n \mid c_j) = p(\mathrm{inc}_i^n \mid \lambda_j)$ is the DID given in equation (1). We thus consider a statistical model for the increments with parameter $\lambda_j$ for each class.
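Putting the pieces together, a hedged end-to-end sketch of the classifier follows. The paper fits the per-class GMMs with the Figueiredo–Jain algorithm and averages the DID over all increments of $x_i$; here scikit-learn's GaussianMixture is used only as a stand-in and a single increment per point (from the `dissimilarity_increments` sketch) replaces the full set, so this is an approximation of MAP-DID rather than the authors' exact procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_map_did(X_train, y_train, n_components=3):
    """Per class: a GMM for p(x|c), a mean increment lambda for the DID,
    and the empirical prior p(c) = |c| / N."""
    model = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        gmm = GaussianMixture(n_components=n_components).fit(Xc)
        lam = dissimilarity_increments(Xc).mean()
        model[c] = (gmm, lam, len(Xc) / len(X_train))
    return model

def predict_map_did(model, X_test):
    inc = dissimilarity_increments(X_test)       # one increment per test point
    labels = []
    for x, w in zip(X_test, inc):
        scores = {c: gmm.score_samples(x[None, :])[0]        # log p(x | c)
                     + np.log(did_pdf(w, lam) + 1e-300)      # log p(inc | c)
                     + np.log(prior)                         # log p(c)
                  for c, (gmm, lam, prior) in model.items()}
        labels.append(max(scores, key=scores.get))
    return np.array(labels)
```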
4 EXPERIMENTAL RESULTS
AND DISCUSSION
In this section we compare MAP-DID to other classi-
fiers (1-nearest neighbor (1-NN), nearest-mean (NM),
Parzen window and a linear support vector machine
(SVM)). We use 13 datasets, of which 2 are synthetic
and 11 are real-world data¹.
For each of the classifiers, we use a 10-fold cross-
validation scheme to estimate classifier performance.
Figures 2 and 3 present the results for the average
classification error. The values of p and q eigenvec-
tors, and k prototypes, used to construct the spaces
described in Section 2, are in Table 1.
The MAP-DID is the algorithm with the lowest
error rate. This is true for the vast majority of all
the possible dataset-space pairs. Thus, if any of these
¹See http://prtools.org/disdatasets/ for a full description
of the datasets and the MATLAB toolboxes containing the
classifiers used for comparison.
Table 1: Number of eigenvectors and prototypes used to
construct the spaces described in Section 2.
Dataset p q k
Balls3d 3 7 10
Balls50d 18 5 18
CatCortex 2 2 2
CoilDelftSame 8 4 7
CoilYork 8 5 7
DelftGestures 11 2 13
Protein 6 2 5
Zongker 14 3 20
Chickenpieces 8 3 9
Figure 2: Classification error rate on the four pseudo-
Euclidean spaces considered in Section 2.1.
spaces are to be used for classification, MAP-DID is a good choice of classification algorithm.
Some other points should be emphasized. For the real-world datasets, the results do not differ much between the PES, PPES and CES spaces, all of which take into account the positive portion of the space. Conversely, the NPES results are considerably worse than those
Figure 3: Classification error rate on the four dissimilarity
spaces considered in Section 2.2.
three, which indicates that this negative space con-
tains little information for classification purposes.
Another interesting point is that in the dissimilar-
ity spaces (figure 3), neither the positive (DPPES) nor
the negative (DNPES) spaces contain all the informa-
tion; instead, the union of the information contained
in those two spaces (DPES or DCES) yields much
better results than either of them separately.
It was necessary to reduce the dimensionality of
the data to generate the dissimilarity spaces (figure 3).
This reduction was accomplished through k-means,
by computing the distances from the data patterns to
the estimated prototypes. However, many other tech-
niques could be used for dimensionality reduction,
and it is possible that some of those techniques would
yield an improvement on the results for these spaces.
One aspect not considered here is the metricity and Euclideanness of the datasets (Duin and Pekalska, 2008). These properties may help us identify the situations where MAP-DID performs well.
5 CONCLUSIONS
We have presented a novel maximum a posteriori
(MAP) classifier which uses the dissimilarity incre-
ments distribution (DID). This classifier, called MAP-
DID, can be interpreted as a Gaussian Mixture Model
with an operator that forces a class to have a com-
mon increment structure, even though each Gaussian
component within a class can have distinct means and
covariances. Experimental results have shown that
MAP-DID outperforms multiple other classifiers on
various datasets and feature spaces.
In this paper we focused on Euclidean spaces de-
rived from non-Euclidean data. This might suggest
that MAP-DID could perform well when applied to
originally Euclidean data. This is a topic which will
receive more investigation in the future.
ACKNOWLEDGEMENTS
This work was supported by the FET programme
within the EU FP7, under the SIMBAD project con-
tract 213250; and partially by the Portuguese Foun-
dation for Science and Technology, scholarship num-
ber SFRH/BD/39642/2007, and grant PTDC/EIA-
CCO/103230/2008.
REFERENCES
Aidos, H. and Fred, A. (2011). On the distribution of dis-
similarity increments. In IbPRIA, pages 192–199.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern
Classification. John Wiley & Sons Inc., 2nd edition.
Duin, R., Pekalska, E., Harol, A., Lee, W.-J., and Bunke, H. (2008). On Euclidean corrections for non-Euclidean dissimilarities. In SSPR/SPR, pages 551–561.
Duin, R. P. and Pekalska, E. (2008). On refining dissimilarity matrices for an improved NN learning. In ICPR.
Figueiredo, M. and Jain, A. (2002). Unsupervised learn-
ing of finite mixture models. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 24(3):381–396.
Fred, A. and Leitão, J. (2003). A new cluster isolation
criterion based on dissimilarity increments. IEEE
Trans. on Pattern Analysis and Machine Intelligence,
25(8):944–958.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Ele-
ments of Statistical Learning: Data Mining, Inference,
and Prediction. Springer, 2nd edition.
Pekalska, E. (2005). Dissimilarity Representations in Pat-
tern Recognition: Concepts, Theory and Applications.
PhD thesis, Delft University of Technology, Delft,
The Netherlands.
Su, T. and Dy, J. G. (2007). In search of deterministic
methods for initializing k-means and Gaussian mixture
clustering. Intelligent Data Analysis, 11(4):319–338.