ON REDUCING DIMENSIONALITY OF DISSIMILARITY
MATRICES FOR OPTIMIZING DBC
An Experimental Comparison
Sang-Woon Kim
Department of Computer Science and Engineering, Myongji University, Yongin, 449-728 South Korea
Keywords:
Dissimilarity-based Classifications, Dimensionality reduction schemes, Prototype selection methods.
Abstract:
One problem of dissimilarity-based classifications (DBCs) is the high dimensionality of dissimilarity matri-
ces. To address this problem, two kinds of solutions have been proposed in the literature: prototype selection
(PS) based methods and dimensionality reduction (DR) based methods. The DR-based method consists of
building the dissimilarity matrices using all the available training samples and subsequently applying some of
the standard DR schemes. On the other hand, the PS-based method works by directly choosing a small set of
representatives from the training samples. Although DR-based and PS-based methods have been explored sep-
arately by many researchers, not much analysis has been done on the study of comparing the two. Therefore,
this paper aims to find a suitable method for optimizing DBCs by a comparative study. In the experiments, four
DR and four PS methods are used to reduce the dimensionality of the dissimilarity matrices, and classification
accuracies of the resultant DBCs trained with two real-life benchmark databases are analyzed. Our empirical evaluation of the two approaches demonstrates that the DR-based method can improve the classification accuracies more than the PS-based method. In particular, the experimental results show that the DR-based method is clearly more useful for nonparametric classifiers, but not for parametric ones.
1 INTRODUCTION
Dissimilarity-based classifications (DBCs) (Pekalska
and Duin, 2005), (Pekalska and Paclik, 2006) are a
way of defining classifiers among the classes, and
the process is not based on the feature measurements
of individual object samples, but rather on a suitable
dissimilarity measure among the individual samples.
The advantage of this method is that it can avoid the
problems associated with feature spaces, such as the
curse of dimensionality and the issue of estimating a
number of parameters (Kim and Oommen, 2007).
In DBCs, a good selection of prototypes seems to
be crucial to succeed with the classification algorithm
in the dissimilarity space. The prototypes should
avoid redundancies in terms of selection of similar
samples, and prototypes should include as much in-
formation as possible. However, it is difficult for us to
find the optimal number of prototypes. Furthermore,
there is a possibility that we lose some useful information for discrimination when selecting the prototypes.
(This work was supported by the National Research Foundation of Korea funded by the Korean Government, NRF-2009-0071283.)
To avoid these problems, in (Bicego and Figueiredo,
2004), (Riesen and Bunke, 2007), and (Kim and Gao,
2008), the authors separately proposed an alternative
approach where all of the available samples were se-
lected as prototypes, and, subsequently, a scheme,
such as linear discriminant analysis, was applied to
the reduction of dimensionality. This approach is
more principled and allows us to completely avoid the
problem of finding the optimal number of prototypes
(Bunke and Riesen, 2007).
In this paper, we perform an empirical evaluation
on the two approaches of reducing the dimensionality
of dissimilarity matrices for optimizing DBCs: pro-
totype selection (PS) based methods and dimension
reduction (DR) based methods. In PS-based meth-
ods, we first select the representative prototype subset
from the training data set by resorting to one of the
prototype selection methods as described in (Pekalska
and Duin, 2005) and (Pekalska and Paclik, 2006).
Then, we compute the dissimilarity matrix, in which
each individual dissimilarity is computed on the ba-
sis of the measures described in (Pekalska and Paclik,
2006). In addition, for a testing sample, z, we com-
pute a dissimilarity column vector, δ(z), by using the
same measure. Finally, we perform the classification
by invoking a classifier built in the dissimilarity space
and by operating the classifier on δ(z).
On the other hand, in DR-based methods, we pre-
fer not to directly select the prototypes from the train-
ing samples; rather, we employ a way of using a stan-
dard DR scheme, after computing the dissimilarity
matrix with the entire training samples. Then, as in
PS-based methods, we compute a dissimilarity col-
umn vector for a testing sample and perform the clas-
sification of the vector by invoking a classifier built
in the dissimilarity space. Here, the point to be men-
tioned is how to choose the optimal number of proto-
types and the subspace dimensions to be reduced. In
PS-based methods, we heuristically select the same number of (or twice as many) prototypes as the number of classes. In DR-based ones, on the other hand, we can use a cumulative proportion technique (Laaksonen and Oja, 1996) to choose the dimensions.
The main contribution of this paper is to present an
empirical evaluation on the two methods of reducing
the dimensionality of dissimilarity matrices for opti-
mizing DBCs. This evaluation shows that DBCs can
be optimized by employing a dimensionality reduc-
tion scheme as well as a prototype selection method.
Here, the aim of using the dimensionality reduction
scheme instead of selecting the prototypes is to ac-
commodate some useful information for discrimina-
tion and to avoid the problem of finding the opti-
mal number of prototypes. Our experimental results
demonstrate that the DR-based method can generally
improve the classification accuracy of DBCs more
than the prototype selection based method. In particular, the results indicate that the DR-based method is clearly more useful for nonparametric classifiers, but not for parametric ones.
2 RELATED WORK
Foundations of DBCs. A dissimilarity representation of a set of samples, T = {x_i}_{i=1}^{n} ⊂ R^d, is based on pairwise comparisons and is expressed, for example, as an n × m dissimilarity matrix D_{T,Y}[·,·], where Y = {y_j}_{j=1}^{m}, a prototype set, is extracted from T, and the subscripts of D represent the sets of elements on which the dissimilarities are evaluated. Thus, each entry, D_{T,Y}[i, j], corresponds to the dissimilarity between the pair of objects ⟨x_i, y_j⟩, where x_i ∈ T and y_j ∈ Y. Consequently, an object, x_i, is represented as a column vector as follows:

[d(x_i, y_1), d(x_i, y_2), ..., d(x_i, y_m)]^T,  1 ≤ i ≤ n.  (1)

Here, the dissimilarity matrix, D_{T,Y}[·,·], is defined as a dissimilarity space, on which the d-dimensional object, x, given in the feature space, is represented as an m-dimensional vector, δ(x, Y), where, if x = x_i, δ(x_i, Y) is the i-th row of D_{T,Y}[·,·]. In this paper, the column vector, δ(x, Y), is simply denoted by δ(x).
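To make Eq. (1) concrete, the following minimal sketch (not from the original paper) builds D_{T,Y} and the vector δ(z) for a test object, assuming the Euclidean distance as the dissimilarity measure; the array names and the use of NumPy/SciPy are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dissimilarity_matrix(T, Y):
    # D_{T,Y}[i, j] = d(x_i, y_j); here d is the Euclidean distance (one possible choice).
    return cdist(T, Y, metric='euclidean')           # shape (n, m)

def delta(z, Y):
    # Dissimilarity vector of a single object z against the prototype set Y.
    return cdist(z[np.newaxis, :], Y, metric='euclidean').ravel()   # length m

# Toy example: n = 5 training objects, m = 3 prototypes, d = 4 features.
rng = np.random.default_rng(0)
T = rng.normal(size=(5, 4))
Y = T[:3]                                            # prototypes taken from T
D = dissimilarity_matrix(T, Y)                       # 5 x 3 dissimilarity matrix
dz = delta(rng.normal(size=4), Y)                    # delta(z) for a new object
```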
Prototype Selection Methods. The intention of selecting prototypes is to guarantee a good tradeoff between the recognition accuracy and the computational complexity when the DBC is built on D_{T,Y}[·,·] rather than D_{T,T}[·,·]. Various prototype selection (PS) methods have been proposed in the literature (Pekalska and Duin, 2005), (Pekalska and Paclik, 2006). The eight well-known selection methods experimented with in (Pekalska and Duin, 2005) and (Pekalska and Paclik, 2006) are Random, Random C, KCentres, ModeSeek, LinProg, FeatSel, KCentres-LP, and EdiCon. In the interest of compactness, the details of these methods are omitted here, but they can be found in the existing literature (Pekalska and Paclik, 2006).
DBCs summarized previously, in which the representative prototype subset is selected with a PS, are referred to as PS-based DBCs or simply PS-based methods. An algorithm for PS-based DBCs is summarized in the following (a minimal code sketch is given after the steps):
1. Select the representative set, Y, from the training set, T, by resorting to one of the prototype selection methods.
2. Using Eq. (1), compute the dissimilarity matrix, D_{T,Y}[·,·], in which each individual dissimilarity is computed on the basis of the measures described in (Pekalska and Duin, 2005).
3. For a testing sample z, compute δ(z) by using the same measure used in Step 2.
4. Achieve the classification by invoking a classifier built in the dissimilarity space and by operating the classifier on the dissimilarity vector, δ(z).
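As an illustration of these four steps, the sketch below implements a PS-based DBC with a simple random prototype selection standing in for the PS methods above and a 1-NN classifier in the dissimilarity space; it is a simplified reading of the algorithm under those assumptions, not the authors' implementation, and the scikit-learn classifier is chosen only for convenience.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import KNeighborsClassifier

def train_ps_dbc(T, labels, n_prototypes, rng):
    # Step 1: select the representative set Y (here: plain random selection).
    idx = rng.choice(len(T), size=n_prototypes, replace=False)
    Y = T[idx]
    # Step 2: dissimilarity matrix D_{T,Y} (Euclidean measure as an example).
    D = cdist(T, Y)
    # Classifier built in the dissimilarity space.
    clf = KNeighborsClassifier(n_neighbors=1).fit(D, labels)
    return clf, Y

def classify_ps_dbc(clf, Y, z):
    # Steps 3-4: compute delta(z) with the same measure and classify it.
    dz = cdist(z.reshape(1, -1), Y)
    return clf.predict(dz)[0]
```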
From these four steps, we can see that the performance of the DBCs relies heavily on how well the dissimilarity space, which is determined by the dissimilarity matrix, D_{T,Y}[·,·], is constructed. To improve the performance, we need to ensure that the dissimilarity matrix is well designed.
Dimensionality Reduction Schemes. With regard to
reducing the dimensionality of the dissimilarity ma-
trix, we can use a strategy of employing the dimen-
sionality reduction (DR) schemes after computing
the dissimilarity matrix with the entire training sam-
ples. Numerous DRs have been proposed in the liter-
ature, some of which are (Belhumeour and Kriegman,
1997), (Yu and Yang, 2001), (Loog and Duin, 2004),
and (Wei and Li, 2009). The most well known DRs
are the class of linear discriminant analysis (LDA)
strategies, such as Fisher LDA (Yu and Yang, 2001),
Two-stage LDA (Belhumeour and Kriegman, 1997),
Chernoff distance based LDA (Loog and Duin, 2004),
(Rueda and Herrera, 2008), and so on.
In the interest of brevity, the details of the LDA
strategies are again omitted here, but we briefly ex-
plain below the Chernoff distance based LDA (in
short CLDA) that is pertinent to our present study. It
is well-known that LDA is incapable of dealing with
the heteroscedastic data in a proper way (Loog and
Duin, 2004). To overcome this limitation, in CLDA, the square of the Euclidean distance, S_E = S_B / (p_1 p_2), is replaced with the Chernoff distance, defined as:

S_C = S^{-1/2} (m_1 − m_2)(m_1 − m_2)^T S^{-1/2} + (log S − p_1 log S_1 − p_2 log S_2) / (p_1 p_2),

where S_i, m_i, and p_i are the scatter matrix, the mean vector, and the a priori probability of class i, respectively, and S = p_1 S_1 + p_2 S_2. Using S_C instead of S_E, the Fisher separation criterion, J_H, can be defined (see Eq. (5) of (Loog and Duin, 2004)). To obtain a matrix, A, that maximizes J_H, some researchers (Rueda and Herrera, 2008) have recently developed a gradient-based algorithm, named Chernoff LDA Two, which consists of three steps: (a) initialize A^(0), (b) compute A^(k+1) from A^(k) by applying the secant method to J_H, and (c) terminate the iteration by checking for convergence (Rueda and Herrera, 2008).
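The Chernoff matrix S_C above can be computed directly from the class statistics. The sketch below is a literal transcription of the two-class expression as stated in the text, using matrix square roots and matrix logarithms from SciPy; it is meant only to clarify the formula and does not reproduce the Chernoff LDA Two algorithm itself.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def chernoff_matrix(m1, m2, S1, S2, p1, p2):
    # S_C = S^{-1/2} (m1 - m2)(m1 - m2)^T S^{-1/2}
    #       + (log S - p1*log S1 - p2*log S2) / (p1*p2),  with S = p1*S1 + p2*S2.
    S = p1 * S1 + p2 * S2
    S_inv_half = fractional_matrix_power(S, -0.5)
    diff = (m1 - m2).reshape(-1, 1)
    between = S_inv_half @ (diff @ diff.T) @ S_inv_half
    log_term = (logm(S) - p1 * logm(S1) - p2 * logm(S2)) / (p1 * p2)
    return between + np.real(log_term)   # scatter matrices are SPD, so the result is real
```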
DBCs, in which the dimensionality of dissimilarity matrices is reduced with a DR, are referred to as DR-based DBCs or DR-based methods. An algorithm for DR-based DBCs is summarized in the following (a corresponding code sketch follows the steps):
1. Select the entire training set T as the representative set Y.
2. Using Eq. (1), compute the dissimilarity matrix, D_{T,T}[·,·], in which each individual dissimilarity is computed on the basis of the measures described in (Pekalska and Duin, 2005). After computing D_{T,T}[·,·], reduce its dimensionality by invoking a dimensionality reduction scheme.
3. This step is the same as in PS-based DBCs.
4. This step is the same as in PS-based DBCs.
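A corresponding sketch for the DR-based route is given below: the whole training set serves as the prototype set, D_{T,T} is computed, and its dimensionality is then reduced, here with PCA standing in for the DR schemes used later; the scikit-learn components are assumptions made purely for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def train_dr_dbc(T, labels, n_dims):
    # Steps 1-2: use the whole training set as prototypes and reduce D_{T,T}.
    D = cdist(T, T)                                  # n x n dissimilarity matrix
    dr = PCA(n_components=n_dims).fit(D)
    clf = KNeighborsClassifier(n_neighbors=1).fit(dr.transform(D), labels)
    return clf, dr

def classify_dr_dbc(clf, dr, T, z):
    # Steps 3-4: delta(z) against the full training set, projected, then classified.
    dz = cdist(z.reshape(1, -1), T)
    return clf.predict(dr.transform(dz))[0]
```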
The rationale of this strategy is presented in a later
section together with the experimental results.
In the attempt to provide a comparison between
PS-based DBCs and DR-based DBCs, we are re-
quired to analyze their computational complexities.
In light of brevity, the details of the analysis are omit-
ted here. From analyzing the algorithms, however, we
can observe that the time complexities of PS-based, LDA (and PCA)-based, and CLDA-based DBCs are O(nmd), O(n²d), and O(n²d + n³), respectively.
3 EXPERIMENTAL RESULTS
Experimental Data. PS-based and DR-based meth-
ods were tested and compared with each other by con-
ducting experiments for a handprinted character data
set and a well-known face database, namely Nist38
(Wilson and Garris, 1992) and Yale (Georghiades and
Kriegman, 2001). The data set captioned Nist38 con-
sists of two kinds of digits, 3 and 8, for a total of 1000
binary images. The size of each image is 32× 32 pix-
els, for a total dimensionality of 1024 pixels. The
Yale database contains 165 gray scale images of 15
individuals. The size of each image is 243× 320 pix-
els, for a total dimensionality of 77760 pixels. To
reduce the computational complexity of this experi-
ment, facial images of Yale were down-sampled into
178 × 236 pixels and then represented by a centered
vector of normalized intensity values.
Experimental Method. All of our experiments were
performed with a “leave-one-out” strategy; to classify
an image, we removed the image from the training set
and computed the dissimilarity matrix with the n − 1
images. This process was repeated n times for every
image, and a final result was obtained by averaging
the results of each image.
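The leave-one-out protocol can be written as a short loop; the sketch below uses Euclidean dissimilarities and a 1-NN rule in the dissimilarity space purely to illustrate the protocol (the actual experiments use the four measures and five classifiers described next).

```python
import numpy as np
from scipy.spatial.distance import cdist

def loo_accuracy_dbc(X, y):
    # Hold each image out once, build the dissimilarity representation on the
    # remaining n-1 images, classify the held-out image, and average the results.
    n, correct = len(X), 0
    for i in range(n):
        mask = np.arange(n) != i
        T, labels = X[mask], y[mask]
        D = cdist(T, T)                     # dissimilarity matrix on the n-1 images
        dz = cdist(X[i:i + 1], T)           # delta(z) for the held-out image
        nn = int(np.argmin(cdist(dz, D)))   # 1-NN in the dissimilarity space
        correct += int(labels[nn] == y[i])
    return correct / n
```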
To measure the dissimilarity between two objects, we used Euclidean distance (ED), Hamming distance (HD), regional distance (RD) (Adini and Ullman, 1997), and spatially weighted gray-level Hausdorff distance (WD) (Kim, 2006) measuring systems. (In this experiment, we employed only these four measuring systems, namely ED, HD, RD, and WD; however, numerous other solutions could also be considered.)
To construct the dissimilarity matrix, in PS-based
methods, we employed Random (in short Rand), Ran-
dom C (in short RandC), KCentres (in short KCen-
ter), and ModeSeek (in short ModeS) to select the
prototype subset. Here, the number of prototypes selected was heuristically determined as c or 2c, where c is the number of classes.
On the other hand, in DR-based methods, to re-
duce the dimensionality, we used direct LDA (in short
LDA), PCA, two-stage LDA (in short PCALDA), and
Chernoff distance-based LDA (in short CLDA). In
addition, to select the dimensions for the systems, we used the cumulative proportion, α, which is defined as follows (Laaksonen and Oja, 1996):

α(q) = (Σ_{j=1}^{q} λ_j) / (Σ_{j=1}^{d} λ_j).

Here, the subspace dimension, q, of the data sets (where d and λ_j are the dimensionality and the eigenvalues, respectively) is determined by considering the cumulative proportion α(q). The eigenvectors and eigenvalues are computed, and the cumulative sum of the eigenvalues is compared to a fixed number, k. In other words, the subspace dimensions are selected by considering the relation α(q) ≤ k ≤ α(q + 1). In PCALDA, however, we reduced the dimensions in two steps: we first reduced the dimension d (= n − 1) into an intermediate dimension n − c + 1 using PCA; we then reduced the intermediate dimension n − c + 1 to q using LDA. (Similar to the approaches with prototype selection methods, the number of dimensions is not given beforehand; from this point of view, we could say that the problem of selecting the optimal dimension remains unresolved.)
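A minimal sketch of this dimension-selection rule is given below, assuming the eigenvalues are available as an array; the threshold value and variable names are illustrative, and the boundary case of the relation α(q) ≤ k ≤ α(q + 1) is resolved here by taking the smallest q whose cumulative proportion reaches k.

```python
import numpy as np

def select_subspace_dimension(eigenvalues, k=0.95):
    # alpha(q) = sum of the q largest eigenvalues / sum of all eigenvalues.
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    alpha = np.cumsum(lam) / np.sum(lam)
    # Smallest q with alpha(q) >= k, i.e., the crossing point of alpha(q) <= k <= alpha(q+1).
    return int(np.argmax(alpha >= k)) + 1
```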
To maintain the diversity among the DBCs, we designed different classifiers, such as k-nearest neighbor classifiers (k = 1), nearest mean classifiers, regularized normal density-based linear/quadratic classifiers, and support vector classifiers. All of the DBCs mentioned above are implemented with PRTools (http://prtools.org/) and denoted in the next section as knnc, nmc, ldc, qdc, and svc, respectively. Here, ldc and qdc are regularized with (R, S) = (0.01, 0.01). Also, svc is implemented using a polynomial kernel function of degree 1.
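For readers without PRTools, roughly comparable classifiers can be assembled with scikit-learn as sketched below; the mapping is an approximation (for instance, the regularization parameters and the degree-1 polynomial kernel, which behaves like a linear SVM) and is not the PRTools implementation used in the paper.

```python
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.svm import SVC

# Approximate counterparts of the PRTools classifiers used in the experiments.
classifiers = {
    'knnc': KNeighborsClassifier(n_neighbors=1),                       # 1-nearest neighbor
    'nmc':  NearestCentroid(),                                         # nearest mean
    'ldc':  LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.01), # regularized linear
    'qdc':  QuadraticDiscriminantAnalysis(reg_param=0.01),             # regularized quadratic
    'svc':  SVC(kernel='poly', degree=1),                              # degree-1 polynomial kernel
}
```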
Experimental Results. The run-time characteristics
of the empirical evaluation on the two data sets are re-
ported below and shown in figures and tables. In this
section, we first investigate the rationale for employing a PS method (i.e., KCenter) or a DR method (i.e., PCA) in reducing the dimensionality. Then, we present the classification accuracies of the PS-based and DR-based methods.
Consequently, based on the classification results, we
grade and rank the methods. Finally, we introduce a
numerical comparison of the processing CPU-times.
First, the experimental results of PS and DR-based
methods were probed. Fig. 1 shows plots of the clas-
sification accuracies obtained with knnc for Yale. In
Fig. 1(a), the dissimilarity matrix, in which the classi-
fiers were evaluated, was generated with the prototype
subset selected with a PS, such as KCenter. In Fig.
1(b), on the other hand, after generating the dissimi-
larity matrix with the entire data set, the dimensional-
ity was reduced by invoking a DR, such as PCA.
In the figure, it is interesting to note that PS and
DR-based DBCs (knnc) can be optimized by means of
choosing the number of prototypes and reducing the
dimensions, respectively. For example, both classification accuracies of RD for knnc saturate at 16 prototypes and at 8 subspace dimensions, respectively. Here, the prob-
lem to be addressed is how to choose the optimal
number of prototypes and the dimension to be re-
duced. In PS-based methods, we selected the num-
ber of prototypes as 2c (which is an experimental pa-
rameter). In DR-based methods, on the other hand, using the cumulative technique, we chose the subspace dimensions for Yale and Nist38 as follows: (1) Yale: q_WD = 12; q_RD = 15; q_ED = 76; q_HD = 133, (2) Nist38: q_WD = 133; q_RD = 22; q_ED = 41; q_HD = 41.

Figure 1: Plots of the classification accuracy rates (%) of PS-based and DR-based DBCs (knnc) for the Yale database: (a) top (PS-based DBC); (b) bottom (DR-based DBC). Here, the prototype subsets of (a) are selected from the training data set with KCenter, and the subspace dimensions of (b) are obtained (selected) with PCA. (Both panels plot the accuracies of ED, HD, RD, and WD against the number of prototypes or dimensions, 1 to 256.)
Although it is hard to quantitatively evaluate the
various PS and DR-based methods, we have at-
tempted to do exactly this; we have given a numer-
ical grade to every method tested here according to
its classification accuracy to render this comparative
study more complete. Tables 1 and 2 show, respec-
tively, the classification accuracies of PS and DR-
based DBCs, where the values underlined are the
highest ranks in the four accuracies of each classifier.
From the two tables, we can see that almost all the
highest rates (underlined) achieved with DR-based
DBCs are higher than those of PS-based ones. This
observation confirms the possibility that the classifi-
cation performance of DBCs can be improved by ef-
fectively reducing the dimensionality after construct-
ing dissimilarity matrices with all of the training sam-
ples. To observe how well the methods work, we
picked the best three among the eight (four of PSs and
four of DRs) methods for each classifier and ranked them in order from the highest to the lowest classification accuracy. Although this is a very simplistic model of comparison, we believe that it is the easiest approach a researcher can employ when dealing with algorithms that have different characteristics.

Table 1: Classification accuracies (%) of PS-based DBCs.
data    measure  method   knnc   nmc    ldc    qdc    svc
Yale    ED       Rand     80.00  77.58  84.24  75.76  84.85
                 RandC    80.61  81.82  96.97  79.39  56.97
                 KCenter  77.58  80.00  89.70  75.76  85.45
                 ModeS    78.18  78.79  88.48  76.36  89.09
        HD       Rand     75.15  73.94  71.52  76.36  76.36
                 RandC    75.76  80.00  89.70  73.33  75.76
                 KCenter  73.94  74.55  76.97  75.15  80.61
                 ModeS    75.15  78.18  75.76  80.00  87.88
        RD       Rand     77.58  71.52  90.91  76.36  86.06
                 RandC    78.79  70.91  98.79  75.15  86.67
                 KCenter  78.79  76.97  96.97  76.36  86.67
                 ModeS    79.39  74.55  97.58  76.36  89.09
        WD       Rand     72.12  52.12  80.00  71.52  78.79
                 RandC    71.52  49.09  79.39  70.91  76.36
                 KCenter  74.55  49.70  79.39  70.30  76.36
                 ModeS    71.52  49.09  76.97  69.70  74.55
Nist38  ED       Rand     79.10  77.10  80.80  82.00  71.10
                 RandC    90.40  84.90  90.00  91.00  89.10
                 KCenter  80.00  79.50  83.60  84.80  76.80
                 ModeS    97.40  85.40  97.80  99.30  96.50
        HD       Rand     80.70  80.20  82.70  84.00  81.70
                 RandC    90.00  82.00  88.50  90.00  86.80
                 KCenter  81.30  78.30  83.00  84.00  73.60
                 ModeS    97.40  85.40  97.80  99.30  97.80
        RD       Rand     84.50  80.70  86.30  85.70  0
                 RandC    93.80  85.10  91.60  93.00  0
                 KCenter  90.80  86.40  91.10  91.40  90.90
                 ModeS    97.00  86.90  97.80  98.90  97.70
        WD       Rand     78.90  70.50  77.10  78.50  75.10
                 RandC    87.40  73.90  84.40  86.60  80.80
                 KCenter  79.90  74.70  79.70  81.80  64.00
                 ModeS    93.70  77.50  95.50  96.80  95.20

Table 2: Classification accuracies (%) of DR-based DBCs.
data    measure  method   knnc   nmc    ldc    qdc    svc
Yale    ED       LDA      89.70  93.94  93.94  83.03  89.70
                 PCA      79.39  80.61  93.94  79.39  46.67
                 PCALDA   90.30  92.73  89.09  83.64  96.97
                 CLDA     89.70  89.70  91.52  82.42  93.33
        HD       LDA      82.42  81.21  81.21  79.39  84.85
                 PCA      75.76  77.58  86.67  78.79  89.70
                 PCALDA   79.39  84.24  80.00  72.12  87.88
                 CLDA     81.21  81.21  81.21  76.36  89.70
        RD       LDA      90.91  96.97  96.97  84.24  94.55
                 PCA      79.39  75.15  96.97  77.58  86.06
                 PCALDA   98.79  98.79  93.33  89.70  99.39
                 CLDA     89.09  89.09  87.88  71.52  81.21
        WD       LDA      80.00  83.03  83.03  73.33  77.58
                 PCA      70.91  49.70  72.73  70.91  73.33
                 PCALDA   75.76  76.97  70.91  60.61  63.03
                 CLDA     64.24  64.24  60.00  48.48  42.42
Nist38  ED       LDA      84.80  87.50  87.50  87.80  87.00
                 PCA      98.10  87.50  98.00  99.30  96.60
                 PCALDA   71.50  71.60  71.60  70.80  71.40
                 CLDA     63.80  63.80  63.80  80.10  63.80
        HD       LDA      84.80  87.50  87.50  87.80  87.00
                 PCA      98.10  87.50  98.00  99.30  97.70
                 PCALDA   71.50  71.60  71.60  70.80  0
                 CLDA     63.80  63.80  63.80  80.10  63.80
        RD       LDA      86.90  88.30  88.20  88.90  85.60
                 PCA      97.30  88.20  97.10  98.60  0
                 PCALDA   58.10  58.10  58.10  58.00  0
                 CLDA     50.40  50.40  49.10  49.50  50.40
        WD       LDA      68.00  76.90  76.90  77.30  76.20
                 PCA      92.50  76.90  97.00  97.00  96.30
                 PCALDA   55.80  55.80  55.80  55.70  0
                 CLDA     52.70  52.70  50.10  59.60  52.70
From the rankings obtained from Tables 1 and 2,
we can clearly observe the possibility of improving
the performance of DBCs by utilizing the DRs. In
most instances, the averaged classification accuracies
of DR-based DBCs are increased compared to those
of PS-based ones (note that almost all of the highest
rankings are those of DR-based methods.) However, some DR-based DBCs failed to improve their classification accuracies. (We are currently investigating why this failure occurs and what its cause is.) From this consideration, we can see that it is difficult for us to grade the methods as they are. Therefore, for simple comparisons,
we first assigned marks of 3, 2, or 1 to all DR and PS methods according to their ranks: 3 marks are given for the 1st rank, 2 marks for the 2nd, and 1 mark for the 3rd. Then, we added up all the marks that each method earned with the five classifiers and the four measuring methods. For example, the marks that LDA gained in the ED, HD, RD, and WD rows (for Yale) are 10 (= 2 + 3 + 2 + 2 + 1), 8 (= 3 + 2 + 1 + 2 + 0), 9 (= 2 + 2 + 1 + 2 + 2), and 14 (= 3 + 3 + 3 + 3 + 2), respectively. Thus, the total mark that LDA earned is 41. Using the same system, we graded all the other DR (and PS) methods and, as a final ranking, obtained the following:
(1) For Yale, 1st: LDA (41); 2nd: PCALDA (32); 3rd: CLDA (17).
(2) For Nist38, 1st: PCA (51); 2nd: ModeS (45); 3rd: RandC (14).
Here, the number in parentheses after each DR (or PS) method represents the final grade it obtained. From this rank-
ing, we can see that all of the highest ranks are of DR
methods. Thus, in general, it should be mentioned
that more satisfactory optimization of DBCs can be
achieved by applying a DR after building the dissim-
ilarity space with all of the available samples rather
than by selecting the representative subset from them.
As analyzed in Section 2, choosing the entire
training set as the representative prototypes leads to
higher computational complexity as more distances
have to be calculated. In comparing PS-based and
DR-based methods, we simply measured the process-
ing CPU-times (seconds) of the DBCs designed with
the two databases. In the interest of space, the de-
tails of the measured times are omitted here. From the
measures, however, we can observe that the process-
ing CPU-times increased when DR-based methods
were applied. An instance of this change is the pro-
cessing times of ED for Yale. The processing times
of LDA, PCA, PCALDA, and CLDA methods are, re-
spectively, 0.0250, 0.2161, 0.2521, and 27.9266 (sec-
onds), while those of Rand, RandC, KCenter, and
ModeS are, respectively, 0.0182, 0.0099, 0.0870, and
0.0141 (seconds). The same characteristic could also
be observed in HD, RD, and WD methods. In light
of brevity, the results of the others are omitted here
again. However, it is interesting to note that the pro-
cessing time of CLDA increases radically as the num-
ber of samples increases.
In review, the experimental results show that when
the DR-based method is applied to the dissimilarity
representation, the classification accuracy of the re-
sultant DBCs increases, but so does the processing
CPU-time. In addition, in terms of the classification
accuracies, the DR-based method is more useful for
the nonparametric classifiers, such as knnc and nmc,
but not for the parametric ones, such as ldc and qdc.
4 CONCLUSIONS
In this paper, we performed an empirical comparison
of PS-based and DR-based methods for optimizing
DBCs. DBCs designed with the two methods were
tested on the well-known benchmark databases, and
the classification accuracies obtained were compared
with each other. Our experimental results demonstrate that the DR-based method is generally better than the PS-based methods in terms of classification accuracy. In particular, the DR-based method is more useful for
the nonparametric classifiers, but not for the paramet-
ric ones. Despite this success, problems remain to be
addressed. First, in this evaluation, we employed a
very simplistic model of comparison. Thus, develop-
ing a more scientific model, such as the one in (Sohn,
1999), is an avenue for future work. Next, the classifi-
cation accuracy of DR-based DBCs increases, but so
does the processing CPU-time. To solve this problem,
therefore, developing a new dimensionality reduction
scheme in the dissimilarity space is required. Future
research will address these concerns.
REFERENCES
Adini, Y., Moses, Y., and Ullman, S. (1997). Face recognition: The problem of compensating for changes in illumination direction. IEEE Trans. Pattern Anal. and Machine Intell., 19(7):721–732.
Belhumeour, P. N., Hespanha, J. P., and Kriegman, D. J. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. and Machine Intell., 19(7):711–720.
Bicego, M., Murino, V., and Figueiredo, M. A. T. (2004). Similarity-based classification of sequences using hidden Markov models. Pattern Recognition, 37:2281–2291.
Bunke, H. and Riesen, K. (2007). A family of novel graph kernels for structural pattern recognition. Lecture Notes in Computer Science, 4756:20–31.
Georghiades, A. S., Belhumeur, P. N., and Kriegman, D. J. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. and Machine Intell., 23(6):643–660.
Kim, S. W. (2006). Optimizing dissimilarity-based classifiers using a newly modified Hausdorff distance. Lecture Notes in Artificial Intelligence, 4303:177–186.
Kim, S. W. and Gao, J. (2008). On using dimensionality reduction schemes to optimize dissimilarity-based classifiers. Lecture Notes in Computer Science, 5197:309–316.
Kim, S. W. and Oommen, B. J. (2007). On using prototype reduction schemes to optimize dissimilarity-based classification. Pattern Recognition, 40:2946–2957.
Laaksonen, J. and Oja, E. (1996). Subspace dimension selection and averaged learning subspace method in handwritten digit classification. In Proceedings of ICANN, pages 227–232, Bochum, Germany.
Loog, M. and Duin, R. P. W. (2004). Linear dimensionality reduction via a heteroscedastic extension of LDA: The Chernoff criterion. IEEE Trans. Pattern Anal. and Machine Intell., 26(6):732–739.
Pekalska, E. and Duin, R. P. W. (2005). The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Pub., Singapore.
Pekalska, E., Duin, R. P. W., and Paclik, P. (2006). Prototype selection for dissimilarity-based classifiers. Pattern Recognition, 39:189–208.
Riesen, K., Kilchherr, V., and Bunke, H. (2007). Reducing the dimensionality of vector space embeddings of graphs. Lecture Notes in Artificial Intelligence, 4571:563–573.
Rueda, L. and Herrera, M. (2008). Linear dimensionality reduction by maximizing the Chernoff distance in the transformed space. Pattern Recognition, 41:3138–3152.
Sohn, S. Y. (1999). Meta analysis of classification algorithms for pattern recognition. IEEE Trans. Pattern Anal. and Machine Intell., 21(11):1137–1144.
Wei, C., Li, Y., Chau, W., and Li, C. (2009). Trademark image retrieval using synthetic features for describing global shape and interior structure. Pattern Recognition, 42:386–394.
Wilson, C. L. and Garris, M. D. (1992). Handprinted Character Database 3. National Institute of Standards and Technology, Gaithersburg, Maryland.
Yu, H. and Yang, J. (2001). A direct LDA algorithm for high-dimensional data - with application to face recognition. Pattern Recognition, 34:2067–2070.