PERFORMANCE OF A COMPACT FEATURE VECTOR IN
CONTENT-BASED IMAGE RETRIEVAL
Gita Das
Clayton School of Information Technology, Monash University, Victoria 3800, Australia
Sid Ray
Clayton School of Information Technology, Monash University, Victoria 3800, Australia
Keywords:
CBIR, feature representation, sample size, dimensionality.
Abstract:
In this paper, we considered image retrieval as a dichotomous classification problem and studied the effect of
sample size and dimensionality on the retrieval accuracy.
Finite sample size has always been a problem in Content-Based Image Retrieval (CBIR) systems, and it is more severe when the feature dimension is high. Here, we discuss feature vectors of different dimensions and their performance with real and synthetic data, for varying sample sizes. We report experimental results and analysis on two different image databases of size 1000, each with 10 semantic categories.
1 INTRODUCTION
Content-Based Image Retrieval (CBIR), where images are represented by their visual features, has been an active research topic in recent years.
The application domain of CBIR encompasses a wide
range including medical, defence and security surveil-
lance systems. The selection of features, e.g. colour, shape and colour layout, and their proper representation, e.g. colour histograms and statistical moments, are very important for good retrieval performance. The concept of the co-occurrence matrix for texture has been known for a long time (Haralick et al., 1973); however, its use for colour has been reported only recently (Huang, 1998), (Shim and Choi, 2003), (Ojala et al., 2001).
A Colour Co-occurrence Matrix (CCM) represents how the spatial correlation of colour changes with distance, i.e. with relative pixel positions. In (Shim and Choi,
2003), a Modified Colour Co-occurrence Matrix is
used where a CCM of Hue is simplified to represent
the number of colour pairs between adjacent pixels (4-
neighbourhood). They did not consider the adverse
effect of ignoring Saturation and Value components
of colour. The diagonal elements in a CCM convey
the colour information of the entire image whereas
the non-diagonal elements represent the shape infor-
mation in an indirect way (Das and Ray, 2005), (Shim
and Choi, 2003). We described (Das and Ray, 2005) a
compact feature representation based on the elements
of CCMs in HSV (Hue, Saturation, Value) space. The
feature vector consists of all diagonal elements and
one representative value for all non-diagonal elements
of the CCM. Although the addition of features con-
tributes to better retrieval, it brings up the problem
of Curse of Dimensionality (Hughes, 1968), (Duda
et al., 2001). Hence, dimension reduction has be-
come a critical issue in feature representation and im-
age indexing of CBIR systems (Wu et al., 2000). In
(Das and Ray, 2005), we tried to reduce dimension
without compromising the retrieval accuracy. Experi-
mental results reveal that the diagonal elements of the CCMs account for the bulk of the total co-occurrence counts (about 80%) compared to the non-diagonal elements (about 20%). This is in line with the findings reported in (Shim and Choi, 2003). Since the diagonal elements thus carry the majority of the image information, manipulating them in any way may significantly degrade the information content. Also, it is worth noting that most of the
non-diagonal elements are zero. Thus representing all
the non-diagonal elements with a single Sum-Average
(Haralick et al., 1973) value (for details, see section
3) brings several benefits: i) the Sum-Average
of non-diagonal elements would be less sensitive to
noise and thus enhance retrieval performance, ii) the
dimension is reduced significantly, thus reducing on-
line computation and retrieval time, iii) compared to
other methods of dimension reduction e.g. Principal
Component Analysis (PCA)(Wu et al., 2000), com-
puting Sum-Average is very simple and easy.
Thus, for HSV=[16,3,3] the feature dimension is 148 in the original representation and 25 in the reduced one. For the rest of the paper, we refer to the original feature space as 148-D and to the compact one as 25-D. With the reduced dimension, we obtained improved performance and faster retrieval.
In the past, researchers have applied PCA (Principal Component Analysis) (Sinha and Kangarloo, 2002), (Martinez and Kak, 2001), (Swets and Weng, 1996), a useful statistical technique that finds the most significant components describing a data set. PCA is suitable for CBIR, which is essentially a two-class (Relevant and Non-relevant) classification problem in which the training sample size is usually small (Martinez and Kak, 2001). To demonstrate the goodness of the 25-D feature vector, we also projected the original 148-D vector onto its first 25 eigenvectors (principal components) obtained with PCA; the feature vector thus derived is called PCA25-D.
The rest of the paper is organized as follows. Sec-
tion 2 gives an overview of our work. Section 3 pro-
vides a description of feature vectors, similarity mea-
sure and evaluation methodology. Section 4 details
our experimental setup and result analysis while sec-
tion 5 gives the conclusions and future work propos-
als.
2 OVERVIEW OF OUR WORK
In this paper, we studied mainly two issues:
1. Behaviour of feature vectors with real data and
synthetic data: In this paper we have discussed
three feature vectors with an emphasis on 25-D
and 148-D and their behaviour with real data and
synthetic data. In the real domain, a number of parameters are involved and it is difficult to isolate each one's contribution to the ultimate retrieval accuracy, whereas with synthetic data we have more control over the data distribution. We explain that in the 25-D compact feature vector the correlation among the feature components is much lower than in 148-D, so the assumption of feature independence is better maintained.
2. Behaviour of feature vectors at varying relevant
class sizes: In reality, an image database com-
prises a number of semantic categories, each cat-
egory having a different number of samples. Precision (a performance evaluation parameter discussed in Section 3.3) for a query image belonging to a category with many samples may be higher than for one belonging to a category with few samples. So, given a feature vector describing the images in a database, it is important to know the relation of precision to the sample size (i.e. the number of samples) of the relevant category.
In (Huijsmans and Sebe, 2005), Huijsmans and
Sebe presented some results keeping sample size
of Relevant class constant while varying that of
Non-relevant class. They reported results on ac-
curacy based on one query category only. In our
study, we varied the sample size for each semantic
category at a time, measured precision for the cat-
egory and then averaged results of all categories to
obtain precision for the whole data set. This way
we get a detailed and more representative picture
of system performance.
3 METHODOLOGY
For the rest of the paper, we use the following nomenclature:
N: Number of images in the database
C: Number of semantic categories in the database
Q, I: Query image and database image respectively
M: Number of components in the feature vector, i.e. the feature dimension
L: Number of quantization levels in the H, S and V matrices
N_r: Scope, i.e. the number of top retrieved images returned to the user
3.1 Feature Representation and
Indexing
Let P be the L × L co-occurrence matrix whose element p_{xy} indicates the number of times a pixel with colour level x occurs, at a distance d, relative to pixels with colour level y. The Sum-Average described in (Haralick et al., 1973) has been modified in (Das and Ray, 2005) as follows:

Sum_ndiag = \sum_{x=1}^{L-1} \sum_{y=x+1}^{L} (x + y) p_{xy}    (1)

where Sum_ndiag is the Sum-Average of the non-diagonal elements of P. We chose the HSV colour model as it
is known to be perceptually uniform. We tried to
make the spatial correlation more sensitive to Hue
and less sensitive to Value and Saturation. We experi-
mented with different levels of quantization and found
HSV=[16,3,3] to be a good choice. This finding is in
line with (Ojala et al., 2001). We chose co-occurrence
distance d=3 and used pixel pairs in both vertical and
horizontal directions. Thus we obtained symmetric matrices and needed to consider only the elements on and above the main diagonal. For a 16x16 matrix, the number of diagonal elements is 16 and the number of (upper) non-diagonal elements is 120. For a 3x3 matrix, this number is 3 for both the diagonal and non-diagonal elements. In our method, we represented all non-diagonal elements by a single value. Thus, for HSV=[16,3,3] the feature dimension is 148 in the original space and 25 in the reduced space.
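To make the construction concrete, the following is a minimal sketch, not the code used in our experiments, of how the 25-D vector could be assembled with numpy from channels already quantized to 16, 3 and 3 levels; the co-occurrence counting shown (horizontal and vertical pixel pairs at distance d, accumulated symmetrically) is an illustrative reading of the description above.

    import numpy as np

    def cooccurrence(channel, levels, d=3):
        """Symmetric co-occurrence matrix of an integer-quantized channel,
        counting pixel pairs at distance d horizontally and vertically."""
        P = np.zeros((levels, levels), dtype=np.float64)
        pairs = [(channel[:, :-d].ravel(), channel[:, d:].ravel()),   # horizontal
                 (channel[:-d, :].ravel(), channel[d:, :].ravel())]   # vertical
        for a, b in pairs:
            np.add.at(P, (a, b), 1)
            np.add.at(P, (b, a), 1)       # accumulate both orders: P is symmetric
        return P

    def compact_features(P):
        """Diagonal elements plus a single Sum-Average of the non-diagonal
        elements, as in eqn (1)."""
        L = P.shape[0]
        x, y = np.triu_indices(L, k=1)                   # strictly upper triangle
        # levels are 1-based in eqn (1), hence the +2 on the 0-based indices
        sum_ndiag = float(np.sum((x + y + 2) * P[x, y]))
        return np.concatenate([np.diag(P), [sum_ndiag]])

    def feature_vector_25d(h, s, v):
        """h, s, v: integer arrays quantized to 16, 3 and 3 levels."""
        return np.concatenate([compact_features(cooccurrence(h, 16)),
                               compact_features(cooccurrence(s, 3)),
                               compact_features(cooccurrence(v, 3))])  # 17+4+4 = 25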
As different feature components have different ranges of values, we normalized them so that they lie within [0,1] and each component contributes equally to the similarity measure.
In PCA, the first principal component gives the
direction along which the variance of data is maxi-
mum, the second principal component is the direc-
tion of maximum variance of data which is orthog-
onal to the first principal component, and so on. We
constructed PCA25-D using the first 25 components (they account for almost 100% of the variation in the data). This also puts the performance comparison with the 25-D feature vector on an equal footing.
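As an illustration, PCA25-D could be obtained along the following lines; this sketch assumes scikit-learn is available (any PCA implementation on the 148-D vectors would do), and the random matrix merely stands in for the real normalized features.

    import numpy as np
    from sklearn.decomposition import PCA

    # Placeholder standing in for the (N, 148) matrix of normalized features.
    X_148 = np.random.rand(1000, 148)

    pca = PCA(n_components=25)
    X_pca25 = pca.fit_transform(X_148)             # the PCA25-D representation
    print(pca.explained_variance_ratio_.sum())     # fraction of variance retained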
To find the similarity between I and Q, we used the Minkowski distance of order one (the city-block distance), a commonly used metric in CBIR:

D(I,Q) = \sum_{i=1}^{M} | f_{iI} - f_{iQ} |    (2)

where f_{iI} and f_{iQ} are the i-th normalized feature components of I and Q respectively.
This metric is computationally simple and produces
fairly good results.
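The normalization and ranking steps can be sketched as follows; the min-max normalization shown is an assumed scheme for mapping each component to [0,1], and the random features are placeholders.

    import numpy as np

    def normalize(X):
        """Min-max normalize each feature column to [0, 1] (an assumed scheme)."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

    def rank_images(query, database, scope=20):
        """Indices of the 'scope' database images closest to the query under
        the city-block distance of eqn (2)."""
        d = np.abs(database - query).sum(axis=1)
        return np.argsort(d)[:scope]

    # toy usage with random 25-D features
    feats = normalize(np.random.rand(1000, 25))
    top20 = rank_images(feats[0], feats, scope=20)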
3.2 Behaviour of Feature Vectors with
Real Data and Synthetic Data
In the similarity measure we assume that the fea-
tures are independent of each other. This, in real-
ity, can be a pretty strong assumption. So, even if
each individual feature has discriminative power, to-
gether they may not work as expected because of
inter-dependence.
We experimented with synthetic data to have more
control over data distribution. We used the mean and
standard deviation of each category from the real data set
to randomly generate 100 points for each category.
Here, Gaussian distribution was used. To keep things
simple, we assumed the features to be uncorrelated.
The covariance of two statistically independent vari-
ables is always zero. However, the reverse is not al-
ways true. For the special case of Gaussian distribu-
tion, zero covariance does imply independence. Thus
we expect better results with synthetic data than with real data. Let X be the dataset consisting of N vectors, each of dimension M:

X = [ x(1), x(2), ..., x(k), ..., x(N) ]^T,    (3)

where x(k) = [ x_1(k), ..., x_M(k) ].
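A minimal sketch of how such a synthetic set could be generated per category is given below; it assumes the per-category means and standard deviations have already been estimated from the real features, and draws 100 points per category with independent Gaussian components. The statistics used here are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def synthetic_category(mean, std, n=100):
        """Draw n points with independent Gaussian components, using the mean
        and standard deviation estimated from the real data of one category."""
        return rng.normal(loc=mean, scale=std, size=(n, len(mean)))

    # toy usage: 10 categories of 25-D features with made-up statistics
    means = rng.random((10, 25))
    stds = 0.1 + 0.1 * rng.random((10, 25))
    X_synth = np.vstack([synthetic_category(m, s) for m, s in zip(means, stds)])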
The covariance matrix obtained from the dataset X
gives a measure of how strongly its components are
related. The diagonal elements of the covariance matrix indicate the variances of the feature components whereas the non-diagonal elements represent the covariances between the components. Let R denote the correlation matrix. Given any pair of components, x_i and x_j, we denote their correlation as

r_{ij} = cov(x_i, x_j) / (s_i s_j)    (4)

where s_i and s_j are the standard deviations of x_i and x_j respectively.
By construction, a correlation is always a number between -1 and 1. Correlation inherits the symmetry property of covariance. To understand the feature dependence better and to explain our results, we introduce the following parameter α:

α = [ \sum_{i=1}^{M} \sum_{j=i+1}^{M} |r_{ij}| ] / [ M(M-1)/2 ]    (5)
In eqn (5), a high value of α indicates that the fea-
ture correlation is high.
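For reference, α can be computed from a feature matrix as follows; this is a minimal numpy sketch in which X is an (N, M) array of feature vectors (random data is used only as a placeholder).

    import numpy as np

    def alpha(X):
        """Average absolute off-diagonal correlation of eqn (5).
        X is an (N, M) array of feature vectors."""
        R = np.corrcoef(X, rowvar=False)          # M x M correlation matrix
        iu = np.triu_indices(R.shape[0], k=1)     # strictly upper triangle
        return float(np.abs(R[iu]).mean())

    # close to zero for independent columns, larger when features co-vary
    print(alpha(np.random.rand(1000, 25)))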
To find the statistical significance of the correlation coefficient we used the t-test given by the following formula (Spiegel, 1998):

t = r \sqrt{N - 2} / \sqrt{1 - r^2}    (6)
where r is the correlation coefficient between two
variables and N is the number of samples. The proba-
bility of the t-test indicates whether the observed cor-
relation coefficient occurred by chance, if the true cor-
relation is zero. Stated another way, the t-test measures whether the correlation between two variables is significantly different from zero.
In our case, we have multiple feature components
and in eqn (6), we have replaced r by α. This allows
us to test the statistical significance of the average cor-
relation value. Note that this is only an approximation
of the t-test that is applicable to a pair-wise correlation
coefficient test.
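A sketch of this test is shown below; scipy is assumed, the two-sided p-value is an illustrative choice, and, as noted above, substituting α for r is only an approximation of the pairwise test.

    import numpy as np
    from scipy import stats

    def correlation_t_test(r, N):
        """t statistic of eqn (6) and a two-sided p-value under the
        null hypothesis of zero correlation."""
        t = r * np.sqrt(N - 2) / np.sqrt(1.0 - r * r)
        p = 2.0 * stats.t.sf(abs(t), df=N - 2)
        return t, p

    # e.g. the alpha value of the 25-D vector on DB1 with N = 1000 (Table 1)
    t, p = correlation_t_test(0.158, 1000)
    print(t, p)    # a small p indicates correlation significantly above zero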
3.3 Impact of Sample Size on Accuracy
To study the effect of sample size, we varied the rel-
evant class size (R) while keeping the non-relevant
class size (NR) constant. We used a random subset
of the original class size in order to avoid any bias in
choosing images. We used precision and recall (Das and Ray, 2005), two widely used evaluation parameters in the CBIR field, as measures of system performance. We calculated the precision of a category by averaging the precision over all the images of the category used as query images. The final precision for any sample size is obtained by averaging the results over all semantic categories in the database. This gives a representative picture of the overall system performance.
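This averaging can be sketched as follows; it is an illustrative implementation in which labels holds the semantic category of each database image, feats holds the normalized feature vectors, and excluding the query image from its own ranked list is an assumption of the sketch.

    import numpy as np

    def precision_at_scope(feats, labels, query_idx, scope=20):
        """Fraction of the 'scope' nearest images (eqn (2) distance) that
        share the query's category; the query itself is excluded."""
        d = np.abs(feats - feats[query_idx]).sum(axis=1).astype(float)
        d[query_idx] = np.inf
        top = np.argsort(d)[:scope]
        return float(np.mean(labels[top] == labels[query_idx]))

    def average_precision(feats, labels, scope=20):
        """Average precision over all queries of each category, then over
        all categories, as described above."""
        per_cat = []
        for c in np.unique(labels):
            queries = np.where(labels == c)[0]
            per_cat.append(np.mean([precision_at_scope(feats, labels, q, scope)
                                    for q in queries]))
        return float(np.mean(per_cat))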
4 EXPERIMENTAL STUDY
We experimented with two databases having the same
number of semantic categories. All images are of
256 × 256 pixels in size. An image in the retrieved list
is considered to be relevant if that image comes from
the same category as the query image, otherwise, non-
relevant. While changing the R value we used a ran-
dom subset of the original set and averaged the preci-
sion over 3 random subsets. This way we can mini-
mize bias in precision, if any, due to selection of im-
ages. For all experiments we used a scope value of 20, which is not too high from a user's point of view and not so low that hardly any relevant images are retrieved.
4.1 Image Database and Ground Truth
1. DB1: This consists of 1000 images from 10 se-
mantic categories (Flower, Leaf, Face, Fish, Dam,
Car, Aeroplane, Leopard, Ship and Wristwatch).
Each category contains 100 images. We chose
500 images randomly for training and the rest 500
images for testing. For R=50, each category con-
tains 50 samples (or images). For R=40, 40 sam-
ples are taken randomly from 50 samples for each
category at a time whereas all other categories are
kept intact (i.e NR=450). Hence, for R=40 sample
size, the total number of images in the database is
490.
2. DB2: This consists of 1000 images from 10 se-
mantic categories (Africa, Beach, Dinosaurs, Ele-
phants, Roses, Horses, Mountains, Food and His-
torical buildings). Each category contains 100
samples. This WANG database is a subset of the Corel database and is freely available for research at the Pennsylvania State University website http://wang.ist.psu.edu/. We picked up 500 images randomly for training and the rest 500 for testing. We changed R = 50, 40, 30, 20, 10 while keeping NR = 450.

Table 1: Values for correlation matrix elements.

                                Avg of diag   Avg of non-diag   α
DB1   Real Data       148-D    1.000         0.229             0.229
                      25-D     1.000         0.158             0.158
      Synthetic Data  148-D    1.000         0.077             0.077
                      25-D     1.000         0.081             0.081
DB2   Real Data       148-D    1.000         0.221             0.221
                      25-D     1.000         0.157             0.159
      Synthetic Data  148-D    1.000         0.100             0.100
                      25-D     1.000         0.100             0.100
4.2 Results Analysis
4.2.1 Behaviour of Feature Vectors with Real
and Synthetic Data
Table 1 shows the correlation values calculated for both real and synthetic data, for the original datasets containing all 100 images per category, i.e. 1000 images in total. For real data, the average of the non-diagonal elements is higher for 148-D than for 25-D, irrespective of the dataset. This is because 25-D has been constructed from 148-D by combining features in an intelligent way. The t-test for both 25-D and 148-D with real
data showed statistical significance with 99% confi-
dence. For synthetic data, the average of the non-diagonal elements is more or less the same for 148-D and 25-D, again for both datasets. The t-test shows a confidence level of 98%, although ideally the α value for synthetic data should be very close to zero. The reason lies in the way we constructed the synthetic dataset: the generation procedure only ensures that there is no feature correlation within a semantic class, whereas correlation across the entire dataset is still possible.
Please note that for real data, α for PCA25-D is zero for both datasets. This is because the principal components are orthogonal and hence have zero correlation.
4.2.2 Behaviour of Feature Vectors at Varying
Relevant Class Sizes
In the previous section we showed the behaviour of the feature vectors for R = 50. However, as R changes, precision will change. In this section we demonstrate the performance of the feature vectors for varying R. For all three vectors, precision with synthetic
data is better than with real data. This is true for all
values of R and for both datasets. However, for 25-D
and PCA25-D the curves for real and synthetic data follow each other very closely.
From Figures 3(a) and (b), it is evident that with
real data 25-D performs better than PCA25-D and
148-D for all values of relevant class size. The infe-
rior performance of PCA25-D can be attributed to the
lack of consideration of the within-class and between-
class variation of data in PCA. As R increases from
10 to 20, precision with 25-D increases by 15.5%
whereas precision with 148-D increases by only 10%.
When R changes from 40 to 50, the improvement in
precision is 5.23% with 25-D but 4.37% with 148-D.
This shows the strength of our feature vector against
the variation in sample size and especially, at small
sample size. In the context of CBIR, using a higher scope means we are looking at a larger neighbourhood, so precision will fall and recall will increase. In Figures 4(a) and (b), for DB2, we show the performance of 25-D and 148-D from a different evaluation angle involving both precision and recall. In
Figure 4(a), for R=50, at recall = 100%, precision is
18.7% for 25-D whereas it is 12.87% for 148-D. This
means a scope of about 267 and 390 respectively, since at 100% recall the required scope equals R divided by the precision. For R=10, at recall 100%, 25-D shows a precision of 9.78% and 148-D shows 5.2%, i.e. a scope of about 102 for 25-D and 192 for 148-D. These values clearly indicate that at smaller sample sizes the advantage of 25-D over 148-D is even greater.
5 CONCLUSIONS AND FUTURE
DIRECTIONS
1. The online computation with 25-D is much less than with 148-D. This reduction in the number of computations is very significant in practice, where image databases are already very large and growing steadily.
2. Irrespective of the data set and the feature dimension, synthetic data always performs better than real data. This is expected, as we did not introduce any feature correlation into the synthetic data.
3. For 25-D and PCA25-D, with the real data sets, the variation of precision with R follows that of the synthetic data quite closely, unlike 148-D. This suggests that feature re-weighting methods, which assume the features to be independent of each other, are more suitable for 25-D than for 148-D. We introduced a new parameter, α, to quantify the feature correlation.
4. Whether the data set is real or synthetic, for all feature vectors precision is more sensitive to changes in R at smaller R values than at larger R values.
5. For both DB1 and DB2, with real data and varying relevant class size, 25-D performs the best.
We find that the small-sample issue is one of the major bottlenecks in CBIR research. In the future, we plan to investigate it in more detail. We also intend to extend the experiments to larger data sets.
REFERENCES
Das, G. and Ray, S. (2005). A compact feature representa-
tion and image indexing in content-based image re-
trieval. In Proceedings of Image and Vision Com-
puting New Zealand (IVCNZ 2005), pages 387–391,
Dunedin, New Zealand.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification. Wiley-Interscience, 2nd edition.
Haralick, R. M., Shanmugam, K., and Dinstein, I. (1973).
Textural features for image classification. IEEE
Transactions on Systems, Man, and Cybernetics,
SMC-3, No.6:610–621.
Huang, J. (1998). Color-spatial image indexing and appli-
cations. PhD Dissertation, Cornell University.
Hughes, G. F. (January 1968). On the mean accuracy of
statistical pattern recognizers. IEEE Transactions on
Information Theory, IT-14(1):55–63.
Huijsmans, D. P. and Sebe, N. (February 2005). How to
complete performance graphs in content-based image
retrieval: Add generality and normalize scope. IEEE
transactions on Pattern Analysis and Machine Intelli-
gence, 27(2).
Martinez, A. M. and Kak, A. C. (February 2001). PCA
versus LDA. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 23(2).
Ojala, T., Rautiainen, M., Matinmikko, E., and Aittola, M.
(2001). Semantic image retrieval with hsv correlo-
grams. In Proc. 12th Scandinavian Conference on Im-
age Analysis, pages 621–627, Bergen, Norway.
Shim, S. and Choi, T. (2003). Image indexing by modified
color co-occurrence matrix. In International Conference on Image Processing.
Sinha, U. and Kangarloo, H. (2002). Principal compo-
nent analysis for content-based image retrieval. RadioGraphics, 22(5):1271–1289.
Spiegel, M. R. (1998). Schaum's Outline Series: Theory and Problems of Statistics. McGraw-Hill, 2nd edition.
Swets, D. and Weng, J. (1996). Using discriminant
eigenfeatures for image retrieval. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
18(8):831–836.
Wu, P., Manjunath, B., and Shin, H. (2000). Dimensional-
ity reduction for image retrieval. In Proceedings of the IEEE International Conference on Image Processing (ICIP 2000), volume 3, pages 726–729, Vancouver, Canada.