Passive-aggressive Online Learning for Relevance Feedback

in Content based Image Retrieval

Luca Piras

, Giorgio Giacinto

and Roberto Paredes

Department of Electrical and Electronic Engineering University of Cagliari, Piazza D’armi, 09123 Cagliari, Italy

Universidad Polit

ecnica de Valencia Camino de Vera s/n, Edif. 8G Acc. B, 46022 Valencia, Spain

Keywords:

Online Learning, Image Retrieval, Relevance Feedback.

Abstract:

The increasing availability of large archives of digital images has pushed the need for effective image retrieval

systems. Relevance Feedback (RF) techniques, where the user is involved in an iterative process to reﬁne the

search, have been recently formulated in terms of classiﬁcation paradigms in low-level feature spaces. Two

main issues arises in this formulation, namely the small size of the training set, and the unbalance between

the class of relevant images and all other non-relevant images. To address these issues, in this paper we

propose to formulate the RF paradigm in terms of Passive-Aggressive on-line learning approaches. These

approaches are particularly suited to be implemented in RF because of their iterative nature, which allows

further improvements in the image search process. The reported results show that the performances attained by

the proposed algorithm are comparable, and in many cases higher, than those attained by other RF approaches.

1 INTRODUCTION

Historically to incorporate pattern recognition into

image retrieval, we distinguish different cases. In

some cases, a Content Based Image Retrieval (CBIR)

system is applied for a special task, where the images

searched are from a particular domain and there is a

set of queries and relevant images known. In these

cases, it is possible to learn parameters for a sys-

tem to optimise retrieval performance. The other and

more generally applicable cases where pattern recog-

nition techniques can be used in CBIR involves rele-

vance feedback. The vast amount of digital pictures

produced, stored, and shared daily, demands effective

tools for ﬁnding similar images. Although CBIR sys-

tems have been under investigation since the 90’s (Nie

et al., 2012), still the most popular way of querying

image archives is through the use of textual tags or

captions associated to images. On the other hand, ac-

cessing images by textual information is not always

satisfactory for many users’ needs, because a large

number of concepts a user is interested in can be bet-

ter expressed through a query image rather than by a

text description. In the last years, begun to emerge

systems that retrieve images also through their con-

tent as, for example, Google Similar Images that com-

bines information from tags, comments, and user’s

clicks with content based features. Despite the good

results that can be achieved with this type of systems

the computational cost required for this type of im-

age indexing makes this approach difﬁcult to integrate

into less complex systems. For this reason, research

on systems completely content based and not so com-

putationally expensive is still very active.

The problem of ﬁnding images according to the

user’s requirements can be divided into a two-step

procedure. First, the user submits a query image con-

taining the concepts of interest. The system assigns to

each image in the database a relevance score related to

the similarity between the images and the query, ac-

cording to some suitable similarity measure. Then, a

number of best scored images are returned to the user

who labels them as being relevant or not according to

the concept in mind. These images are then used by

the system to adapt the search in order to provide the

user with new images.

One of the ﬁrst approaches proposed in the liter-

ature for ﬁnding relevant images according to user’s

feedback, is based on techniques developed in the

context of the Information Retrieval ﬁeld (Zhou and

Huang, 2003). These techniques are based on the con-

cept that there is a region in the feature space where

relevant images are clustered. Thus, one approach to

ﬁnd relevant images is to move the query towards the

area that should be more densely populated with rel-

evant images (Zhou and Huang, 2003). Differently

182

Piras L., Giacinto G. and Paredes R. (2013).

Passive-aggressive Online Learning for Relevance Feedback in Content based Image Retrieval.

In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, pages 182-187

DOI: 10.5220/0004265401820187

 SciTePress

from the above methods that are essentially based

on density estimation, there is another family of rel-

evance feedback techniques based on discriminative

learning, i.e. methods that learn from a set of im-

ages labelled as being relevant or not. In this fam-

ily, a leading role is played by Support Vector Ma-

chines (SVM) (Cristianini and Shawe-Taylor, 2000).

The idea behind the SVM is to ﬁnd a hyperplane in the

feature space that divides it into two subspaces. The

ﬁrst populated by relevant samples, the second one by

non-relevant samples (Zhou and Huang, 2003).

Other approaches to exploit relevance feedback

are aimed at computing a distance metric such that

the distance in the low-level feature space is consis-

tent with the users’ relevance judgements. This met-

ric minimizes the distance between similar images,

and meanwhile maximizes the distance between the

feature vectors of dissimilar images (Deselaers et al.,

2008). While the above approaches are based on a

batch (off-line) formulation of the learning process,

the iterative nature of relevance feedback makes it

suited to on-line learning approaches. Moreover, as

the user’s intention could change along the RF pro-

cess, an on-line learning formulation allows the sys-

tem to adapt the model accordingly.

Online learning techniques address a number of

classiﬁcation problems, such as binary and multi-

class categorization, as well as regression and se-

quence prediction problems. Typically, on-line learn-

ing algorithms evaluate one pattern at time, and, for

each pattern, the algorithm predicts if it belongs or

not to the class of interest. Each pattern is associ-

ated with a unique label y

∈ {+1, −1} where {+1}

indicates that the pattern belongs to that class, while

{−1} indicates that the pattern does not belong to

that class. After each class assignment, the predicted

class is compared to the true class. According to

the outcome of the previous prediction the algorithm

adapts its prediction rule for the next evaluation in

order to improve the performances. Among the dif-

ferent online learning methods, we focus our atten-

tion to the Passive-Aggressive (PA) family of algo-

rithms (Crammer et al., 2006), as they are based on

the margin paradigm used by SVM. This method has

been successfully used for ranking images after a text

query (Grangier and Bengio, 2008), and for video tag-

ging applications (Paredes et al., 2009) among others.

In this paper, we propose the use of the PA paradigm

to exploit Relevance Feedback (RF) in Content Based

Image Retrieval (CBIR). We will show that the pro-

posed on-line approach allows attaining a very fast

and compact RF method, a property that is very im-

portant in a real-world CBIR scenario.

This paper is organized as follows. Section 2

presents the proposed technique based on a linear

model, while Section 3 present the kernel version. Ex-

perimental results reported in Section 4 shows that a

passive-aggressive approach provides good retrieval

performances when compared to approaches based on

classical classiﬁcation approaches. Conclusions are

drawn in Section 5.

2 ONLINE LEARNING FOR

RELEVANCE FEEDBACK IN

CBIR

Relevance Feedback is an interactive process where,

given an initial query, the user is presented with the

top-ranked images from the database, and is asked

to mark them as being relevant or not. Then, these

judgements are sent back to the retrieval system that

exploits user’s feedback to reﬁne the search function,

and provide the user with a new set of (hopefully) bet-

ter results. In each iteration of relevance feedback, the

system can use learning techniques to reﬁne the simi-

larity score measure, and thus to improve the results.

Online learning algorithms have been proposed

for classiﬁcation problems, where the goal, in the

binary case, is to assign a pattern to one of the

two classes. In image retrieval settings, the goal

is to rank images according to the similarity to the

query. Thus, online learning approaches need to be

adapted. The approach proposed in this paper is in-

spired by the work of (Grangier and Bengio, 2008),

that adapted Passive-Aggressive (PA) online classiﬁ-

cation approaches to retrieval scenarios. In the fol-

lowing, we will brieﬂy summarize the approach.

2.1 A Linear Model for Content based

Image Retrieval

In the case of on-line binary classiﬁcation problems,

one of the simplest classiﬁcation approach is the lin-

ear classiﬁer, where the classiﬁcation function is ac-

tually implemented by a vector of weights. Given

a vector w, and a pattern x, the sign of the product

(w ·x) can be used to classify the pattern as belong-

ing to one or the two classes, while the norm of the

product |w · x| is the degree of conﬁdence in the clas-

siﬁcation. We can modify the above deﬁnition of lin-

ear classiﬁer so that the result can be interpreted as a

relevance score assigned to image x:

score(x

) = w · x

, (1)

where w is the weight vector to be learnt. According

to (Grangier and Bengio, 2008), If the above men-

Passive-aggressiveOnlineLearningforRelevanceFeedbackinContentbasedImageRetrieval

183

tioned score is used to rank images, the desired behav-

ior of a CBIR system can be formulated as follows:

∀x

∈ R, ∀x

∈ N, w · x

> w ·x

, (2)

where R and N represent relevant and non relevant

images, respectively. In order to attain this behavior,

we can maximize the following expression:

∑

∀x

∈R

∑

∀x

∈N

w(x

− x

). (3)

Several methods can be used in order to compute w

that maximizes Eq. (3). Here we follow an idea sim-

ilar to the one proposed in (Grangier and Bengio,

2008) based on PA online learning (Crammer et al.,

2006) to compute the weight w by solving the follow-

ing equation:

= argmin

t+1

||w

t+1

− w

+ C l

, (4)

with



0 if w

− x

) > 1

1 −w

− x

) otherwise.

(5)

where t is the iteration and l

is the loss function at

iteration t. The ﬁrst term in Eq. (4),

||w

t+1

− w

plays the passive role in the algorithms as it forces

the values of the weights obtained at consecutive it-

erations to be close to each other, thus taking into ac-

count all the information learned in the past iterations.

Eq. (4) also shows that the second term, i.e., the loss

function that pays the role of the aggressive term, is

“smoothed” by the parameter C that avoids an exces-

sive ﬂuctuation of the weight between two iterations

in succession. According to (Crammer et al., 2006)

the solution of Eq. (4) is given by:

t+1

= w

+ Γ

− x

), (6)

where

= min



C ,

||x

− x



. (7)

2.2 Online Learning Algorithm for

Relevance Feedback

The PA learning framework proposed in the previous

section for image retrieval, can be used to implement

relevance feedback according to the following proce-

dure:

i) The user provides a query image to the system,

the system computes the similarity of the query

with all the images in the database, and images

are then sorted in order of decreasing similarity

with the query. The ﬁrst k images are presented to

the user who labels them as being relevant or not;

ii) henceforth, the on-line learning algorithm begins.

Each component of the initial weight vector w

initialized to 0;

iii) a pair of images x

and x

is randomly drawn from

the set of relevant images (R), and the set of non-

relevant images (N), respectively. The random

drawing is repeated for a ﬁxed number of rounds,

and at each round the weight vector is updated ac-

cordingly;

iv) using the updated weight vector, a new score for

each image of the database is computed according

to Eq. (1). The images are sorted according to the

scores, and the ﬁrst k are labelled by the user as in

step i);

v) a new relevance feedback iteration begins with a

new iterative update of the weight vector w ob-

tained from the previous iteration, until the user is

satisﬁed.

The choice of using a predeﬁned number of weight

updates in step iii) allows to handle different propor-

tions in relevant and non relevant images. If the di-

mension of the two sets of relevant and non relevant

images is large, then the overall number of possible

pairs is large, and the procedure would be compu-

tationally too expensive. On the other hand, if one

of the two sets is much smaller than the other (as

it typically happens in relevance feedback scenarios,

where non relevant images typically outnumbers rele-

vant images) the weight would be unbalanced toward

the bigger set. It turns out that, by using a ﬁxed num-

ber of rounds, images belonging to the smaller group

are drawn more than once.

Finally, the motivation behind the use of random

drawing is twofold. First of all, the user labels the im-

ages as relevant and non relevant to the query but does

not provide any ranking, so by random drawing pairs

of images, all the images can contribute to the evalua-

tion of the weight. Second, the ﬁnal result is different

if the same image is considered more than once in dif-

ferent rounds of the weight update procedure. In fact

the algorithm takes into account, thanks to the ﬁrst

term of Eq. (4), all the previous updates, and thus if

the same images are considered two or three times in

the same update, an improvement in the weight value

can be attained in the same way as considering two or

three different images.

3 KERNEL FORMULATION

In order to formulate the algorithm by resorting to

the so called kernel trick, it is useful to use a clas-

siﬁcation model instead of a ranking model. Accord-

ICPRAM2013-InternationalConferenceonPatternRecognitionApplicationsandMethods

184

ingly, each image x

is associated with a unique label

∈ {+1, −1} where {+1} indicates the relevant im-

ages and {−1} the non relevant ones. Moreover, the

goal should not be formulated as the maximization of

the distances between relevant and non relevant im-

ages, rather it should be formulated as the maximiza-

tion of the margin of the decision. In order to reﬂect

this change in perspective, we can modify the update

function (6) as follows:

t−1

∑

j=0

, (8)

and therefore

score

) = w

· x

t−1

∑

j=0

· x

). (9)

The inner product in the right hand side at Eq. (9)

can be replaced by a kernel expression K (x, x

) thus

obtaining

score

) =

t−1

∑

j=0

K (x

, x

), (10)

where according to (Crammer et al., 2006):

= min



C ,

||x



, (11)



0 if d

> 1

1 −d

otherwise

, (12)

t−1

∑

j=0

K (x

, x

). (13)

It is interesting to notice that according to this

approach, the vectors x

could be seen as sup-

port vectors, as in the Support Vector Machine

paradigm (Cristianini and Shawe-Taylor, 2000), be-

cause the scope is to maximize the margin between

relevant and non relevant images. In this approach,

only one image per round is randomly drawn from

the set of the previous retrieved images, and it can be

drawn more than once. t indicates the number of up-

date rounds executed so far, and obviously it is equal

to the number of images that have been evaluated until

that moment.

4 EXPERIMENTS AND RESULTS

This section contains the datasets description, the

evaluation protocol together with the results and com-

parison of the different algorithms.

4.1 Datasets

Experiments have been carried out using two datasets,

namely the Caltech-101, and the Caltech-256 dataset,

both from the California Institute of Technology

These datasets are widely used in object recognition,

and examples of the images in these datasets can be

found by visiting the website. The ﬁrst dataset con-

sists of 30607 images subdivided into 257 semantic

classes, the second one is composed by 9144 im-

ages subdivided into 102 semantic classes. Experi-

ments have been carried out by using different de-

scriptors, and results are shown for the Edge His-

togram descriptor (80 components) (MPE, 2003), and

the Color and Edge Directivity Descriptor (Cedd, 144

components) (Chatzichristoﬁs and Boutalis, 2008).

The open source library LIRE (Lucene Image RE-

trieval) has been used for feature extraction (Lux and

Chatzichristoﬁs, 2008).

4.2 Experimental Setup

In order to test the performances of the proposed ap-

proaches, 500 query images have been randomly ex-

tracted from each dataset, so that they cover all the

semantic classes. The top twenty best scored images

for each query are returned to the user. Relevance

feedback is performed by marking images belonging

to the same class of the query as relevant, and all other

images in the top twenty as non-relevant. Perfor-

mances are evaluated in terms of precision, and trun-

cated average precision (Nie et al., 2012). Precision is

evaluated by taking into account the top twenty best

scored images at each iteration, while the truncated

average precision takes into account the ranking of

the top T results, i.e., the average precision at depth

AP@T =

∑

i=1

rel (τ (i))

∑

j=1

rel (τ ( j))

(14)

where τ (i) is the image at the rank i, and rel (τ (i)) is

the associated binary relevance label equal to 1 if τ (i)

is relevant with respect to the query, and 0 otherwise,

the highest the value of AP@T, the better the ranking.

We performed nine rounds of relevance feedback, so

we ﬁxed T = min(|C

|, 180), where |C

| is the size of

the class of the query, and 180 is the maximum num-

ber of images the user may be asked to mark after all

the feedback rounds.

In order to choose the most suitable values of the

parameters discussed in Sections 2 and 3, a number of

preliminary experiments have been performed. Ac-

cordingly, we ﬁxed the number of update round t at

http://www.vision.caltech.edu/archive.html

Passive-aggressiveOnlineLearningforRelevanceFeedbackinContentbasedImageRetrieval

185

0 5 10

Caltech−101 Cedd

Iterations

Precision(%)

0 5 10

Caltech−101 Cedd

Iterations

Average Precision(%)

0 5 10

Caltech−101 Edge Histogram

Iterations

Precision(%)

SVM

Relevance Score

PA linear

PA Kernel

RBF

0 5 10

Caltech−101 Edge Histogram

Iterations

Average Precision(%)

Figure 1: Caltech 101 Dataset - Precision and Average Precision for 9 rounds of relevance feedback using Color and Edge

Directivity Descriptor (on the left) and Edge Histogram descriptor (on the right).

0 5 10

Caltech−256 Cedd

Iterations

Precision(%)

0 5 10

Caltech−256 Cedd

Iterations

Average Precision(%)

0 5 10

Caltech−256 Edge Histogram

Iterations

Precision(%)

SVM

Relevance Score

PA linear

PA Kernel

RBF

0 5 10

Caltech−256 Edge Histogram

Iterations

Average Precision(%)

Figure 2: Caltech 256 Dataset - Precision and Average Precision for 9 rounds of relevance feedback using Color and Edge

Directivity Descriptor (on the left) and Edge Histogram descriptor (on the right).

100, we used the RBF kernel in the kernel version of the PA algorithm with a value of σ

set to 0.1. We

ICPRAM2013-InternationalConferenceonPatternRecognitionApplicationsandMethods

186

have also tested different values for the parameter C

between 0.001 to 1000. The best results have been

obtained using C = 1. For comparison purposes, rel-

evance feedback has been also computed by a SVM

classiﬁer with an RBF kernel, and by the Relevance

Score (RS) method, where images are ranked accord-

ing to the ratio of the distances from the nearest rele-

vant and non-relevant images (Giacinto, 2007).

4.3 Results

Reported results show that the linear formulation of

the PA technique allows attaining the highest perfor-

mance in all the experiments at the end of the feed-

back rounds. In particular, the precision is quite close

to the one attained by RS, while it is greater than

the precision attained by the SVM classiﬁer after a

few iterations. On the other hand, the analysis of the

performances in terms of the AP@T, shows that the

linear PA approach allows attaining the highest per-

formances after four iterations, with a signiﬁcant gap

from the performance of the SVM approach. Thus, it

can concluded that the linear PA approach allows bet-

ter exploiting feedback from the user with respect to

traditional SVM classiﬁcation approaches when the

amount of available information increases. The lin-

ear PA approach also provides higher performances

in terms of AP@T than those attained by the RS ap-

proach, the difference being smaller than the one be-

tween the linear PA approach and the SVM approach.

This behavior can be explained by the similar ra-

tionale behind the PA and the RS approaches. Both

are aimed at producing a score that allows produc-

ing a better ranking of the images, while the SVM

approach is aimed at estimating a discriminating sur-

face, without taking into account the relative ranking.

By inspecting the performances attained by the

kernel formulation of the PA approach, it can be seen

that if just one iteration is allowed, it provides bet-

ter performances than those of the linear formulation,

and in some cases they are the highest. On the other

hand, it can be seen that after the second iteration they

are poorer than the ones attained by the linear for-

mulation. Thus, the kernel formulation of the PA ap-

proach does not provide the same power of the linear

formulation in exploiting the feedback information,

the reason being the strong relationship with the clas-

siﬁcation formulation of the problem that turns out

not to be the most suited approach for relevance feed-

back. If the performances of the kernel formulation of

the PA approach are compared to the ones provided by

SVM, it can be seen that they depend on the feature

space employed. In particular the kernel PA approach

outperforms SVM when the EH features are consid-

ered, while SVM is superior when the CEDD features

are used.

5 CONCLUSIONS

The PA approach can be used to exploit relevance

feedback in content based retrieval. In particular, the

linear formulation provides good performances, when

the user provides a signiﬁcant amount of feedback in-

formation. On the other hand, when few feedback

iterations are allowed, the performances are slightly

worse than the ones provided by other mechanisms.

Anyway, if the user wishes to provide more feedback,

the linear PA approach allows improving retrieval per-

formances faster than other mechanisms.

REFERENCES

(2003). Information technology - Multimedia content de-

scription interface - Part 3: Visual, ISO/IEC Std.

15938-3:2003.

Chatzichristoﬁs, S. A. and Boutalis, Y. S. (2008). Cedd:

Color and edge directivity descriptor: A compact de-

scriptor for image indexing and retrieval. In Lecture

Notes in Computer Science, v. 5008, pp. 312–322.

Springer.

Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S.,

and Singer, Y. (2006). Online passive-aggressive al-

gorithms. J. Mach. Learn. Res., 7:551–585.

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction

to Support Vector Machines and Other Kernel-based

Learning Methods. Cambridge University Press.

Deselaers, T., Paredes, R., Vidal, E., and Ney, H. (2008).

Learning weighted distances for relevance feedback

in image retrieval. In ICPR, pages 1–4. IEEE.

Giacinto, G. (2007). A nearest-neighbor approach to rele-

vance feedback in content based image retrieval. In

CIVR ’07, pp. 456–463, New York, NY, USA. ACM.

Grangier, D. and Bengio, S. (2008). A discrimina-

tive kernel-based approach to rank images from text

queries. IEEE Trans. Pattern Anal. Mach. Intell.,

30(8):1371–1384.

Lux, M. and Chatzichristoﬁs, S. A. (2008). Lire: lucene

image retrieval: an extensible java cbir library. In MM

’08: Proc.g of the 16th ACM Int. Conf. on Multimedia,

pages 1085–1088, New York, NY, USA. ACM.

Nie, L., Wang, M., Zha, Z.-J., and Chua, T.-S. (2012). Ora-

cle in image search: A content-based approach to per-

formance prediction. ACM Trans. Inf. Syst., 30(2):13.

Paredes, R., Ulges, A., and Breuel, T. (2009). Fast discrimi-

native linear models for scalable video tagging. Mach.

Lear. and Applications, 4th Int. Conf. on, 0:571–576.

Zhou, X. S. and Huang, T. S. (2003). Relevance feedback in

image retrieval: A comprehensive review. Multimedia

Syst., 8(6):536–544.

Passive-aggressiveOnlineLearningforRelevanceFeedbackinContentbasedImageRetrieval

187