Multimedia Retrieval based on Geometric Distance in Semi-structured
Document
Sana Fakhfakh, Mohamed Tmar and Walid Mahdi
Laboratory MIRACL, Institute of Computer Science and Multimedia of Sfax, Sfax University, Sfax, Tunisia
Keywords:
Geometric Distance, Image, Multimedia Retrieval, XML Element, Structure.
Abstract:
This paper addresses multimedia retrieval in XML documents, whose goal is to find relevant multimedia elements. We are particularly interested in studying the impact of various structural factors on image retrieval, using a method that introduces a new source of evidence applied to the "image" medium. The method defines a geometric distance between XML nodes. Experiments are carried out on two data sets, "INEX 2007" and "ImageCLEF 2010". The results obtained show the effectiveness of our approach.
1 INTRODUCTION
This paper falls within the context of multimedia retrieval in XML documents. The need for this kind of retrieval is justified by the rapid growth of applications that use structured documents (HTML or XML format), which raises new challenges in the field of information retrieval. Indeed, the XML document has evolved from a simple tool for exchanging data into a new storage medium. An XML document includes textual elements and multimedia elements such as image, audio and video. These elements are organized according to a structure that itself carries information, although there is no single way to organize the contents. The choice of structure depends greatly on the context of use of the textual contents.
In the literature, there are two main classes of approaches to multimedia retrieval: retrieval methods based on multimedia content (MR-Content) and retrieval methods based on context (MR-Context). Content-based approaches use low-level features specific to the type of media (Lew, 2006). One example is image retrieval that exploits visual features (color, texture, shape, ...). These methods have proven effective for the "image" medium in well-defined fields such as the medical field, because they require thorough knowledge of the distinctive features of the media. This type of retrieval can be applied to only one type of media per system, due to the lack of semantic representation in media content.
Context-based approaches do not depend on the type of media in question (Elghazel et al., 2005) (Tjondronegoro et al., 2005). Indeed, these methods rely on the information surrounding the multimedia element, which represents its semantic description. Multimedia retrieval based on textual context is the most widely used, although the structural context remains an obvious source of evidence that plays a paramount part in the understanding of structured documents.
In this article, we focus on techniques for multimedia retrieval based on textual and structural context in XML documents. This type of document includes textual information and structural constraints. An XML document therefore cannot be effectively exploited by classical IR techniques, which regard a document as a flat source of information.
The implicit incorporation of multimedia elements in XML documents requires exploiting the textual context for multimedia retrieval. However, the textual context alone is insufficient most of the time. The idea is to calculate the relevance score of a media element from both the textual and the structural context, in order to answer a specific information need of the user, expressed as a query composed of a set of keywords.
Let us take an image as an example. Its context is composed of the description of its contents, such as its title, its name, the descriptive texts that surround it, the title of the XML document, and so on. Figure 1 presents a document extracted from the Wikipedia encyclopedia describing the lion. Textual and multimedia information coexist in it. For a query about the period after the "Pleistocene", the relevant information comes from the textual description, not from the title (Figure 1).

Fakhfakh S., Tmar M. and Mahdi W.
Multimedia Retrieval based on Geometric Distance in Semi-structured Document.
DOI: 10.5220/0004964102200225
In Proceedings of the 10th International Conference on Web Information Systems and Technologies (WEBIST-2014), pages 220-225
ISBN: 978-989-758-023-9
Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)

[Figure 1 content: a Wikipedia article on the lion shown as an XML document, with images captioned "Male", "Female (lioness)" and "Cave lions in the Chauvet Cave, France", the titles "Lion" and "Pleistocene", and the surrounding descriptive text.]
Figure 1: Example of a multimedia object context.
In this work, we are interested in the "image" medium. Most existing work in this area uses information from the textual description of the image. Other sources of evidence have also been used, such as visual descriptors, information from the links around the image (Hliaoutakis et al., 2006) and the structure of the XML document. To resolve the difficulties of multimedia retrieval, one must define an adequate source of evidence for representing a multimedia element and an appropriate indexing model. In this context, we present our structural indexing system, which combines conceptual information for semi-structured documents and is dedicated to approximate data retrieval. We begin with an overview of existing work in multimedia retrieval. We then present our approach, detailing the preprocessing, the extraction of textual and structural information, and the computation of the relevance of a multimedia element so as to better answer the needs expressed by the user. Finally, we present the results of applying our method to two data sets, "INEX 2007" and "ImageCLEF 2010".
2 STATE OF THE ART
The advent of structured documents has raised new problems in information retrieval, and more specifically in the retrieval of multimedia elements. These problems are strongly related to the nature of these documents, which provide structure as a new source of evidence. XML documents now include multimedia elements of different types (audio, video and image) implicitly embedded in the textual elements. These multimedia elements (as physical objects) do not contain enough information to answer a given query on their own. Therefore, the relevance score of a multimedia element must be linked to the textual and structural information provided by other XML nodes (Hliaoutakis et al., 2006).
Several works treat the XML document as a flat source of information and ignore its structure. In this context, (Schlieder and Holger, 2002) state: "To ignore the document structure is to ignore its semantics". Indeed, an XML document describes a set of data through a structure that provides a semantic lexicon, which facilitates the interpretation and exploitation of the information. In response to this need, new works in multimedia retrieval take the structure into account as a source of relevant information. Existing work on structured retrieval of multimedia elements falls into two classes.

The first class adapts traditional information retrieval techniques such as language models. In this context, the CWI/UTwente team applies a result-filtering step that keeps only the fragments containing at least one multimedia element (Tsikrika et al., 2008) (Westerveld et al., 2007).

The second class consists of work specific to structured multimedia retrieval, which uses the structure as a source of evidence when selecting multimedia elements. First, (Kong and Lalmas, 2005) proposed a method that combines the structure of the XML document (XPath) with the use of links (XLink). The method divides the XML document into regions, each representing an area of ancestors of the multimedia element, whose score is computed as a function of the scores of the regions. This method exploits the vertical structure only. Subsequently, (Torjmen et al., 2010) added the horizontal structure to the notion of hierarchy with a method called "CBA" (Children, Brothers, Ancestors), which takes into account the information carried by the child, brother and ancestor nodes to compute the relevance of multimedia elements. The authors also propose an alternative method, "OntologyLike", which likens the XML document to an ontology. To compute the similarity between nodes, they use similarity measures based mainly on the number of edges separating the nodes. Other approaches to multimedia retrieval exploit the links in XML documents (Awadi and Torjmen, 2010). This work was improved by a hybrid approach that combines the structure with links, which are considered semantic links (Aouadi et al., 2012). This method divides the document into regions according to the hierarchical structure and the location of the image in the document. This factor plays a role in weighting the links when computing the score of the image.
In this paper, we propose a new metric for multimedia retrieval in XML documents, which uses geometric distances to calculate the relevance of each node with respect to the multimedia node. The method places the nodes of the XML document in a Euclidean space and defines each node by a vector of coordinates, from which the distance between each pair of nodes is calculated. This distance then contributes to the computation of the score of the multimedia element.
3 PROPOSED APPROACH
We focus on techniques for multimedia retrieval based on textual and structural context in XML documents. XML documents cannot be effectively exploited by classical IR techniques, which regard a document as a bag of words. Therefore, the relevance score of a multimedia element must be linked to the textual and structural information provided by other XML nodes (Hliaoutakis et al., 2006), which facilitates the interpretation and exploitation of the information. In response to this need, we propose a new method for multimedia retrieval that takes the structure into account as a source of evidence and studies its impact on search performance. We present a new source of evidence dedicated to multimedia retrieval, based on the intuition that each textual node contains information that semantically describes a multimedia element, and that the participation of each textual node in the score of a multimedia element varies with its position in the XML document. To compute the geometric distance, we first place the nodes of each XML document in a Euclidean space and calculate the coordinates of each node with Algorithm 1. Then, we compute the score of a multimedia element depending on its distance to each textual node (Fakhfakh et al., 2013).
<image id="248236" file="images/25/248236.jpg">
<name>Coronation of Louis VIII and Blanche...</name>
<text xml:lang="en">
<description>Coronation of Louis VIII ...</description>
<comment>Coronation of Louis VIII ...</comment>
<caption article="text/en/1/308531">a miniature from ...</caption>
<caption article="text/en/2/314411"> ... circa 1450.</caption>
</text>
<text xml:lang="fr">
<description>Couronnement de Louis VIII le Lion fol...</description>
<comment>Couronnement de [[Louis VIII of Fra)</comment>
<caption article="text/fr/1/501190">Couronnement de Louis VIII ...</caption>
<caption article="text/fr/5/540615">Couronnement de Louis VIII ...</caption>
</text>
<comment>({{en|Coronation of Louis VIII and Blanche...</comment>
<license>Public Domain</license>
</image>
[Figure 2 panels: the XML document (shown above); the XML tree, with root &1 (image), children &2 to &6 (name, text, text, comment, license) and, under each text node, the nodes description, comment, caption and caption (&7 to &14); the geometric representation of nodes &1 to &14 in a space with axes X, Y and Z.]
Figure 2: The steps of passing an XML document to geometric representation.
Figure 2 shows the steps that take an XML document to a geometric representation of its elements in a Euclidean space. The first step represents the XML document as an XML tree, in order to take the properties of the XML document into account.
An XML tree is described by a set of relationships between nodes. Formally, an XML tree is a pair $A = (E, R)$ where $E$ is a set of XML elements and $R \subseteq E^2$ is a set of relations ($(p, q) \in R$ if $p$ is the parent of $q$) satisfying:

$$\exists!\, r \in E,\ \forall q \in E \setminus \{r\},\ (q, r) \notin R \quad (1)$$

where $r$ is the root of the tree, and

$$\forall q \in E \setminus \{r\},\ \exists!\, p \in E,\ (p, q) \in R \quad (2)$$

that is, each node has a parent except the root $r$.
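As an illustration of this definition, the following minimal Python sketch builds the pair (E, R) from an XML string with the standard library and checks that it has a single root and one parent per non-root node. The function names `build_tree` and `check_tree`, the sample document and the pre-order numbering of nodes are assumptions made for this example; they are not taken from the authors' system.

```python
import xml.etree.ElementTree as ET


def build_tree(xml_text):
    """Return (E, R): E maps node id -> tag, R is a set of (parent, child) ids."""
    root = ET.fromstring(xml_text)
    E = {}
    R = set()
    counter = [0]  # mutable counter shared by the recursive visits

    def visit(elem, parent_id):
        counter[0] += 1
        node_id = counter[0]  # ids follow a pre-order traversal (assumption)
        E[node_id] = elem.tag
        if parent_id is not None:
            R.add((parent_id, node_id))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return E, R


def check_tree(E, R):
    """Check the two tree conditions: one root, one parent for every other node."""
    parents = {}
    for p, q in R:
        parents.setdefault(q, []).append(p)
    roots = [q for q in E if q not in parents]
    return len(roots) == 1 and all(len(ps) == 1 for ps in parents.values())


# Hypothetical, shortened version of the image document used in Figure 2.
SAMPLE = (
    "<image><name>n</name>"
    "<text><description>d</description><caption>c</caption></text>"
    "<license>l</license></image>"
)
E, R = build_tree(SAMPLE)
```

With this sample, node 1 is the `image` root and the `text` node (id 3) is the parent of `description` (id 4) and `caption` (id 5).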
In the second step, we move from the XML tree to a geometric representation. This step is mainly based on extracting equalities from the XML tree according to our hypotheses.

The XML tree representation reveals certain relationships of neighborhood, brotherhood and descent. Indeed, the distance d that separates two or more brothers from their common ancestors is the same at each level, and brothers of the same hierarchical level are equidistant.

These distances are defined according to the relationship of contiguity and semantic similarity between nodes. They are not fixed in advance but are derived from the position of each textual node in the XML tree.
These properties are expressed as follows, for all $q_i = (x_{i1}, x_{i2}, \dots, x_{im})$ and $q_j = (x_{j1}, x_{j2}, \dots, x_{jm})$ in $Q$, where $Q$ is a set of vectors in $\mathbb{R}^m$.

In the same hierarchy, if there are more than two brothers then their adjacent nodes are equidistant (property 1):

$$\forall q_i, q_j, q_k \in Q,\quad A_1(q_i) = A_1(q_j) = A_1(q_k) \;\Rightarrow\; d(q_i, q_j) = d(q_i, q_k)$$

The distance between any node and its descendants is the same (property 2):

$$\forall q_i, q_j, q \in Q,\ \forall n \in \mathbb{N},\quad A_n(q_i) = A_n(q_j) = \{q\} \;\Rightarrow\; d(q_i, q) = d(q_j, q)$$

With $n \in \mathbb{N}$, we define the function $A_n$ by:

$$\forall q \in E,\quad A_n(q) = \begin{cases} \{q\} & \text{if } n = 0 \\ A_{n-1}(p) & \text{if } \exists p \in E,\ (p, q) \in R \text{ and } n > 0 \\ \emptyset & \text{else} \end{cases}$$
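The recursive definition of A_n translates almost directly into code. The sketch below is only an illustration: it assumes the tree is stored as a hypothetical child-to-parent dictionary `PARENT`, whose contents follow the tree of Figure 2 as we read it (nodes 2 to 6 under the root 1, nodes 7 to 10 under node 3, nodes 11 to 14 under node 4).

```python
# Child -> parent map, assumed from the tree of Figure 2.
PARENT = {2: 1, 3: 1, 4: 1, 5: 1, 6: 1,
          7: 3, 8: 3, 9: 3, 10: 3,
          11: 4, 12: 4, 13: 4, 14: 4}


def ancestor(parent, q, n):
    """A_n(q): the n-th ancestor of q as a one-element set, or the empty set."""
    if n == 0:
        return {q}
    p = parent.get(q)
    if p is None:  # q is the root, so it has no ancestor at level n > 0
        return set()
    return ancestor(parent, p, n - 1)


def same_nth_ancestor(parent, qi, qj, n):
    """True when A_n(qi) = A_n(qj) and that common set is not empty."""
    a = ancestor(parent, qi, n)
    return bool(a) and a == ancestor(parent, qj, n)
```

For instance, nodes 7 and 8 share their first ancestor (node 3), while nodes 7 and 11 only share their second ancestor (the root), so property 2 relates them through the root only.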
WEBIST2014-InternationalConferenceonWebInformationSystemsandTechnologies
222
From these relationships, we can generate a system of equations that accounts for the kinship relationships between nodes, based on hierarchy and adjacency. These relationships are described by equalities such as the following (these equations are only examples):

$$\begin{aligned} d(n_1, n_2) &= d(n_1, n_3) \\ d(n_1, n_2) &= d(n_1, n_4) \\ d(n_1, n_7) &= d(n_1, n_8) \\ d(n_1, n_7) &= d(n_1, n_9) \end{aligned}$$
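One possible way to enumerate such equalities automatically from properties 1 and 2 is sketched below. It is an illustrative reading of the two properties, not the authors' implementation: the child-to-parent map `PARENT` is assumed from Figure 2, and each equality d(a, b) = d(c, e) is encoded as the hypothetical tuple `((a, b), (c, e))`.

```python
from itertools import combinations

# Child -> parent map, assumed from the tree of Figure 2.
PARENT = {2: 1, 3: 1, 4: 1, 5: 1, 6: 1,
          7: 3, 8: 3, 9: 3, 10: 3,
          11: 4, 12: 4, 13: 4, 14: 4}


def ancestors(parent, q):
    """Return the ancestors of q, nearest first: [A_1(q), A_2(q), ...]."""
    chain = []
    while q in parent:
        q = parent[q]
        chain.append(q)
    return chain


def equalities(parent):
    """List the equalities implied by property 1 (brothers) and property 2."""
    nodes = sorted(set(parent) | set(parent.values()))
    eqs = []
    # Property 1: for brothers i, j, k, d(i, j) = d(i, k).
    children = {}
    for child, p in parent.items():
        children.setdefault(p, []).append(child)
    for sibs in children.values():
        for i in sibs:
            others = sorted(s for s in sibs if s != i)
            for j, k in combinations(others, 2):
                eqs.append(((i, j), (i, k)))
    # Property 2: if i and j have the same n-th ancestor a, d(i, a) = d(j, a).
    for i, j in combinations(nodes, 2):
        for a, b in zip(ancestors(parent, i), ancestors(parent, j)):
            if a == b:
                eqs.append(((i, a), (j, a)))
    return eqs


EQS = equalities(PARENT)
```

Under these assumptions, the list contains the example equalities given above, for instance `((2, 1), (3, 1))` for d(n_1, n_2) = d(n_1, n_3) and `((7, 1), (8, 1))` for d(n_1, n_7) = d(n_1, n_8), since the distance is symmetric.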
These distances are defined according to the relationship of contiguity and semantic similarity between nodes. They are not fixed in advance but are derived from the position of each textual node in the XML tree. The resulting system is nonlinear, so solving it requires an approximate iterative method (see Algorithm 1).
Algorithm 1: Approximate resolution algorithm for the nonlinear system of equations.

Require: $(Q = (q_1, q_2, \dots, q_{|Q|}), R)$: an XML tree such that $q_i = (q_{i1}, q_{i2}, \dots, q_{im})$ $\forall i \in [1, |Q|]$
         $m$: dimension

for $(i, j) \in [1, |Q|] \times [1, m]$ do
    $q_{ij} \leftarrow$ random value
end for
$Q_1 \leftarrow (q_1, q_2, \dots, q_{|Q|})$
repeat
    $P \leftarrow Q_1$
    for $(i, j) \in [1, |Q|] \times [1, m]$ do
        $Q_2 \leftarrow (q_1, q_2, \dots, q_{i-1}, q_i + D_j(1), q_{i+1}, \dots, q_{|Q|})$
        $Q_3 \leftarrow (q_1, q_2, \dots, q_{i-1}, q_i + D_j(\varepsilon), q_{i+1}, \dots, q_{|Q|})$
        $Q_4 \leftarrow (q_1, q_2, \dots, q_{i-1}, q_i + D_j(1 - \varepsilon), q_{i+1}, \dots, q_{|Q|})$
        $t \leftarrow 0$
        while $error(Q_1) > error(Q_2) > error(Q_3) > error(Q_4)$ do
            $Q_4 \leftarrow (q_1, q_2, \dots, q_{i-1}, q_i + 2^t D_j(1), q_{i+1}, \dots, q_{|Q|})$
            $t \leftarrow t + 1$
        end while
        $t \leftarrow 0$
        while $error(Q_1) < error(Q_2) < error(Q_3) < error(Q_4)$ do
            $Q_1 \leftarrow (q_1, q_2, \dots, q_{i-1}, q_i - 2^t D_j(1), q_{i+1}, \dots, q_{|Q|})$
            $t \leftarrow t + 1$
        end while
        while $|error(Q_1) - error(Q_2)| > \varepsilon$ do
            $Q_5 \leftarrow \frac{Q_1 + Q_2}{2}$
            let $Q_5 = (p_1, p_2, \dots, p_{|Q|})$
            if $error(p_1, \dots, p_{i-1}, p_i - D_j(\varepsilon), p_{i+1}, \dots, p_{|Q|}) > error(p_1, \dots, p_{i-1}, p_i + D_j(\varepsilon), p_{i+1}, \dots, p_{|Q|})$ then
                $Q_1 \leftarrow Q_5$
            else
                $Q_2 \leftarrow Q_5$
            end if
        end while
    end for
until $P = Q_1$
The process begins by assigning a random vector to each XML node. It then tries to improve the coordinate values of each node according to an error value (the sum of the squared deviations). At each iteration, the coordinates are improved so as to reduce this error. The algorithm stops when the error reaches its minimum value (no further improvement is possible). Let Q be the set of vectors obtained at a given iteration of the algorithm; the error is defined by:
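$$error(Q) = \sum_{\substack{q_i, q_j, q_k \in Q, \\ A_1(q_i) = A_1(q_j) = A_1(q_k)}} \big(d(q_i, q_j) - d(q_i, q_k)\big)^2 \;+ \sum_{\substack{q_i, q_j, q \in Q,\ n \in \mathbb{N}, \\ A_n(q_i) = A_n(q_j) = \{q\}}} \big(d(q_i, q) - d(q_j, q)\big)^2$$

where $m$ is the dimension of the Euclidean space and, $\forall v \in \mathbb{R}$, $D_j(v) = (d_1, d_2, \dots, d_m)$ is such that:

$$d_k = \begin{cases} 0 & \text{if } k \neq j \\ v & \text{otherwise} \end{cases}$$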
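To make the role of the error function concrete, the sketch below computes it for a list of equalities and reduces it with a deliberately simplified coordinate search. This is not Algorithm 1: the probing with D_j(1), D_j(ε) and D_j(1 − ε), the doubling of the step and the bisection are replaced here by a plain "try ±step on one coordinate, halve the step when nothing improves" loop. The names `error`, `refine`, `EQS` and `START`, and the encoding of an equality d(a, b) = d(c, e) as `((a, b), (c, e))`, are assumptions of this example.

```python
import math
import random


def error(Q, eqs):
    """Sum of squared deviations between distances that should be equal."""
    return sum((math.dist(Q[a], Q[b]) - math.dist(Q[c], Q[e])) ** 2
               for (a, b), (c, e) in eqs)


def refine(Q, eqs, step=1.0, eps=1e-6, max_rounds=10000):
    """Simplified coordinate search: move one coordinate of one node at a time."""
    Q = {node: list(vec) for node, vec in Q.items()}  # work on a copy
    best = error(Q, eqs)
    for _ in range(max_rounds):
        if step <= eps:
            break
        improved = False
        for node in Q:
            for j in range(len(Q[node])):
                for delta in (step, -step):  # plays the role of +/- D_j(step)
                    old = Q[node][j]
                    Q[node][j] = old + delta
                    e = error(Q, eqs)
                    if e < best:
                        best, improved = e, True
                    else:
                        Q[node][j] = old  # reject the move
        if not improved:
            step /= 2  # no coordinate move helped: refine the step
    return Q, best


# Toy system: root 1 with three children 2, 3, 4, in dimension m = 2.
EQS = [((2, 1), (3, 1)), ((2, 1), (4, 1)), ((2, 3), (2, 4))]
rng = random.Random(0)  # fixed seed so the random start is reproducible
START = {n: [rng.random() for _ in range(2)] for n in (1, 2, 3, 4)}
FINAL, final_error = refine(START, EQS)
```

Because a move is only accepted when it strictly lowers the error, the returned error can never exceed the error of the random starting point.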
3.1 Indexing System
We propose an indexing system, MXS index, composed of two parts: textual indexing and structural indexing. In the first part, our approach uses NLP (Natural Language Processing) techniques to extract the candidate XML nodes for the resulting index. The weight of these nodes depends on the frequency of each of their terms and on the ratio between the number of elements in the corpus and the number of elements containing the term. In the second part, we build the structural index using information extracted from the XML tree and the geometric metric. Each XML node is represented by a characteristic vector: we start by extracting the geometric properties, and then compute the coordinates of each XML node. This part also generates an XML data model that handles ancestor, descendant and proximity relationships (Figure 3).
Figure 4 illustrates the textual and structural indexing of XML documents with our indexing system, as well as the transition from the tree representation of an XML document to its geometric representation in Euclidean space.
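The textual part of the indexing can be pictured with a small inverted index. The sketch below is a simplified illustration under stated assumptions: lemmatisation is omitted, the stop-word list `STOP_WORDS` is a tiny hypothetical one, and the element ids and texts are invented from the sample document of Figure 2.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption, not the authors' list).
STOP_WORDS = {"a", "and", "from", "of", "the"}


def tokenize(text):
    """Lowercase the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def build_inverted_index(elements):
    """Map each term to {element id: frequency of the term in that element}."""
    index = defaultdict(dict)
    for node_id, text in elements.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][node_id] = freq
    return index


# Hypothetical textual elements, keyed by node id.
ELEMENTS = {
    2: "Coronation of Louis VIII and Blanche",
    7: "Coronation of Louis VIII",
}
INDEX = build_inverted_index(ELEMENTS)
```

Each entry of `INDEX` corresponds to one line of an inverse index such as the one sketched in Figure 4: a term, the element that contains it and its frequency in that element.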
3.2 Multimedia Element
A multimedia element (e.g. an image) does not contain textual content. Its score is based on the textual nodes in its neighborhood. The transition from the XML tree structure to a representation of the elements in a Euclidean space, where we exploit the dissimilarity distances separating a multimedia node from the textual nodes, is performed by extracting the equations that satisfy the properties defined earlier and applying Algorithm 1. To calculate the distance between a node n and a multimedia element H, we calculate the Euclidean distance between their respective feature vectors $q_n$ and $q_H$:

$$dist(n, H) = \sqrt{\sum_{i=1}^{m} (x_{ni} - x_{Hi})^2} \quad (3)$$

where $m$ is the dimension of the Euclidean space, $q_n = (x_{n1}, x_{n2}, \dots, x_{nm})$ is the feature vector of node n and $q_H = (x_{H1}, x_{H2}, \dots, x_{Hm})$ is the feature vector of the multimedia node H.

[Figure 3 diagram: an XML database feeds two branches. Textual indexing uses NLP tools (word splitting, tokenization, stop word removal, lemmatisation, term weighting) to produce the text index. Structural indexing uses the XML data model, extracts the geometrical properties and computes the coordinates of each XML node to produce the structural index. The text index and the structural index together form the general index.]
Figure 3: Architecture of our indexing model MXSindex.

[Figure 4 content: the XML document, XML tree and geometric representation of Figure 2, together with the structural indexation of the XML document (table with columns id, parent and tag, in which node &1 is the image root with parent NULL, nodes &2 to &6 have parent &1 and the remaining nodes have parent &3 or &4) and the textual indexation of the XML document as an inverse index (columns term, frequency in element, frequency in document and id, with sample terms "Coronation" and "Louis").]
Figure 4: Treatment process of XML document.
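Equation (3) is the ordinary Euclidean distance between two coordinate vectors. A minimal sketch, in which the function name `dist` and the sample coordinates are assumptions chosen for the example:

```python
import math


def dist(q_n, q_h):
    """Euclidean distance between the feature vectors of a node and of H."""
    if len(q_n) != len(q_h):
        raise ValueError("both vectors must have the same dimension m")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q_n, q_h)))


# Hypothetical coordinates in a space of dimension m = 3.
q_text = (1.0, 2.0, 2.0)
q_image = (0.0, 0.0, 0.0)
d = dist(q_text, q_image)
```

Here the squared differences sum to 9, so the distance between the two nodes is 3.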
We calculate the score of each textual node from the frequency of each term (tf) and from the ratio between the number of elements in the corpus and the number of elements containing the term (idf). A textual node is represented by $n = (n_1, n_2, \dots, n_{|V|})$, where $n_i$ is the weight of the term $t_i$ and $V$ is the set of indexing terms:

$$n_i = tf(t_i, n) \times idf(t_i) \quad (4)$$

with

$$idf(t_i) = \log\left(\frac{N}{N_i}\right) \quad (5)$$

where $N$ is the total number of XML elements in the corpus, $N_i$ is the number of elements that contain the term $t_i$ and $tf(t_i, n)$ is the frequency of the term $t_i$ in node n. The score of a textual node depends on the weight of each indexing term. A query is represented by the list $q = (q_1, q_2, \dots, q_{|V|})$, where $q_i \in \{0, 1\}$ (0: absent, 1: present) indicates whether $t_i$ belongs to the query. The score of textual node n for the query q is defined by:

$$rsv(q, n) = q \times n^T = \sum_{i=1}^{|V|} q_i \times n_i \quad (6)$$

The score of multimedia node H is defined by:

$$rsv(q, H) = \sum_{n \in \mu} \frac{rsv(q, n)}{dist(n, H)} \quad (7)$$

where $\mu$ is the set of textual elements and $dist(n, H)$ is the distance between the feature vectors corresponding to the nodes n and H. With this equation, every textual node contributes to the score of the multimedia element, with a contribution that decreases as its distance to H grows, which benefits multimedia retrieval.
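Equations (4) to (7) can be followed on a toy example. The sketch below is an illustration under several assumptions: the corpus is reduced to three hypothetical textual nodes (so N = 3), the logarithm is the natural one, the coordinates in `COORDS` are invented rather than produced by Algorithm 1, and no textual node coincides with H (which would make the distance zero).

```python
import math
from collections import Counter

# Hypothetical textual nodes (already tokenized) and node coordinates.
NODES = {
    "name": ["coronation", "louis"],
    "caption": ["coronation", "miniature"],
    "license": ["public", "domain"],
}
COORDS = {"H": (0.0, 0.0), "name": (1.0, 0.0),
          "caption": (0.0, 2.0), "license": (3.0, 4.0)}


def idf(term, nodes):
    """Equation (5): log(N / N_i), or 0 when no element contains the term."""
    n_i = sum(1 for terms in nodes.values() if term in terms)
    return math.log(len(nodes) / n_i) if n_i else 0.0


def rsv_text(query, terms, nodes):
    """Equations (4) and (6): sum of tf * idf over the terms of the query."""
    tf = Counter(terms)
    return sum(tf[t] * idf(t, nodes) for t in set(query))


def rsv_media(query, nodes, coords, media="H"):
    """Equation (7): text scores divided by their distance to the media node."""
    return sum(rsv_text(query, terms, nodes) / math.dist(coords[n], coords[media])
               for n, terms in nodes.items())


score_louis = rsv_media(["louis"], NODES, COORDS)
score_coronation = rsv_media(["coronation"], NODES, COORDS)
```

For the query "louis", only the `name` node matches, at distance 1 from H, so the score is log(3). For "coronation", the `name` and `caption` nodes both match with idf log(3/2), at distances 1 and 2, giving 1.5 × log(3/2).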
4 EVALUATION AND RESULTS
We evaluate our system on two databases extracted from two collections: the INEX 2007 (Initiative for the Evaluation of XML Retrieval) Ad Hoc task (Fuhr et al., 2007) and the ImageCLEF 2010 Wikipedia image retrieval task (Popescu et al., 2010). These databases are composed of XML documents extracted from Wikipedia. The evaluation results show that this method yields a MAP of 0.2102 on the "ImageCLEF 2010" collection. The result improves significantly on the "INEX 2007" collection, reaching a MAP of 0.3102. This increase is due to the nature of the "INEX 2007" collection, which includes XML documents with heterogeneous structure; in particular, it contains documents with high depth. This factor highlights structural information and amplifies the effect of textual information through the computed distances. On the other hand, our system is more stable on the "ImageCLEF 2010" collection, owing to the rapid convergence of the results (Algorithm 1). With our measure, we have shown that the combined use of textual and structural context can properly determine the relevance of a multimedia element, and that the structure plays a key role in multimedia retrieval.
Table 1: Results of our approach on INEX 2007 and ImageCLEF 2010 in terms of MAP (Mean Average Precision).

Campaign                    INEX 2007     ImageCLEF 2010
Task collection             XML Ad Hoc    Wikipedia Retrieval
Number of XML documents     659,388       237,434
Number of images            246,730       237,434
Topics                      19            70
MAP                         0.3102        0.2572
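For readers unfamiliar with the measure, MAP is the mean, over all topics, of the average precision of each ranked result list. The sketch below shows the standard computation on two invented topics; the function names and the toy rankings are assumptions and are unrelated to the runs reported in Table 1.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked list, set of relevant items), one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical topics with their ranked images and relevance judgments.
RUNS = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
map_score = mean_average_precision(RUNS)
```

The first topic has an average precision of 5/6 and the second of 1/2, so the MAP of this toy run is 2/3.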
WEBIST2014-InternationalConferenceonWebInformationSystemsandTechnologies
224
5 CONCLUSIONS
In this paper, we propose a novel approach for multimedia retrieval in XML documents. The method calculates the score of a multimedia element according to the textual context provided by nearby nodes and the structural context given by the distance between these nodes and the multimedia element. Experiments show the interest of our method on the INEX 2007 and ImageCLEF 2010 collections. In the future, we want to exploit other factors to calculate the relevance of a multimedia element, such as the title of the image and the weighting of the links in the XML document, as well as other sources of evidence such as visual descriptors. We also plan to study how to combine the parameters of the structural, textual and visual contexts.
REFERENCES
Aouadi, H., Torjmen-Khemakhem, M., and Jemaa, M. B. (2012). Combination of document structure and links for multimedia object retrieval. Journal of Information Science, 38(5):442–458.
Awadi, H. and Torjmen, M. (2010). Exploitation des liens pour la recherche d'images dans des documents XML.
Elghazel, H., Idrissi, K., Baskurt, A., and Ben Amar, C. (2005). Approche textuelle pour la recherche d'image. In 3rd International Conference on Sciences of Electronic, Technologies of Information and Telecommunications SETIT 2005.
Fakhfakh, S., Tmar, M., and Mahdi, W. (2013). A new metric for multimedia retrieval in structured documents. In ICEIS (2), pages 240–247.
Fuhr, N., Kamps, J., Lalmas, M., Malik, S., and Trotman, A. (2007). Overview of the INEX 2007 ad hoc track. In INEX, pages 1–23.
Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E. G. M., and Milios, E. (2006). Information retrieval by semantic similarity. In Intern. Journal on Semantic Web and Information Systems (IJSWIS). Special Issue of Multimedia Semantics, pages 55–73.
Kong, Z. and Lalmas, M. (2005). XML multimedia retrieval. In SPIRE, pages 218–223.
Lew, M. S. (2006). Content-based multimedia information retrieval: State of the art and challenges. ACM Trans. Multimedia Comput. Commun. Appl, 2:1–19.
Popescu, A., Tsikrika, T., and Kludas, J. (2010). Overview of the Wikipedia retrieval task at ImageCLEF 2010. In CLEF (Notebook Papers/LABs/Workshops).
Schlieder, T. and Holger, M. (2002). Querying and ranking XML documents. Journal of the American Society for Information Science and Technology, 53:489–503.
Tjondronegoro, D., Zhang, J., Gu, J., Nguyen, A., and Geva, S. (2005). Integrating text retrieval and image retrieval in XML document searching. In INEX, pages 511–524.
Torjmen, M., Pinel-Sauvagnat, K., and Boughanem, M. (2010). Using textual and structural context for searching multimedia elements. IJBIDM, 5(4):323–352.
Tsikrika, T., Serdyukov, P., Rode, H., Westerveld, T., Aly, R., Hiemstra, D., and de Vries, A. P. (2008). Structured document retrieval, multimedia retrieval, and entity ranking using PF/Tijah. In 6th Initiative on the Evaluation of XML Retrieval, INEX 2007, volume 4862 of Lecture Notes in Computer Science, pages 306–320, London. Springer Verlag.
Westerveld, T., Rode, H., van Os, R., Hiemstra, D., Ramirez, G., Mihajlovic, V., and de Vries, A. P. (2007). Evaluating structured information retrieval and multimedia retrieval using PF/Tijah. In Fuhr, N., Lalmas, M., and Trotman, A., editors, Comparative Evaluation of XML Information Retrieval Systems, volume 4518 of Lecture Notes in Computer Science, pages 104–114, Berlin, Germany. Springer Verlag.
MultimediaRetrievalbasedonGeometricDistanceinSemi-structuredDocument
225