Towards a Query Translation Disambiguation Approach using Possibility Theory

Oussama Ben Khiroun 1,2, Bilel Elayeb 1,3 and Narjès Bellamine Ben Saoud

1 RIADI Research Laboratory, ENSI, Manouba University, 2010, Manouba, Tunisia

2 National Engineering School of Sousse, ENISO, Sousse University, 4023, Sousse, Tunisia

3 Emirates College of Technology, P.O. Box: 41009, Abu Dhabi, United Arab Emirates

Keywords: Cross-Language Information Retrieval, Possibility Theory, Parallel Corpus, Co-occurrence Graph, Query Translation Disambiguation, Query Expansion.

Abstract:

In this paper, we propose a combined method for Cross-Language Information Retrieval (CLIR) that uses statistical and lexical resources. On the one hand, we extracted a bilingual French-to-English dictionary from aligned texts of the Europarl collection. On the other hand, we built a co-occurrence graph structure and used the BabelNet lexical network to disambiguate the translation candidates of ambiguous words. We compared our new possibilistic approach with a circuit-based one and studied the impact of query expansion using the pseudo-relevance feedback (PRF) technique. Our experiments were performed on the standard CLEF-2003 collection. The results show the positive impact of PRF on the query translation process. In addition, the possibilistic approach using the co-occurrence graph outperforms all circuit-based runs.

1 INTRODUCTION

Cross-Language Information Retrieval (CLIR) deals with retrieving and ranking a set of documents written in a language different from the language of the user's query. It is an active sub-domain of Information Retrieval (IR), which is centered on searching for documents that answer an information need expressed by the user of the IR system. CLIR tries to overcome the language barrier between user requests and documents (Nie, 2010). In real life, a user submitting a query in French could also be interested in documents in English, German, Arabic, etc.

To solve the problem of linguistic heterogeneity, the intuitive solution consists in translating the query and/or the documents before performing the search. We distinguish three general translation approaches that can be used in the design of a CLIR system (Zhou et al., 2012): translating the query to match the representation of the documents, translating the documents to match the query, or translating both the query and the documents into a third language called the pivot.

The first method is the most widespread in CLIR research since queries are usually short, which makes their translation faster and easier. However, the reduced length of the query may cause ambiguity because little contextual information is available during the translation phase. The second method, document translation, retains the theoretical advantage of having more contextual information to determine the correct translation. However, given the volume of the documents, this translation becomes rather slow, and it requires translating the documents into all possible languages.

The paper is organized as follows. In Section 2, we review related work on cross-lingual disambiguation. In Section 3, we present the model architecture that we used to perform the translation disambiguation task. Section 4 details our new possibilistic approach for query disambiguation. Experimental results and their discussion are provided in Section 5. Finally, Section 6 concludes the paper by evaluating our work and proposing some directions for future research.

2 RELATED WORK

The main approaches for query translation can be summarized as using a Machine Translation (MT) system, using a bilingual token-to-token resource (such as a bilingual dictionary), or relying on corpus analysis.


Khiroun, O., Elayeb, B. and Saoud, N.

Towards a Query Translation Disambiguation Approach using Possibility Theory.

DOI: 10.5220/0006654706060613

In Proceedings of the 10th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2018) - Volume 2, pages 606-613

ISBN: 978-989-758-275-2

Many barriers still challenge the development of

CLIR systems such as the coverage of dictionaries,

the unavailability of parallel corpora in some lan-

guages and common linguistic speciﬁcity like poly-

semy, agglutination and named entity recognition.

Parallel corpora are considered a common source

of knowledge to perform the disambiguation task in

multilingual context. By sharing hidden meaning

that can be useful for extracting linguistic knowledge,

these corpora are good resources not only for per-

forming Cross-Lingual Word Sense Disambiguation

(CLWSD), but also for Natural Language Processing

(NLP) tasks (Resnik, 2004).

2.1 Graph-based Approaches for CLWSD

Many techniques have been proposed for solving CLWSD. As observed by (Duque et al., 2015), graph-based systems were among the most successful approaches in the 2010 and 2013 SemEval competitions. Some of these algorithms have been widely used in the literature (Mihalcea, 2005; Navigli and Lapata, 2010; Agirre et al., 2014).

(Véronis, 2004) presented the HyperLex algorithm, a corpus-based approach that builds a co-occurrence graph for all pairs of words co-occurring in the context of the target word. This kind of graph has the properties of small-world graphs. Hence, the graph possesses highly connected components (or hubs) that identify the main word uses (or senses) of the target word, and so can be used to perform the WSD task.

(Agirre et al., 2006) present a comparative study between the HyperLex algorithm of Véronis and an adapted version of the PageRank algorithm (Brin and Page, 1998) for WSD. They thus explored the use of two graph algorithms for corpus-based disambiguation of nominal senses. The performance of PageRank was nearly the same as that of HyperLex, with the advantage that PageRank uses fewer optimization parameters.

(Silberer and Ponzetto, 2010) were inspired by the works of (Véronis, 2004) and (Agirre et al., 2006). They presented a graph-based system that performs CLWSD by using a co-occurrence graph built from multilingual parallel corpora and by applying graph algorithms previously developed for monolingual WSD. Afterwards, the Minimum Spanning Tree (MST) is extracted from the final graph to perform WSD.

(Duque et al., 2015) presented an approach which

comprises the automatic generation of bilingual dic-

tionaries and the construction of a co-occurrence

graph to select the most suitable translations from

the dictionary. The proposed algorithms are based

on (i) sub-graphs (or communities) containing clus-

ters of words with related meanings, (ii) distances

between nodes representing words, and (iii) the rel-

ative importance of each node in the whole graph.

Using the SemEval-2010 and SemEval-2013 datasets

to evaluate their system, they proved the validity of

the unsupervised graph-based technique, which uses

the whole document as a coherent piece of informa-

tion, while other works consider windows of a spe-

ciﬁc size for building the context and calculating the

co-occurrences.

2.2 Combining Lexical and Statistical Resources for CLIR

Since there is a diversity of query translation techniques, the idea of combining them was studied in recent works in order to examine whether one approach is complementary to another (Nie, 2010; Azarbonyad et al., 2013; Schamoni et al., 2014).

For example, (Herbert et al., 2011) introduced a CLIR model that uses Wikipedia to map concepts in one language to their equivalents in another language. This mapping relies on the redirection and cross-language links in multilingual Wikipedia versions. The authors showed that the Wikipedia translations can improve the performance of CLIR systems based on statistical machine translation. Queries are translated with the Google Translate online service and extended with new translations, which are obtained by mapping noun phrases in the query to concepts in the target language using Wikipedia.

(Ture and Boschee, 2014) introduced a new method for building a single combination recipe for each query. They formulated this idea as a set of binary classification problems. The results show that trained classifiers can be used to produce query-specific combination weights effectively.

(Kim et al., 2015) explored how combining lexical and statistical translation resources can improve CLIR. They used both Wikipedia and a machine readable dictionary (MRD) as lexical translation knowledge. Moreover, they exploited parallel corpora to extract translation candidates statistically. Kim et al. showed that using the three sources of translation evidence together (i.e. an MRD, a parallel corpus and Wikipedia knowledge) can yield better results than any one source alone. They also proposed an approach to post-translation query expansion using a random walk over the Wikipedia concept link graph. This approach yields further improvements over alternative techniques when evaluated on the NTCIR-5 English–Korean test collection.

A previous work (Elayeb et al., 2017) tries to adjust dictionary-based query translation approaches, since these approaches suffer from translation ambiguity and a word-by-word query translation is not always accurate. The authors proposed a probability-to-possibility transformation as a means to introduce further tolerance in the query translation process. The reported experiments on the CLEF-2003 test collection showed that the approach based on the probability-to-possibility transformation performs better than the probabilistic one and some state-of-the-art CLIR tools. This work was extended in (Ben Romdhane et al., 2017) to a discriminative possibilistic query translation disambiguation approach using both a bilingual dictionary and the Europarl parallel corpus. The main goal is to overcome some drawbacks of the dictionary-based techniques. When evaluated with the CLEF-2003 test collection, the discriminative possibilistic approach outperformed both the probabilistic and the probability-to-possibility transformation-based approaches, especially for short queries.

3 MODEL ARCHITECTURE FOR DISAMBIGUATING QUERY TRANSLATIONS

In this section, we propose the model architecture used to design and implement a new approach for disambiguating query translations in CLIR. Figure 1 presents the resources and steps of this task, which are as follows:

Starting from an initial query written in French (the source language), a set of translation candidates in English is retrieved from a purpose-built dictionary. The process of building this dictionary is described in sub-section 3.1.

Afterwards, the disambiguation module processes ambiguous words, i.e. words that have more than one possible translation candidate. In this step, two main resources are used to choose only the most relevant translation (see details in sub-section 3.2).

Pseudo-relevance feedback is applied at the end of the process by extracting the most significant terms from the top-ranked returned documents, and the whole process may be iterated.

Figure 1: Overview of the disambiguation process for query translations. (Diagram labels: query in French; resources for extracting word translations: Europarl parallel corpus, dictionary of translations; a set of possible translations; resources for disambiguating query translations: co-occurrence graph from Europarl English documents, BabelNet; disambiguated translated query; index of documents; matching; document results; pseudo relevance feedback (PRF); expanded query.)

3.1 Extracting Translation Candidates

To select the translation candidates for the query terms, we build a bilingual dictionary by aligning French texts with the corresponding English texts of the Europarl collection. This collection includes parallel texts in more than 11 languages, extracted from the proceedings of the European Parliament (Koehn, 2005). Europarl was initially designed for statistical machine translation (SMT). Nevertheless, it is also used in other applications such as NLP and WSD.

Afterwards, to align the parallel texts of Europarl at the word level, we used the GIZA++ statistical machine translation toolkit (http://www.statmt.org/moses/giza/GIZA++.html). This tool is a statistical aligner that is able to extract one-to-many translations (Och and Ney, 2003).

The extracted word pairs in the source and target languages are stored in CSV format, which makes the dictionary an easily human-readable resource. Nevertheless, the built resource lacks coverage. We exploited a limited set of 717 French words that occur in the queries (or topics) of the CLEF-2003 standard collection. The final number of English translation candidates is 2324. Thus, extracting a limited subset of translations may lead to an Out Of Vocabulary (OOV) problem if this resource is used for other, more general purposes.
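As an illustration of the dictionary structure described in this sub-section, the following minimal sketch groups word-level alignments into a one-to-many dictionary and serializes it as CSV. The input format (a list of source/target word pairs), the column names and the function names are assumptions made for the example; parsing the actual GIZA++ output is not shown.

```python
import csv
import io
from collections import defaultdict


def build_dictionary(aligned_pairs):
    """Group (source_word, target_word) pairs into a one-to-many dictionary."""
    dictionary = defaultdict(set)
    for src, trg in aligned_pairs:
        dictionary[src.lower()].add(trg.lower())
    return {src: sorted(trgs) for src, trgs in dictionary.items()}


def to_csv(dictionary):
    """Serialize the dictionary as CSV text, one (source, target) pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])  # hypothetical column names
    for src in sorted(dictionary):
        for trg in dictionary[src]:
            writer.writerow([src, trg])
    return buffer.getvalue()


# Hypothetical word-level alignments (not taken from Europarl).
pairs = [("banque", "bank"), ("banque", "banking"), ("rive", "bank"), ("Banque", "bank")]
dictionary = build_dictionary(pairs)
print(to_csv(dictionary))
```

A source word absent from such a dictionary has no translation candidate at all, which is the OOV situation mentioned above.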

3.2 Disambiguating Query Translations

To disambiguate the translation candidates proposed by the extraction step, we used two different types of resources: a statistical resource and a lexical one. On the one hand, we extracted a co-occurrence graph from the Europarl English documents. Each word is linked to the other words of the same sentence, based on the assumption that if two terms co-occur then they tend to be semantically related (Cao et al., 2005). Hence, we consider the sentence as the context window.
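A sentence-level co-occurrence graph of this kind can be sketched as follows. This is only an illustration under simplifying assumptions: whitespace tokenization, no stop-word removal, and an adjacency structure that stores co-occurrence counts; the function name and the example sentences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Return {word: {neighbor: count}} using each sentence as the context window."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Naive whitespace tokenization; a real pipeline would also filter stop words.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


sentences = [
    "the bank raised interest rates",
    "the river bank was flooded",
    "interest rates fell",
]
graph = build_cooccurrence_graph(sentences)
print(dict(graph["bank"]))
```

The neighbors of a translation candidate in this graph can then serve as the terms that describe it during disambiguation.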

On the other hand, we used BabelNet, which is considered a rich lexical resource because it integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia (Navigli and Ponzetto, 2012). BabelNet is a multilingual knowledge resource that provides a semantic network where related concepts are connected within a graph structure. Given these distinguishing features, BabelNet is a powerful resource for performing knowledge-based lexical disambiguation in a multilingual setting. BabelNet groups words into sets of synonyms called synsets (a name inspired by the WordNet terminology (Miller et al., 1990)). All words composing a given Babel synset are semantically related.

4 A POSSIBILISTIC APPROACH FOR DISAMBIGUATING QUERY TRANSLATIONS

We based our approach on possibility theory, introduced by Zadeh (Zadeh, 1978) and developed by several authors (Dubois and Prade, 2011).

Consider a query written in the source language, $Q^{(src)} = \{T_1^{(src)}, \ldots, T_n^{(src)}\}$, where $n$ represents the number of words in the query. Each term $T_i^{(src)}$ in the query may have one to many translation candidates in a chosen target language.

Let $\Phi(T_i^{(src)}) = \{T_{ij}^{(trg)},\ j \in [1..m]\}$ denote the set of the $m$ possible translation candidates for a term $T_i^{(src)}$ that are extracted from the built bilingual dictionary (refer to sub-section 3.1).

We call vector of context, relative to a term $T_i^{(src)}$, the union of the sets of translation candidates of the terms $T_k^{(src)} \neq T_i^{(src)}$, formalized as follows:

$$VC_i = \bigcup_{k \in [1..n],\ k \neq i} \Phi(T_k^{(src)}) \tag{1}$$
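The construction of the vector of context in formula (1) can be sketched as below. The dictionary entries and the function name are hypothetical and serve only to illustrate the union over the other query terms.

```python
def context_vector(query_terms, dictionary, i):
    """Union of the translation candidates of every query term except the i-th."""
    vc = set()
    for k, term in enumerate(query_terms):
        if k != i:
            vc.update(dictionary.get(term, []))
    return vc


# Hypothetical dictionary entries for a three-word French query.
dictionary = {
    "taux": ["rate", "level"],
    "intérêt": ["interest", "benefit"],
    "banque": ["bank"],
}
query = ["taux", "intérêt", "banque"]
print(sorted(context_vector(query, dictionary, 1)))
```

For the second term of this query, the context contains the candidates of the first and third terms only.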

We designate by semantic vector, relative to a translation candidate $T_{ij}^{(trg)}$, the set of terms extracted from the co-occurrence graph or the synsets assembled from BabelNet, as described previously in sub-section 3.2. Hence, we note the semantic vector as follows:

$$VS_{ij} = \langle s_{ij1}^{(trg)}, s_{ij2}^{(trg)}, \ldots, s_{ijk}^{(trg)} \rangle \tag{2}$$

The relevance of a semantic vector $VS_{ij}$ to the vector of context $VC_i$ of the query is determined by extending the possibilistic matching model proposed in (Ben Khiroun et al., 2012) with a double measure of relevance, as follows:

The possible relevance allows ignoring irrelevant

translations to a given query. The necessary relevance

reinforces the need to include relevant translation can-

didates in the ﬁnal translation of the query.

The possibility measure $\Pi(VS_{ij} \mid VC_i)$ is proportional to:

$$\Pi(VS_{ij} \mid VC_i) = \Pi(w_1 \mid VS_{ij}) \times \ldots \times \Pi(w_p \mid VS_{ij}) = nft_{1ij} \times \ldots \times nft_{pij} \tag{3}$$

• With: $nft_{kij} = tf_{kij} / \max(tf_{kij})$ represents the normalized frequency of the translation term $w_k \in VC_i$ in the semantic vector $VS_{ij}$ relative to the translation candidate $T_{ij}^{(trg)}$;

• And $tf_{kij} = \dfrac{\text{number of occurrences of } w_k \text{ in } VS_{ij}}{\text{number of terms in } VS_{ij}}$.

The necessity to restore a relevant translation candidate $T_{ij}^{(trg)}$ for a context of translation terms, noted $N(VS_{ij} \mid VC_i)$, is calculated as follows:

$$N(VS_{ij} \mid VC_i) = 1 - \Pi(\neg VS_{ij} \mid VC_i) \tag{4}$$

In the same way, $\Pi(\neg VS_{ij} \mid VC_i)$ is proportional to:

$$\Pi(\neg VS_{ij} \mid VC_i) = \left(1 - \phi(w_1, T_{ij}^{(trg)})\right) \times \ldots \times \left(1 - \phi(w_p, T_{ij}^{(trg)})\right) \tag{5}$$

Where:

$$\phi(w_k, T_{ij}^{(trg)}) = \mathrm{Log}\left(\frac{nCT_i}{nT_k}\right) \times nft_{kij} \tag{6}$$

• With: $nCT_i$: Number of translation candidates for the term $T_i^{(src)}$ of the initial query;

• And $nT_k$: Number of translation candidates containing the term $w_k \in VC_i$.

We define the degree of possibilistic relevance (DPR) of each translation candidate $T_{ij}^{(trg)}$ given a context of translation terms $VC_i$ by the following formula:

$$DPR(VS_{ij} \mid VC_i) = \Pi(VS_{ij} \mid VC_i) + N(VS_{ij} \mid VC_i) \tag{7}$$

The preferred translations are those having a high DPR score.
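To make formulas (3) to (7) concrete, the sketch below computes the DPR of the candidates of one source term. Several points are assumptions that the formulas leave open: the maximum in the normalized frequency is taken over the terms of the semantic vector, the logarithm is taken in base 10, a candidate "contains" a context word when the word occurs in its semantic vector, and the contribution of a word found in no candidate is set to zero. The function names and the example data are hypothetical.

```python
import math
from collections import Counter


def nft(word, semantic_vector):
    """Normalized frequency of `word` in a semantic vector (a list of terms)."""
    counts = Counter(semantic_vector)
    if not counts:
        return 0.0
    # tf = count / length; dividing by the largest tf cancels the length.
    return counts[word] / max(counts.values())


def dpr_scores(semantic_vectors, context):
    """Return {candidate: DPR} for the candidates of one source term.

    semantic_vectors: {candidate: list of terms}; context: iterable of words.
    """
    context = sorted(context)
    n_ct = len(semantic_vectors)  # number of translation candidates
    scores = {}
    for cand, vs in semantic_vectors.items():
        possibility = 1.0
        non_necessity = 1.0
        for w in context:
            f = nft(w, vs)
            possibility *= f
            # Number of candidates whose semantic vector contains w (assumption).
            n_t = sum(1 for other in semantic_vectors.values() if w in other)
            phi = math.log10(n_ct / n_t) * f if n_t else 0.0  # base 10 is assumed
            non_necessity *= 1.0 - phi
        necessity = 1.0 - non_necessity
        scores[cand] = possibility + necessity
    return scores


# Hypothetical semantic vectors for two candidates of the French word "avocat".
vectors = {
    "lawyer": ["court", "judge", "court", "law"],
    "avocado": ["fruit", "salad", "green"],
}
scores = dpr_scores(vectors, {"court", "judge"})
print(max(scores, key=scores.get))
```

In this toy case the candidate whose semantic vector shares words with the context obtains the higher score, while a candidate sharing none of them obtains zero for both measures.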


We summarize in Algorithm 1 the different steps of our proposed possibilistic approach.

Algorithm 1: The possibilistic algorithm for query translation disambiguation.

input : $Q^{(src)}$, query in source language
output: $Q^{(trg)}$, translated query in target language
foreach term $T_i^{(src)} \in Q^{(src)}$ do
    build vector of context $VC_i$
    foreach translation candidate $T_{ij}^{(trg)} \in \Phi(T_i^{(src)})$ do
        extract semantic vector $VS_{ij}$ from a resource
        compute possibilistic score of $VS_{ij}$ in relation with $VC_i$
    end
    add best translation candidate to $Q^{(trg)}$
end
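The control flow of Algorithm 1 can be sketched in Python as follows. The scoring function is passed as a parameter, and the one used in the example is a deliberately simple placeholder (word overlap), not the DPR; the dictionary, the semantic vectors and all function names are hypothetical.

```python
def disambiguate(query_terms, dictionary, semantic_vector_of, score):
    """Pick one translation per source term, following the loop of Algorithm 1.

    semantic_vector_of(candidate) -> list of terms (from any resource).
    score(semantic_vector, context) -> float, higher is better.
    """
    translated = []
    for i, term in enumerate(query_terms):
        candidates = dictionary.get(term, [])
        if not candidates:
            continue  # out-of-vocabulary term: nothing to add
        # Vector of context: candidates of all the other query terms.
        context = set()
        for k, other in enumerate(query_terms):
            if k != i:
                context.update(dictionary.get(other, []))
        best = max(candidates, key=lambda c: score(semantic_vector_of(c), context))
        translated.append(best)
    return translated


def overlap_score(semantic_vector, context):
    """Placeholder score (NOT the DPR): number of context words in the vector."""
    return len(context.intersection(semantic_vector))


# Hypothetical resources for a two-word French query.
dictionary = {"avocat": ["avocado", "lawyer"], "tribunal": ["court"]}
vectors = {
    "avocado": ["fruit", "salad"],
    "lawyer": ["court", "judge"],
    "court": ["lawyer", "judge"],
}
result = disambiguate(["avocat", "tribunal"], dictionary, vectors.get, overlap_score)
print(result)
```

Keeping the score as a parameter lets the same loop be reused with a possibilistic score or with any other similarity measure computed from the chosen resource.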

5 EXPERIMENTAL RESULTS AND COMPARATIVE STUDY

In this section, we evaluate and compare the contribution of the possibilistic approach when using lexical and statistical translation resources. We used CLEF-2003 as the standard test collection. This collection provides the necessary tools for evaluating information retrieval systems on mono- and multilingual tasks. It includes a set of documents, a set of queries and the list of relevant documents for each query (Braschler and Peters, 2004). The documents that form CLEF-2003 are written in 9 European languages (including English and French), are collected from the same periods and have comparable contents.

To perform the experimentation, we used the Ter-

rier platform for information retrieval. All experi-

ments are carried out using the OKAPI BM25 weight-

ing model for matching between queries and docu-

ments (Ounis et al., 2007).

5.1 Evaluating the Query Translation Approach

Figure 2 compares the runs of our proposed approach for disambiguating query translations based on two knowledge resources. The series labeled “Cooccurrence” and “Babelnet” refer respectively to the scenario using the built co-occurrence graph and to the one using synsets extracted from BabelNet.

Figure 2: Recall-Precision curve comparing different runs. (Axes: Recall, Precision; series: Baseline, Babelnet, Cooccurrence.)

The “Baseline” series represents the precision of the original English version of the queries provided in the CLEF-2003 test collection. We also report the results of translations produced by the Google Translate Machine Translation (MT) tool (https://translate.google.com).

Figure 3: Recall-Precision curve comparing different runs by applying pseudo-relevance feedback (PRF). (Axes: Recall, Precision; series: Baseline+PRF, MT+PRF, Babelnet+PRF, Cooccurrence+PRF.)

The pseudo-relevance feedback (PRF) technique exploits the top k retrieved documents, which are assumed to be relevant, in order to expand the query. A set of candidate terms from these documents is added to the query, often using variants of the Rocchio algorithm (Rocchio, 1971). Figure 3 presents the impact of applying PRF to the previous run scenarios. We used the Bo1 (Bose-Einstein 1) PRF algorithm implemented in the Terrier IR platform with the default settings: the number of terms used to expand a query is set to 10, and the number of top-ranked documents from which these terms are extracted is limited to three.
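The selection of expansion terms can be illustrated with the sketch below. It assumes that the Bo1 weight takes the form commonly given for the Bose-Einstein 1 model, w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n) with P_n = F / N, where tf_x is the frequency of the term in the pseudo-relevant documents, F its frequency in the collection and N the number of documents in the collection. It is not Terrier's implementation, and the function name, parameters and statistics are hypothetical.

```python
import math
from collections import Counter


def bo1_expansion_terms(top_docs, collection_freq, num_docs, num_terms=10):
    """Rank candidate expansion terms from pseudo-relevant documents.

    top_docs: list of token lists (the top-ranked documents).
    collection_freq: {term: frequency in the whole collection}.
    num_docs: number of documents in the collection.
    """
    tf_x = Counter(token for doc in top_docs for token in doc)
    weights = {}
    for term, tf in tf_x.items():
        p_n = collection_freq[term] / num_docs
        # Assumed Bo1 weight: rare collection terms frequent in top_docs score high.
        weights[term] = tf * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:num_terms]


# Hypothetical top-3 documents and collection statistics.
top_docs = [["bank", "loan", "the"], ["bank", "the", "rate"], ["the", "loan", "bank"]]
collection_freq = {"bank": 50, "loan": 20, "rate": 40, "the": 5000}
expansion = bo1_expansion_terms(top_docs, collection_freq, num_docs=1000, num_terms=2)
print(expansion)
```

With the settings reported above, `num_terms` would be 10 and `top_docs` would hold the three top-ranked documents.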

Table 1 details the precision values at the top 5, 10, 15, ... and 1000 documents for all runs when PRF query expansion is applied. The results show that the possibilistic approach using co-occurrence as a disambiguating resource outperforms the other translation resources (MT and BabelNet) at the first top documents (P@5 to P@50). However, the performance of MT is better for the last ranked documents (P@100 to P@1000).

Table 1: Precision values for different runs by applying pseudo relevance feedback.

          Baseline  MT      BabelNet  Co-occurr.
P@5       0.4148    0.3741  0.3741    0.3815
P@10      0.3333    0.3259  0.3037    0.3352
P@15      0.2975    0.2864  0.2691    0.3012
P@20      0.2741    0.2657  0.2398    0.2759
P@30      0.2309    0.2259  0.2049    0.2364
P@50      0.1785    0.1789  0.1596    0.1848
P@100     0.118     0.1185  0.1056    0.1174
P@200     0.0705    0.0694  0.065     0.0683
P@500     0.0334    0.0315  0.0304    0.0313
P@1000    0.0175    0.0164  0.0162    0.0163

To refine our study of the co-occurrence based approach in comparison with the BabelNet and machine translation approaches, we use the Wilcoxon matched-pairs signed-ranks test as proposed by (Demšar, 2006). The given p values are computed by comparing the pairs of precision values of the co-occurrence based approach to those of each of the machine translation and BabelNet based approaches.

As given in Table 2, the p value results prove

that the improvement of the co-occurrence approach,

compared to both the MT (p value = 0.010301 < 0.05)

and to the BabelNet (p value = 0.003509 < 0.05), is

statistically signiﬁcant (Biau et al., 2010).

Table 2: The p value results for the Wilcoxon matched-pairs signed-ranks test for precision values.

Comparison                   p value
Co-occurrence vs. MT         0.010301
Co-occurrence vs. BabelNet   0.003509

In order to have a comparison study of the pos-

sibilistic model, we conducted more detailed experi-

ments by using the circuit-based approach measure.

This approach was studied previously in monolingual

WSD by (Elayeb et al., 2015).

In our current work, we apply this model to disambiguate query translation terms by computing the semantic similarity of a given term $t_i$ and a translation candidate $t_j$ according to the following formula:

$$sim(t_i, t_j) = \frac{\#circuits(t_i, t_j)}{MAX(\#circuits\ \text{in graph})} \tag{8}$$

• Where: $\#circuits(t_i, t_j)$ represents the number of circuits starting from the node $t_i$ and passing through the node $t_j$ in the graph (i.e. $t_i \to \ldots \to t_j \to \ldots \to t_i$);

• And: $MAX(\#circuits\ \text{in graph})$ represents the maximum number of circuits in the graph.
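The circuit count of formula (8) can be sketched as a bounded depth-first search. This is an illustration under stated assumptions: the graph is a directed adjacency dictionary, circuits are simple (no node other than the start is revisited), their length is bounded by a maximum number of edges, and the normalization constant is supplied by the caller. All names and the example graph are hypothetical.

```python
def count_circuits(graph, start, via, max_edges=4):
    """Count simple directed circuits start -> ... -> via -> ... -> start."""
    def walk(node, visited, edges_used, seen_via):
        total = 0
        for nxt in graph.get(node, ()):
            if nxt == start:
                # Closing edge: count the circuit only if it passed through `via`.
                if seen_via and edges_used + 1 <= max_edges:
                    total += 1
            elif nxt not in visited and edges_used + 1 < max_edges:
                total += walk(nxt, visited | {nxt}, edges_used + 1,
                              seen_via or nxt == via)
        return total
    return walk(start, {start}, 0, False)


def circuit_similarity(graph, term, candidate, max_circuits, max_edges=4):
    """Circuit count normalized by a caller-supplied maximum."""
    if max_circuits == 0:
        return 0.0
    return count_circuits(graph, term, candidate, max_edges) / max_circuits


# Hypothetical directed graph over three nodes.
g = {"a": ["b", "c"], "b": ["c", "a"], "c": ["a"]}
print(count_circuits(g, "a", "b"))
```

Bounding the number of edges keeps the search tractable, since the number of circuits grows quickly with the allowed length.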

To optimize the search for circuits in the graph structure, we extracted a limited collection of BabelNet synsets restricted to the translation candidates. This subset covers only the entries of the translation dictionary that correspond to CLEF-2003 queries. Besides, we limited the maximum circuit length to about 4 edges, as studied in (Elayeb, 2009).

Figure 4: MAP and R-precision results for possibilistic and circuit-based approaches by applying pseudo relevance feedback (scenario “PRF”) and without applying it (scenario “simple”). (Runs: Baseline, MT, Babelnet (possibilistic), Cooccurrence (possibilistic), Babelnet (circuit-based), Cooccurrence (circuit-based); measures: MAP, R-precision.)

Figure 4 shows the values of MAP and R-precision, two common measures in the evaluation of IR systems. The MAP value represents the mean average precision over the query topics, and the R-precision is the precision at rank R, where R is the total number of relevant documents (Baccini et al., 2012).
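The two measures can be computed as in the following minimal sketch, which uses their standard definitions; the function names, document identifiers and relevance judgments are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
print(mean_average_precision(runs), r_precision(*runs[0]))
```

For the first query the relevant documents appear at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2, and its R-precision is the precision over the top 2 documents.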

As a first general observation, we notice that the co-occurrence methods still slightly outperform the BabelNet based runs. However, all runs remain below the baseline and MT performance when considering the MAP and R-precision metrics.

Results show an advance for possibilistic runs

when compared to the circuit-based approaches. In-

deed, computing the DPR score comprises two mea-

sures: the possible relevance allows rejecting irrel-

evant translations, whereas the necessary relevance

makes it possible to reinforce the translations not

eliminated by the possibility. The performance of

possibilistic models versus probabilistic ones was

also observed in other applications such as query

expansion (Elayeb et al., 2011) and monolingual

WSD (Elayeb et al., 2015).

6 CONCLUSION

This work presents and compares possibilistic and circuit-based approaches using statistical and lexical resources. The two resources are built by modeling a co-occurrence graph and by extracting BabelNet synset relations to form graph data structures. Our proposed approach aims to design a general process for CLWSD.

On the one hand, the proposed possibilistic approach outperformed the circuit-based one. On the other hand, using co-occurrence graphs resulted in slightly better performance than exploiting sub-networks extracted from BabelNet. Furthermore, applying the pseudo-relevance feedback technique contributed to the enhancement of the different runs, which agrees with previous works (Paskalis and Khodra, 2011; Ben Khiroun et al., 2014; Elayeb et al., 2014).

As future work, we propose to resolve the out-of-vocabulary problem caused by the bilingual dictionary extraction process proposed in this paper. Indeed, knowledge-based query translation approaches that rely on aligned corpora depend on the size and the type of the analyzed texts. This could be a great challenge given the lack of parallel resources for some languages like Arabic, as presented in (Elayeb and Bounhas, 2016). Another potentially interesting direction would be to study the impact of applying query expansion before and after the translation process (also known as pre- and post-translation query expansion). Moreover, we can study the contribution of query expansion techniques other than pseudo-relevance feedback, such as knowledge-based ones that rely on machine readable dictionaries or exploit ontological semantic relations.

ACKNOWLEDGEMENTS

We are grateful to the Evaluations and Language resources Distribution Agency (ELDA), which kindly provided us with the CLEF-2003 collection.

REFERENCES

Agirre, E., López de Lacalle, O., and Soroa, A. (2014). Random Walks for Knowledge-based Word Sense Disambiguation. Comput. Linguist., 40(1):57–84.

Agirre, E., Martínez, D., de Lacalle, O. L., and Soroa, A. (2006). Two Graph-based Algorithms for State-of-the-art WSD. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP ’06, pages 585–593, Stroudsburg, PA, USA. Association for Computational Linguistics.

Azarbonyad, H., Shakery, A., and Faili, H. (2013). Ex-

ploiting Multiple Translation Resources for English-

Persian Cross Language Information Retrieval. In In-

formation Access Evaluation. Multilinguality, Multi-

modality, and Visualization, Lecture Notes in Com-

puter Science, pages 93–99. Springer, Berlin, Heidel-

berg.

Baccini, A., Déjean, S., Lafage, L., and Mothe, J. (2012). How many performance measures to evaluate information retrieval systems? Knowledge and Information Systems, 30(3):693–713.

Ben Khiroun, O., Elayeb, B., Bounhas, I., Evrard, F., and

Bellamine-BenSaoud, N. (2012). A Possibilistic Ap-

proach for Automatic Word Sense Disambiguation. In

Proceedings of the 24th Conference on Computational

Linguistics and Speech Processing (ROCLING), pages

261–275, Taiwan.

Ben Khiroun, O., Elayeb, B., Bounhas, I., Evrard, F., and

Bellamine-BenSaoud, N. (2014). Improving query

expansion by automatic query disambiguation in in-

telligent information retrieval. In The 6th Interna-

tional Conference on Agents and Artiﬁcial Intelli-

gence (ICAART 2014), pages 153–160, Angers, Loire

Valley, France.

Ben Romdhane, W., Elayeb, B., and Bellamine Ben Saoud,

N. (2017). A Discriminative Possibilistic Approach

for Query Translation Disambiguation. In Natu-

ral Language Processing and Information Systems,

Lecture Notes in Computer Science, pages 366–379.

Springer, Cham.

Biau, D. J., Jolles, B. M., and Porcher, R. (2010). P value

and the theory of hypothesis testing: an explanation

for new researchers. Clinical Orthopaedics and Re-

lated Research, 468(3):885–892.

Braschler, M. and Peters, C. (2004). CLEF 2003 Methodol-

ogy and Metrics. In Peters, C., Gonzalo, J., Braschler,

M., and Kluck, M., editors, Comparative Evaluation

of Multilingual Information Access Systems, number

3237 in Lecture Notes in Computer Science, pages 7–

20. Springer Berlin Heidelberg.

Brin, S. and Page, L. (1998). The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, pages 107–117, Amsterdam, The Netherlands. Elsevier Science Publishers B. V.

Cao, G., Nie, J.-Y., and Bai, J. (2005). Integrating Word

Relationships into Language Models. In Proceedings

of the 28th Annual International ACM SIGIR Con-

ference on Research and Development in Information

Retrieval, SIGIR ’05, pages 298–305, New York, NY,

USA. ACM.

Demšar, J. (2006). Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res., 7:1–30.

Dubois, D. and Prade, H. (2011). Possibility theory and its

application: Where do we stand. Mathware and Soft

Computing, 18(1):18–31.

Duque, A., Araujo, L., and Martinez-Romo, J. (2015). CO-

graph: A new graph-based technique for cross-lingual

word sense disambiguation. Natural Language Engi-

neering, 21(5):743–772.

Elayeb, B. (2009). SARIPOD: Système multi-Agent de Recherche Intelligente POssibiliste des Documents Web. PhD thesis, Institut National Polytechnique de Toulouse, France & Ecole Nationale des Sciences de l’Informatique, Université de la Manouba, Tunisie.

Elayeb, B., Ben Romdhane, W., and Bellamine Ben Saoud,

N. (2017). Towards a new possibilistic query trans-

lation tool for cross-language information retrieval.

Multimedia Tools and Applications, pages 1–43.


Elayeb, B. and Bounhas, I. (2016). Arabic Cross-Language

Information Retrieval: A Review. ACM Trans. Asian

Low-Resour. Lang. Inf. Process., 15(3):1–44.

Elayeb, B., Bounhas, I., Ben Khiroun, O., Evrard, F.,

and Bellamine Ben Saoud, N. (2015). A Compara-

tive Study Between Possibilistic and Probabilistic Ap-

proaches for Monolingual Word Sense Disambigua-

tion. Knowl. Inf. Syst., 44(1):91–126.

Elayeb, B., Bounhas, I., Ben Khiroun, O., Evrard, F., and

Bellamine Ben Saoud, N. (2011). Towards a possibilistic

information retrieval system using semantic

query expansion. International Journal of Intelligent

Information Technologies, 7(4):1–25.

Elayeb, B., Bounhas, I., Ben Khiroun, O., and Bellamine Ben Saoud, N.

(2014). Combining Semantic Query Disambiguation

and Expansion to Improve Intelligent Information Re-

trieval. In Duval, B., Herik, J. v. d., Loiseau, S., and

Filipe, J., editors, Agents and Artificial Intelligence,

number 8946 in Lecture Notes in Computer Science,

pages 280–295. Springer International Publishing.

Herbert, B., Szarvas, G., and Gurevych, I. (2011). Combin-

ing Query Translation Techniques to Improve Cross-

language Information Retrieval. In Proceedings of the

33rd European Conference on Advances in Informa-

tion Retrieval, ECIR’11, pages 712–715, Berlin, Hei-

delberg. Springer-Verlag.

Kim, S., Ko, Y., and Oard, D. W. (2015). Combining lexical

and statistical translation evidence for cross-language

information retrieval. Journal of the Association for

Information Science and Technology, 66(1):23–39.

Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical

Machine Translation. In Proceedings of the 10th Ma-

chine Translation Summit, pages 79–86, Phuket, Thai-

land. AAMT.

Mihalcea, R. (2005). Unsupervised Large-vocabulary Word

Sense Disambiguation with Graph-based Algorithms

for Sequence Data Labeling. In Proceedings of

the Conference on Human Language Technology and

Empirical Methods in Natural Language Processing,

HLT ’05, pages 411–418, Stroudsburg, PA, USA. As-

sociation for Computational Linguistics.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and

Miller, K. J. (1990). Introduction to WordNet: An

On-line Lexical Database. International Journal of

Lexicography, 3(4):235–244.

Navigli, R. and Lapata, M. (2010). An Experimental Study

of Graph Connectivity for Unsupervised Word Sense

Disambiguation. IEEE Trans. Pattern Anal. Mach. In-

tell., 32(4):678–692.

Navigli, R. and Ponzetto, S. P. (2012). BabelNet: The Au-

tomatic Construction, Evaluation and Application of a

Wide-coverage Multilingual Semantic Network. Artif.

Intell., 193:217–250.

Nie, J.-Y. (2010). Cross-language Information Retrieval.

Morgan & Claypool Publishers.

Och, F. J. and Ney, H. (2003). A Systematic Comparison of

Various Statistical Alignment Models. Comput. Lin-

guist., 29(1):19–51.

Ounis, I., Lioma, C., Macdonald, C., and Plachouras, V.

(2007). Research directions in Terrier: a search engine

for advanced retrieval on the web. CEPIS Upgrade

Journal, 8(1).

Paskalis, F. and Khodra, M. (2011). Word sense disam-

biguation in information retrieval using query expan-

sion. In 2011 International Conference on Electrical

Engineering and Informatics (ICEEI), pages 1–6.

Resnik, P. (2004). Exploiting Hidden Meanings: Using

Bilingual Text for Monolingual Annotation. In Com-

putational Linguistics and Intelligent Text Processing,

Lecture Notes in Computer Science, pages 283–299.

Springer, Berlin, Heidelberg.

Rocchio, J. (1971). Relevance Feedback in Information

Retrieval. In The SMART Retrieval System, pages

313–323. Prentice-Hall, Englewood Cliffs, New Jer-

sey, USA.

Schamoni, S., Hieber, F., Sokolov, A., and Riezler, S.

(2014). Learning Translational and Knowledge-based

Similarities from Relevance Rankings for Cross-

Language Retrieval. In Proceedings of the 52nd An-

nual Meeting of the Association for Computational

Linguistics, ACL 2014, volume 2, pages 488–494,

Baltimore, MD, USA.

Silberer, C. and Ponzetto, S. P. (2010). UHD: Cross-Lingual

Word Sense Disambiguation Using Multilingual Co-

Occurrence Graphs. In Erk, K. and Strapparava, C.,

editors, Proceedings of the 5th International Work-

shop on Semantic Evaluation, SemEval@ACL 2010,

Uppsala University, Uppsala, Sweden, July 15-16,

2010, pages 134–137. The Association for Computer

Linguistics.

Ture, F. and Boschee, E. (2014). Learning to Translate:

A Query-Specific Combination Approach for

Cross-Lingual Information Retrieval. In Moschitti, A., Pang,

B., and Daelemans, W., editors, Proceedings of the

2014 Conference on Empirical Methods in Natural

Language Processing, EMNLP 2014, pages 589–599.

ACL.

Véronis, J. (2004). HyperLex: lexical cartography for in-

formation retrieval. Computer Speech & Language,

18(3):223–252.

Zadeh, L. (1978). Fuzzy sets as a basis for a theory of pos-

sibility. Fuzzy Sets and Systems, 1(1):3–28.

Zhou, D., Truran, M., Brailsford, T., Wade, V., and Ash-

man, H. (2012). Translation techniques in cross-

language information retrieval. ACM Comput. Surv.,

45(1):1:1–1:44.
