IT’S ANSWER TIME
Taking the Next Step in Question-Answering
Ana Cristina Mendes
L2F/INESC-ID, IST Technical University of Lisbon, Lisbon, Portugal

Luísa Coheur
L2F/INESC-ID, IST Technical University of Lisbon, Lisbon, Portugal
Keywords: Answering, Cooperative answer, Correct, Useful, Non-misleading, Answer generation.
Abstract: After all the work done on tasks like question classification, query expansion or information extraction in QA, we consider that some effort should now be put especially into giving the answer to the user. In this paper we adopt the concept of cooperative answer – that is, a correct, useful and non-misleading answer – since it is our opinion that finding and presenting the cooperative answer to the user is one of the next challenges in QA. With that goal in mind, we focus on three main aspects that deserve the attention of the QA community: the ability of systems to relate the candidate answers to a question; their ability to decide which candidates are possible final answers, given the question, but also the user who posed it; and, finally, the ability to generate the final answer in a cooperative way.
1 INTRODUCTION

Current research on Question-Answering (QA) puts much effort into tasks such as question classification (Li and Roth, 2002; Huang et al., 2008), query expansion (Bilotti et al., 2004; Derczynski et al., 2008) or information extraction from knowledge sources (Hovy et al., 2001; Frank et al., 2007). When it comes to answering, the usual strategy involves selecting the final answer from a list of candidates extracted from the relevant texts. Typically, the final answer is the one with the highest score, which is usually calculated as a function of the candidate’s frequency. As answering is a rather important phase in QA, we consider that this task needs urgent attention. In this paper we focus on three main research aspects that should be addressed by the QA community:
Relating Answers: QA systems usually base the process of recovering answers on redundancy (of the Web, or of document collections, even if on a smaller scale), which allows them to directly extract answers based on the assumption that every information item is likely to have been stated multiple times, in multiple ways, in multiple documents (Lin, 2007). However, this redundancy-based strategy will only work if, as a first step, a way of relating answers is devised. Some systems perform this step, as is the case of the work by (Moriceau, 2005): prior to the selection phase, extracted answers are standardized. Nevertheless, being able to identify and relate groups of answers is still a major challenge in QA research.
Targeting Cooperative instead of Correct Answers: current QA systems search for the answer that correctly solves the input question. Indeed, this characteristic marks a great difference between QA and Information Retrieval (IR) tasks: in the former the user does not need to filter and choose the answer from a set of possible answers, since the system has the ability, and the responsibility, to do that work for him. However, providing a correct answer is not enough, as (Gaasterland et al., 1992) concluded in their work on cooperative answering. Time has passed, research has progressed and systems have evolved, but the premises remain unchanged: better than a correct answer is a cooperative answer, that is, a correct, non-misleading and useful answer. Nevertheless, the way these three properties interconnect is far from trivial. A correct answer might not be useful, just as a useful answer might not be
correct. Consider the question: “How many kilobytes are there in a gigabyte?”. For an IT student the useful answer would be the correct and exact one: “1,048,576 KB”; however, for someone who just wants an idea of the magnitude of that amount, a useful answer could be “1,048,000 KB”, which is not correct. Here, the introduction of the adverb around could turn it into a correct and useful answer. Notice, too, that a correct answer can be misleading. Consider the question: “How many Chechens did Stalin deport?” and two possible answers, “more than 100,000” and “more than 100”. Despite the correctness of both, the latter gives the user a wrong idea of the magnitude involved. If QA aims at being considered a valid and appealing alternative to IR (which has been adapting the retrieved results to the user), systems are required to follow this path. Even if finding correct answers is, by itself, a very hard task, which has certainly not yet seen its end, it does not suffice. Research must move forward in the direction of providing the right answer to the right user. YourQA took the first steps in this direction (Quarteroni and Manandhar, 2007); however, the user has to first create his own profile.
Generating Answers: another aspect has to do with the formulation of the answer. The task is to create a valid and coherent answer, stated in natural language, if necessary. Directly related to this topic, one cannot disregard the amount of work done in Natural Language Generation (NLG) (Reiter and Dale, 2000). Nevertheless, and despite the deep interest in many other sub-tasks of QA systems, answer generation is still a topic to be explored: the usual strategy is to return directly what was extracted. (Moriceau, 2005) is an exception, with an approach that computes and lexicalizes the answer to be given to the user.
This paper is organized as follows: in Section 2 we present a typology of relations and survey work done in this direction. In Section 3 we briefly survey some of the community efforts towards cooperative answers, and in Section 4 we focus on the problem of answer generation. The paper concludes in Section 5.
2 RELATING ANSWERS
Relating candidate answers is a challenge to be solved by QA systems. Here we present a typology of relations that connect answers, which we derived from the analysis of corpora with questions and answers (Magnini et al., 2003; Magnini et al., 2005), and based on the terminology described in (Moriceau, 2006), first proposed by (Webber et al., 2002). Moreover, we discuss how groups of answers can be identified.
2.1 A Typology of Relations
Relations between candidate answers can be of:

EQUIVALENCE: if answers are consistent and mutually entail each other, namely:
1) answers with notational variations. For instance, “Oct. 14, 1947” and “14th October, 1947” are equivalent answers for “When did the test pilot Chuck Yeager break the sound barrier?”;
2) answers that rephrase others, from synonyms to paraphrases. The question “How did Jack Unterweger die?” can be answered with “committed suicide” or “killed himself”;

INCLUSION: if answers are consistent and differ in specificity, one entailing the other, through:
1) hypernymy: “What animal is Kermit?” can be answered with “frog” or “amphibian”; or
2) meronymy: for instance, “Where did Ayrton Senna have the accident that caused his death?”, in which “Imola”, “Italy”, “Europe” and “Earth” are possible answers; or
3) membership: for instance, “Edvard Munch” and “a Norwegian Symbolist painter” are both possible answers to “Who painted the ‘Scream’?”;

ALTERNATIVE: if answers do not entail each other:
1) representing distinct and complementary visions of the same entity. For example, “the largest wind energy park of Hessia” and “six wind turbines with a power of 1350 kilowatts” are two complementary answers for “What is set up in Seibertenrod in Vogelsberg?”; or,
2) representing different entities, which can be used together by means of a conjunction. “Aribert Heim”, “Josef Mengele” and “Jack Kevorkian” are answers for “What is the real name of Dr. Death?”, used separately or in a conjunction;

CONTRADICTORY: if answers are inconsistent and their conjunction is invalid. The question “How is the Pope?” can be answered with “ill” or “healthy”, but not with both.
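As a rough illustration, this four-way typology can be encoded directly as a small data structure. The sketch below (in Python) is ours and purely illustrative; the names RelationType and AnswerRelation are invented for the example.

```python
from enum import Enum, auto
from dataclasses import dataclass

class RelationType(Enum):
    """The four relation types of the proposed typology."""
    EQUIVALENCE = auto()    # consistent, mutual entailment (e.g., date variants)
    INCLUSION = auto()      # consistent, one-way entailment (hypernymy, meronymy, membership)
    ALTERNATIVE = auto()    # no entailment: complementary visions or distinct entities
    CONTRADICTORY = auto()  # inconsistent: the conjunction of the answers is invalid

@dataclass
class AnswerRelation:
    """A typed link between two candidate answers to the same question."""
    answer_a: str
    answer_b: str
    relation: RelationType

# Example instances taken from the typology above.
relations = [
    AnswerRelation("Oct. 14, 1947", "14th October, 1947", RelationType.EQUIVALENCE),
    AnswerRelation("frog", "amphibian", RelationType.INCLUSION),
    AnswerRelation("Aribert Heim", "Josef Mengele", RelationType.ALTERNATIVE),
    AnswerRelation("ill", "healthy", RelationType.CONTRADICTORY),
]
```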
2.2 Identifying Groups of Answers
String distance metrics – like the Levenshtein distance, or the cosine and Jaccard similarities – can be used to relate equivalent answers: “John Kennedy” and “John F. Kennedy” are equivalent, as they (most of the time) refer to the same person. However, this approach has some limitations, since it cannot always be directly applied. For example, with “George Bush” and “George W. Bush”, it depends on the context whether they refer to the same person or to two different persons.
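A minimal sketch of this strategy is given below, using a token-level Jaccard similarity and a character-level ratio as a stand-in for a normalized edit distance. The threshold and the greedy grouping procedure are illustrative choices, not any system's actual implementation.

```python
import difflib

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two answer strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_similarity(a: str, b: str) -> float:
    """Character-level similarity (a stand-in for a normalized edit distance)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def probably_equivalent(a: str, b: str, threshold: float = 0.6) -> bool:
    """Heuristic: treat answers as equivalent if either metric is high enough.
    The 0.6 threshold is illustrative, not tuned."""
    return max(jaccard(a, b), char_similarity(a, b)) >= threshold

def group_equivalents(candidates: list[str]) -> list[list[str]]:
    """Greedy single-pass clustering of candidate answers into equivalence groups."""
    groups: list[list[str]] = []
    for cand in candidates:
        for group in groups:
            if probably_equivalent(cand, group[0]):
                group.append(cand)
                break
        else:
            groups.append([cand])
    return groups

print(group_equivalents(["John Kennedy", "John F. Kennedy", "Lyndon Johnson"]))
# [['John Kennedy', 'John F. Kennedy'], ['Lyndon Johnson']]
```

As noted above, such surface metrics would also group “George Bush” with “George W. Bush”, which may or may not be the intended behaviour depending on the context.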
Normalization is another strategy to find equivalence relations, aiming at a canonical, unambiguous referent for entities. (Moriceau, 2005) deals with diversity in candidate answers, and presents an approach to deliver single coherent answers to date questions. The same author (Moriceau, 2006) described a method to deal with variation in numerical answers, in which the use of frames containing all information related to numerical values allows comparisons between answers. Other experiments (Khalid et al., 2008) in named entity normalization have shown that it helps text retrieval for QA. Normalization is typically done prior to the answer extraction phase, and does not aim at connecting answers through equivalence after they have been extracted. Like paraphrase detection, it is used to permit the extraction of diverse candidate answers referring to the same entity (Takahashi et al., 2003; France et al., 2003).
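As an illustration of what normalization buys, the sketch below maps date answers onto a canonical ISO form so that notational variants collapse to a single string. It assumes the third-party python-dateutil package and is not Moriceau's actual method.

```python
from dateutil import parser  # third-party package: python-dateutil

def normalize_date(answer: str) -> str:
    """Map a date expression onto the ISO 8601 form YYYY-MM-DD."""
    return parser.parse(answer).date().isoformat()

candidates = ["Oct. 14, 1947", "14th October, 1947", "1947-10-14"]
normalized = {normalize_date(c) for c in candidates}
print(normalized)  # {'1947-10-14'}: the three candidates turn out to be equivalent
```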
Inclusion and equivalence relations can be built by using the lexical relations present in WordNet (http://wordnet.princeton.edu/). This approach was used by (Dalmas and Webber, 2007), who propose a technique to organize answer candidates in the geographical domain into clusters. Answer models are created from questions and their candidates, and represented as directed graphs expressing the fusion of information contained in the set of extractions. The final answer is retrieved based on the computation of properties that compare the graph nodes.
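One possible way to detect an inclusion relation is to walk WordNet's hypernym hierarchy. The sketch below assumes NLTK with the WordNet data installed (nltk.download('wordnet')) and is only a simplified illustration of the idea.

```python
from nltk.corpus import wordnet as wn

def includes(general: str, specific: str) -> bool:
    """True if some sense of `general` is a hypernym of some sense of `specific`."""
    general_synsets = set(wn.synsets(general))
    for synset in wn.synsets(specific):
        # Walk up the hypernym hierarchy of the specific term.
        ancestors = set(synset.closure(lambda s: s.hypernyms()))
        if general_synsets & ancestors:
            return True
    return False

print(includes("amphibian", "frog"))  # True: "frog" entails "amphibian"
print(includes("frog", "amphibian"))  # False: the entailment is one-way
```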
The detection of alternative and contradictory relations requires different procedures. The former deals with discovering whether two answers point to the same entity; but, contrary to equivalence, inference based on lexical relations is not enough. For the latter, several notions should be taken into account, namely: whether the answers are antonyms (Mohammad et al., 2008); the quality and trustworthiness of the document in which the answer was found (Oh et al., 2009); and the time period the answers refer to (especially important if they are searched for on the Web).
3 COOPERATIVE ANSWER VERSUS CORRECT ANSWER

In order to choose the cooperative answer to a given question, the procedure should go further than merely discovering which candidates are correct. This decision depends, to a great extent, on the one who will get the answer: the user. Here we discuss the properties that make an answer cooperative: being correct, non-misleading and useful.
3.1 Correct Answers

The correctness of an answer relates, firstly, to whether it is associated with the entity that the question is seeking. For instance, the question “What animal is Kermit?” asks about the character in The Muppet Show, and not the computer protocol. The focus is, thus, to identify which answers hold contradictory relations and, from those, to decide which to choose. Most QA systems consider this as their main and final goal: to select the correct answer among all candidates.
3.2 Non-misleading Answers

A non-misleading answer prevents the user from forming a wrong interpretation of the topic under consideration. Here we focus on three problems that can mislead the user when dealing with an open-domain QA system over large collections of text: 1) answer ambiguity, 2) answer granularity, and 3) answer absence.
Ambiguity, as it is usually considered in QA, arises either from corpora sources or from user questions. Systems that do try to cope with ambiguity in the user’s question usually push its resolution to the user’s side, through the use of clarification dialogues. Ambiguity in answers, however, is usually not addressed. Returning an ambiguous answer to a question is not cooperative, since it leaves room for multiple interpretations. For instance, answering Bush to the question “Who was the President of the US during the Gulf War?” is ambiguous, since there were two presidents of the United States named Bush, and the answer does not clearly state which one responds to the question.
The problem with answer granularity resides in the fact that, in many situations, the decision about which answer to choose for a question is fuzzier: even among human assessors there is no agreement about what the answer to a given question is. (Lin and Katz, 2006) mention that granularity is a critical point especially if the answer belongs to the types PERSON, LOCATION or DATE. To a certain extent, the granularity of the cooperative answer depends on the question, and on several other factors external to the questioner (like his position in space (Shanon, 1979) and, by analogy, in time), and can thus be controlled with recourse to rules and guidelines. On the other hand, it is certainly not independent of the questioner, his characteristics and what he expects to have as an answer. As (Lin and Katz, 2006) point out, granularity has much to do with real users: “better understanding of real-world user needs will lead to more effective question answering systems in the future”.
Answer absence has to do with responding to the user when no answer was found. Indeed, although a wrong answer contradicts the goal of QA, returning no answer is not better: it does not bring any valuable information to the user, and can lead him/her to misinterpretations. Besides originating on the side of the system – which could not find an answer within the available corpora – questions with no answer can arise from the user’s side, namely from false presuppositions. Consider the question “Who is the King of France?”. Knowing that France is a republic, answering “no one” (or, even worse, “NIL”) can drive the user to think the system was unable to find the answer. In this case, an explanation is due. (Benamara, 2004) reports on WEBCOOP, a restricted-domain logic-based system that integrates knowledge representation and advanced reasoning to detect false presuppositions and misunderstandings in questions, in order to deliver non-misleading answers. To our knowledge, there is no open-domain QA system that deals with this problem.
3.3 Useful Answers
Choosing the useful answer is a task deeply intertwined with the characteristics of the questioner. There is, thus, the need to get to know the user and his/her goals. We consider that this can be achieved through the recognition of the clues the user provides when interacting with the system, at three different levels: question, context and history of interactions.
Question Clues: The first level in finding clues for deciding which answer is useful is to analyze the question and how it was posed to the system. Several clues in the question can be checked, namely (and surely not limited to):
1) the vocabulary, which differs depending on the user – for instance, on his/her age, academic background and occupation;
2) the specificity and world knowledge. If the user asks: “Who received the Prince of Asturias Award for Technical and Scientific Research for his studies on the discovery of the first synthetic vaccination against malaria?”, he/she probably has deep knowledge of the topic.
DuARTE Digital (Mendes et al., 2009) answers questions about a piece of jewellery, and dynamically tries to assess its interlocutor’s characteristics, based on the vocabulary used. With a list of words that naive or expert users might employ, it estimates, at every question, the user’s expertise. Knowing this, it chooses the answer (previously marked with the corresponding difficulty level) from a knowledge base.
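The sketch below illustrates the general idea of vocabulary-based expertise assessment; it is not DuARTE Digital's actual implementation, and the word lists are invented for the example.

```python
import re

# Invented word lists for illustration: terms a naive or an expert user might employ.
NAIVE_TERMS = {"shiny", "pretty", "old", "thing"}
EXPERT_TERMS = {"filigree", "gilding", "enamel", "repousse"}

def estimate_expertise(question: str) -> str:
    """Return 'expert', 'naive' or 'unknown' based on lexical overlap with the lists."""
    tokens = set(re.findall(r"\w+", question.lower()))
    expert_hits = len(tokens & EXPERT_TERMS)
    naive_hits = len(tokens & NAIVE_TERMS)
    if expert_hits > naive_hits:
        return "expert"
    if naive_hits > expert_hits:
        return "naive"
    return "unknown"

print(estimate_expertise("Is the filigree on the custody made by gilding?"))  # expert
```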
Clues in the Current Context: The next level has to do with the current context. Consider a chain of questions about the Second World War. If the question “What is the real name of Dr. Death?” appears, probably the useful answer will be “Aribert Heim” or “Josef Mengele” (or both), and not any other from the set of possible answers (notice the multiple candidate answers for this question at http://en.wikipedia.org/wiki/Dr._Death).
Again, DuARTE Digital (Mendes et al., 2009) is a system that tries to recognize the user’s goals at a contextual level. It measures the proximity of the user’s words in a sequence of questions to different sub-topics, in order to understand the orientation of the interaction. By doing so, it distinguishes focused from stray interactions, and chooses the answer according to its detail and informative level.
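A simplified illustration of this kind of contextual tracking is given below; the sub-topic keyword sets and the overlap measure are invented for the example and do not reproduce the system's actual method.

```python
# Invented sub-topic keyword sets for illustration.
SUBTOPICS = {
    "second_world_war": {"war", "nazi", "camp", "deport", "stalin", "hitler"},
    "medicine": {"doctor", "patient", "disease", "euthanasia"},
}

def topic_scores(questions: list[str]) -> dict[str, float]:
    """Fraction of questions in the interaction that mention each sub-topic."""
    scores = {}
    for topic, keywords in SUBTOPICS.items():
        hits = sum(1 for q in questions if set(q.lower().split()) & keywords)
        scores[topic] = hits / len(questions) if questions else 0.0
    return scores

history = ["How many Chechens did Stalin deport?",
           "What is the real name of Dr. Death?"]
print(topic_scores(history))
# A high score for one sub-topic suggests a focused interaction; uniformly low
# scores suggest a stray one, and the answer's detail level can be set accordingly.
```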
Clues in the History of Interactions: The final level relates to the history of interactions between user and system.
Although this is not a new issue in IR, especially in Web search engines, where systems try to adapt the presentation of results to the user (Liu et al., 2004; Teevan et al., 2005), the first steps in QA have only recently been taken. (Quarteroni and Manandhar, 2009) were pioneers in this topic, as they included in the system YourQA a component dedicated to user modelling. The purpose is to filter the documents where answers are searched for and to rerank the candidates based on their degree of match with the user’s profile. Users must create their own profile when first interacting with the system (it does not dynamically discover its interlocutor’s characteristics), and their browsing history is taken into consideration in future interactions. Any question submitted to the
system is answered by taking the user’s profile into
account.
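The sketch below illustrates profile-based reranking in this spirit; the Candidate structure, the overlap measure and the weighting are our own invented simplification, not YourQA's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str
    snippet: str   # passage the answer was extracted from
    score: float   # extraction score (e.g., frequency-based)

def rerank(candidates: list[Candidate], profile_keywords: set[str],
           weight: float = 0.5) -> list[Candidate]:
    """Reorder candidates by extraction score plus a profile-match bonus."""
    def combined(c: Candidate) -> float:
        tokens = set(c.snippet.lower().split())
        overlap = len(tokens & profile_keywords) / (len(profile_keywords) or 1)
        return c.score + weight * overlap
    return sorted(candidates, key=combined, reverse=True)

# Usage: a candidate supported by a snippet that matches the user's interests
# is promoted over one with a slightly higher raw extraction score.
```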
4 ANSWER GENERATION
The simplest approach that can be envisioned is to return the most frequent candidate. Returning the complete set of correct answers can also be an option, as for “What is the real name of Dr. Death?”. On the other hand, some answers can be merged into a single one that subsumes the set. For instance, animal can be chosen to answer the question “What is Kermit?”, instead of frog or amphibian. Nevertheless, the point here is common: these are answers based on pure extraction, not on generation.
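These purely extraction-based strategies are trivial to state in code; the sketch below is an illustration only.

```python
from collections import Counter

def most_frequent(candidates: list[str]) -> str:
    """Baseline: the candidate extracted most often wins."""
    return Counter(candidates).most_common(1)[0][0]

extracted = ["frog", "frog", "amphibian", "frog", "animal"]
print(most_frequent(extracted))   # 'frog'
print(set(extracted))             # the complete set, an option for list-type questions
```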
The fact is, although much work has been devoted to NLG, it seems that QA has not yet benefited from the results achieved in this well-established field. Answer generation is preferable to answer extraction for the purpose of answering: first, it humanizes the system; second, it permits the usage of adapted vocabulary; finally, it allows the introduction of information that the user did not explicitly request, but might be interested in.
There are a few examples of works that try to build answers, instead of merely extracting and retrieving them. Again, (Moriceau, 2005)’s work on data integration is a good contribution. The system generates natural language answers by making use of generation schemas, and lexicalizes its degree of confidence in the answer through the use of adverbs and their intensities (e.g., possibly, most possibly, probably, most probably). Another example is WEBCOOP (Benamara and Saint-Dizier, 2003), which relies on templates to report user misconceptions and display solutions to help the user, in natural language.
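The sketch below gives a flavour of such template-based generation with confidence lexicalization; the thresholds, adverbs and template are invented for illustration and do not reproduce the cited systems.

```python
def lexicalize_confidence(confidence: float) -> str:
    """Map a confidence score onto a hedging adverb (thresholds are illustrative)."""
    if confidence > 0.9:
        return ""            # high confidence: no hedging adverb
    if confidence > 0.7:
        return "most probably "
    if confidence > 0.5:
        return "probably "
    return "possibly "

def generate_answer(question_focus: str, answer: str, confidence: float) -> str:
    """Fill a simple generation template: '<focus> is <adverb><answer>.'"""
    return f"{question_focus} is {lexicalize_confidence(confidence)}{answer}."

print(generate_answer("The test pilot who broke the sound barrier",
                      "Chuck Yeager", 0.8))
# -> 'The test pilot who broke the sound barrier is most probably Chuck Yeager.'
```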
5 CONCLUSIONS
In this paper we presented our position about what we believe should be a desirable focus of research in QA: the process of returning the answer to the user. We presented three main aspects that should lead the research in QA, namely: relating the candidate answers to a given question, for which we proposed a typology of four relations; choosing the cooperative answer among the candidates, rather than only the correct one; and generating the final answer, by using the work already done in NLG, instead of returning the extracted information chunks.
There are a few systems that partly cope with some of the presented problems, and we made a brief survey of them. Nevertheless, to our knowledge there is no open-domain QA system that deals with all of these as a whole, for the purpose of answering. We consider that the goal is now to put them together, and to focus on QA systems and approaches that take the problem of cooperative answering into consideration. Systems should evolve in this direction, to become more competitive and appealing to real end-users.
ACKNOWLEDGEMENTS
Ana Cristina Mendes is supported by a PhD fellowship from Fundação para a Ciência e a Tecnologia (SFRH/BD/43487/2008).
REFERENCES
Benamara, F. (2004). Cooperative question answering in
restricted domains: the webcoop experiment. In Proc.
ACL Workshop on Question Answering in Restricted
Domains, pages 21–26.
Benamara, F. and Saint-Dizier, P. (2003). Dynamic generation of cooperative natural language responses. In EWNLG03-ACL, Budapest, pages 56–67. ACL.
Bilotti, M., Katz, B., and Lin, J. (2004). What works better
for question answering: Stemming or morphological
query expansion. In Proceedings of the Information
Retrieval for Question Answering (IR4QA) Workshop
at SIGIR 2004.
Dalmas, T. and Webber, B. L. (2007). Answer comparison
in automated question answering. Journal of Applied
Logic, 5(1):104–120.
Derczynski, L., Wang, J., Gaizauskas, R., and Greenwood,
M. (2008). A data driven approach to query expan-
sion in question answering. In Coling 2008: Proceed-
ings of the 2nd workshop on Information Retrieval
for Question Answering, pages 34–41. Association for
Computational Linguistics.
France, F. D., Yvon, F., and Collin, O. (2003). Learning
paraphrases to improve a question-answering system.
In Proc. 10th Conference of EACL Workshop Natural
Language Processing for Question-Answering.
Frank, A., Krieger, H.-U., Xu, F., Uszkoreit, H., Crysmann, B., Jörg, B., and Schäfer, U. (2007). Question answering from structured knowledge sources. Journal of Applied Logic, 5(1):20–48. Questions and Answers: Theoretical and Applied Perspectives.
Gaasterland, T., Godfrey, P., and Minker, J. (1992). An
overview of cooperative answering. Journal of Intel-
ligent Information Systems, 1(2):123–157.
Hovy, E., Hermjakob, U., and Lin, C.-Y. (2001). The use of external knowledge in factoid QA. In Proceedings of the Tenth Text REtrieval Conference (TREC), pages 644–652.
Huang, Z., Thint, M., and Qin, Z. (2008). Question clas-
sification using head words and their hypernyms. In
EMNLP, pages 927–936.
Khalid, M., Jijkoun, V., and de Rijke, M. (2008). The impact of named entity normalization on information retrieval for question answering. In 30th European Conference on Information Retrieval (ECIR 2008), pages 705–710. Springer.
Li, X. and Roth, D. (2002). Learning question classifiers.
In Proceedings of the 19th international conference
on Computational linguistics, pages 1–7, Morristown,
NJ, USA. Association for Computational Linguistics.
Lin, J. (2007). An exploration of the principles underlying
redundancy-based factoid question answering. ACM
Trans. Inf. Syst., 25(2):6.
Lin, J. and Katz, B. (2006). Building a reusable test col-
lection for question answering. J. Am. Soc. Inf. Sci.
Technol., 57(7):851–861.
Liu, F., Yu, C., and Meng, W. (2004). Personalized web
search for improving retrieval effectiveness. IEEE
Transactions on Knowledge and Data Engineering,
16(1):28–40.
Magnini, B., Romagnoli, S., Vallin, A., Herrera, J., Peñas, A., Peinado, V., Verdejo, F., de Rijke, M., and Vallin, R. (2003). The multiple language question answering track at CLEF 2003. In CLEF 2003 Workshop. Springer-Verlag.
Magnini, B., Vallin, R., Ayache, C., Erbach, G., Peñas, A., de Rijke, M., Rocha, P., Simov, K., and Sutcliffe, R. (2005). Overview of the CLEF 2004 multilingual question answering track. In Results of the CLEF 2004 Cross-Language System Evaluation Campaign, pages 371–391. Springer-Verlag, Berlin Heidelberg.
Mendes, A. C., Prada, R., and Coheur, L. (2009). Adapting a virtual agent to users' vocabulary and needs. In Proc. 9th International Conference IVA 2009, LNAI 5773. Springer-Verlag.
Mohammad, S., Dorr, B., and Hirst, G. (2008). Computing
word-pair antonymy. In EMNLP ’08: Proceedings of
the Conference on Empirical Methods in Natural Lan-
guage Processing, pages 982–991, Morristown, NJ,
USA. Association for Computational Linguistics.
Moriceau, V. (2005). Answer generation with temporal data integration. In Wilcock, G., et al., editors, EWNLG, Aberdeen, pages 197–202. University of Aberdeen.
Moriceau, V. (2006). Numerical data integration for cooper-
ative question-answering. In Knowledge and Reason-
ing for Language Processing (KRAQ), Trento, Italy,
pages 43–50. Association for Computational Linguis-
tics (ACL).
Oh, H.-J., Lee, C.-H., Yoon, Y.-C., and Jang, M.-G. (2009).
Question answering based on answer trustworthiness.
In Information Retrieval Technology, 5th Asia Infor-
mation Retrieval Symposium, AIRS 2009, Sapporo,
Japan, October 21-23, 2009. Proceedings, pages 310–
317.
Quarteroni, S. and Manandhar, S. (2007). User modelling
for personalized question answering. In AI*IA ’07:
Proc. 10th Congress of the Italian Association for Ar-
tificial Intelligence on AI*IA 2007, pages 386–397,
Berlin, Heidelberg. Springer-Verlag.
Quarteroni, S. and Manandhar, S. (2009). Designing an
interactive open-domain question answering system.
Natural Language Engineering, 15(1):73–95.
Reiter, E. and Dale, R. (2000). Building Natural Language
Generation Systems. Natural Language Processing.
Cambridge University Press.
Shanon, B. (1979). Where questions. In Proceedings of
the 17th annual meeting on Association for Compu-
tational Linguistics, pages 73–75, Morristown, NJ,
USA. Association for Computational Linguistics.
Takahashi, T., Nawata, K., Inui, K., and Matsumoto, Y.
(2003). Effects of structural matching and paraphras-
ing in question answering (special issue on text pro-
cessing for information access). IEICE transactions
on information and systems, 86(9):1677–1685.
Teevan, J., Dumais, S. T., and Horvitz, E. (2005). Per-
sonalizing search via automated analysis of interests
and activities. In SIGIR ’05: Proc. 28th annual in-
ternational ACM SIGIR conference on Research and
development in information retrieval, pages 449–456.
ACM.
Webber, B., Gardent, C., and Bos, J. (2002). Position state-
ment: Inference in question answering. In Proc. 3rd
international conference on language resources and
evaluation (LREC), pages 19–25.