A Dialogue Manager for an Intelligent Mobile Robot

Marcelo Quinderé, Luís Seabra Lopes and António J. S. Teixeira

IEETA, Departamento de Electrónica e Telecomunicações

University of Aveiro

3810-193 Aveiro, Portugal

Abstract. This paper focuses on a dialogue manager developed for Carl, an intelligent mobile robot. It uses the Information State (IS) approach and is based on a Knowledge Acquisition and Management (KAM) module that integrates information obtained from various interlocutors. This mixed-initiative dialogue manager handles pronoun resolution, can ask different kinds of clarification questions, and can comment on new information based on the knowledge acquired so far.

1 Introduction

Personal robots, intelligent service robots capable of performing useful work in close

cooperation/interaction with humans, are expected to be the next generation of robots.

“Integrated Intelligence” [10] identiﬁes an approach to building such agents in which

the integration of key aspects is considered, including linguistic communication, rea-

soning, reactivity and learning.

This is the scope of CARL (Communication, Action, Reasoning and Learning in

Robotics), a research project started in our institute in 1999, in the framework of which

a robot prototype was developed, Carl [9]. The software architecture of Carl, which is

based on the Open Agent Architecture (OAA) [6], uses a community of agents to handle

general perception and action, display appropriate emotions through an animated face,

process natural language and manage the robot.

Carl has been using a dialogue management approach based on finite state machines. In each predefined state, state transitions specify which actions the robot is supposed to execute under different conditions and which new states are reached. Several speech acts are supported, including declarations (tell), questions (ask, ask_if) and commands (achieve). A Knowledge Acquisition and Management (KAM) module [11] integrates information obtained from different interlocutors, even if it is contradictory, and provides replies to received questions. The dialogue management approach used until now has several limitations. It follows a mostly single-initiative strategy, since most dialogues are started by the user. It also lacks capabilities for ambiguity resolution, clarification of misrecognized sentences and confirmation of sentences with low recognition confidence.

This paper focuses on the dialogue manager now being developed to address these limitations. It uses the Information State (IS) approach to dialogue systems [12]. The approach allows for mixed-initiative dialogues. It also supports the use of pronouns and generates clarification/confirmation questions when the Automatic Speech Recognizer (ASR) confidence is low and/or the sentences are ungrammatical. Finally, it is able to produce comments on the information just acquired and to give informative answers.

Quinderé M., Seabra Lopes L. and J. S. Teixeira A. (2007). A Dialogue Manager for an Intelligent Mobile Robot. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science, pages 191-200. DOI: 10.5220/0002431901910200. SciTePress

The paper is structured as follows. Section 2 presents different types of dialogue

systems. Section 3 describes the developed dialogue manager. Section 4 presents an

evaluation. Section 5 concludes the paper with reference to future work.

2 Dialogue Systems

Dialogue Systems can be divided into the following types [4]:

1. Finite State Systems – A finite state machine represents the dialogue, which means that every state transition has to be coded in the system. These transitions occur when the user gives the information the system was waiting for, usually a short phrase or even isolated words. Most Finite State Systems do not give much freedom to the user because the answers have to be given in a preset order. Besides, the user should not answer more than was asked. As mentioned, our robot Carl has also used a finite state approach until now. Although there is no constraint on the order of declarations/questions and the sentences can be quite complex, the approach is not flexible enough to address other problems, such as ambiguity, misrecognition and low recognition confidence.

2. Frame-Based Systems – These systems have frames with fields that need to be filled in order to allow a database query. The user is free to give as many answers as he or she wants, and the system is capable of handling that.

3. Advanced Systems – They are mixed-initiative: either the user or the system can have control of the conversation.

(a) Belief, Desire, Intention (BDI) Models – The majority of the dialogue systems that use BDI models [2] are plan based. For instance, if an agent needs a piece of information, it can set a plan that includes asking a question to get the missing information. Analogously, an agent that hears a question can infer why it was asked.

(b) Markov Decision Process (MDP) – To use an MDP, a model that defines the behavior of the system is needed. Two methods can be used to obtain it:

– reduce the number of states and policies to the minimum, build a system that explores the state space through random dialogues, and then build a model from the resulting corpus;

– develop a simulated user that interacts with the system a million times, so that the system can learn from the resulting corpus.

(c) Information State (IS) – [...] the information state, is kept; it records what has happened in the dialogue and is the basis for the dialogue manager's decisions.


2.1 Some Dialogue Managers

Florence is a dialogue manager framework developed by AT&T [3]. This framework

was made to support the development of a Spoken Dialogue System (SDS) with mul-

tiple dialogue strategies, instead of focusing on a single strategy, e.g. call routing or

plan-based interaction.

Each dialogue can be guided by a different strategy, such as an Augmented Transition Network (ATN) strategy, which is an extension of a finite state machine, a clarification strategy or a rule-based strategy. The usual strategy is ATN, which acts on the input and on the local context to control the interaction flow.

ASIMO is a conversational service robot. Nakano et al. [7] propose a two-layer model for behavior and dialogue planning in robots of that kind. They named their module MEBDP (Multi-Expert-based Behavior and Dialogue Planning), which is divided into:

upper layer – a task planning layer responsible for decomposing a task into subtasks;

lower layer – an expert action selection layer that performs the subtasks using experts. There are four types of experts: request understanding, information providing, physical action planning and information obtaining dialogue experts.

Jijo-2 is an ofﬁce service robot that is able to communicate and learn about its environ-

ment [1]. The dialogue is handled by a ﬁnite state machine and so the robot’s answer

depends on its state. There are ﬁve kinds of tasks: consult database, database update,

person identiﬁcation, navigation and people calling.

The authors point out that the dialogue system’s main problem is that the design of

the state transition network is ad hoc and very task dependent. Another problem is

the informal semantic representation of utterances, which makes it hard to extend the

system to other tasks.

3 Description of the Dialogue Manager Developed

3.1 Requirements

The robot Carl should be able to carry on a dialogue like the one shown in Table 1. In

order to do that, Carl’s dialogue manager should have at least the following character-

istics:

– Reactive – it should react appropriately to a user utterance

– Dialogue context – for reference resolution, for pronoun use and for selecting the

most suitable action

– Reference resolution – so far used for understanding pronouns

– Ellipsis – for understanding incomplete sentences

– Pronoun clarification – when reference resolution is not able to resolve one or more pronouns

– Pronoun use – when speaking, it should use as many pronouns as it can to make the dialogue smooth


– Clarification – when the confidence level of the ASR is low

– Comment – when receiving new information, it should make comments if possible

– Informative – when answering, it should be as informative as it can, avoiding bare “no” answers

– Grounding – the system should make clear that it understood the user

– Mixed initiative – both the user and the system should be able to take the initiative in the dialogue

Table 1. Fictitious dialogue with robot Carl.

Speaker  Utterance                              Requirement
User:    Hi!                                    dialogue context update
Carl:    Hi! I’m the robot Carl.                reactive
Carl:    What’s your name?                      initiative
User:    Jim.                                   dialogue context update, ellipsis
Carl:    What can I do for you?                 give initiative
User:    Is he in the institute?                dialogue context update
Carl:    Who do you mean?                       reactive, pronoun clarification
User:    Peter.                                 dialogue context update, ellipsis
Carl:    No, he is in the department.           reactive, informative, dialogue context use, pronoun use
User:    Does he like you?                      reference resolution, dialogue context use/update
Carl:    Yes.                                   reactive, informative
User:    Mary lives in Spain.                   dialogue context update
Carl:    Did you say that Mary lives in Spain?  reactive, clarification
User:    Yes.                                   dialogue context update
Carl:    Really? Daniel lives in Spain too!     reactive, comment

3.2 Information State Approach

Of the types of dialogue systems presented, the Information State (IS) approach was chosen for the dialogue manager because it is advanced, flexible, and able to address all the requirements listed above.

The developed dialogue manager is composed of:

– information state description – identification of the IS components

– events – external occurrences that trigger an information state update

– IS update rules – define when and how to update the information state

– action selection – defines which action to perform next

– control module – controls which update rules are applied and selects the next action

Carl’s software architecture is composed of the following agents: Graphical and Touch Interface (GTI), Automatic Speech Recognizer (ASR), Natural Language Understanding (NLU), Natural Language Generation (NLG), Synthesis, Navigation and Manager.

Figure 1 shows an overview of the dialogue manager. It communicates directly with the Navigation, NLG, ASR and GTI agents, as well as with the KAM module. The IS includes fields to handle the NLU input, task management, dialogue initiative and turn, user information, referenced objects (referents), questions asked by the system, events and robot control.

Footnote: currently, grounding is done by explicit display of the ASR output on the screen.


[Figure 1: block diagram of the dialogue manager. Events (Navigation, GTI and NLU events, timeouts) enter the control module (update, select), which consults the IS and the domains (ieeta, hobbie, others), consults or updates the knowledge base through the KAM module, and issues actions (ASR, Navigation and GTI commands, speech acts to NLG). IS fields shown: input_clarified, input_resolved, task_list, task_history, last_nlu_events, action_history, ask_history, referent_list, user_name, turn, initiative, user_presence, mode.]

Fig. 1. Dialogue manager design overview.

Timeouts and events generated by the GTI, NLU and Navigation agents trigger IS updates. The control module applies update rules (which can add tasks to the task list) and then selects the next action to take (based on the current task list). Therefore, the behavior of the system is: receive events, update the information state, select an action and act.

Figure 2 shows the IS update flow for an NLU event. If the ASR confidence level is too low (less than 30%), the information is rejected and the reject task handles it. The semantic extraction can be shallow, performed by the Tilburg Memory Based Learner (TiMBL), or deep, performed by LCFlex (details in [8]). If it is a TiMBL analysis and the ASR confidence is above 30%, a clarify_timbl task is added.

If it is an LCFlex analysis and the ASR confidence is between 30% and 70%, a clarify task is added so that the information can go to the field input_clarified. If the confidence is above 70%, the information goes directly to that field.

When there is valid information in the field input_clarified, pronoun resolution is invoked. If all pronouns can be resolved, the information is put in the field input_resolved. If there is no referent for a pronoun or the system has two good choices, the clarify_pronouns task is added.
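The first part of this update flow (up to the field input_clarified) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `route_nlu_event`, the dictionary used as information state and the treatment of confidence values lying exactly on a threshold (30 or 70) are assumptions, since the text does not specify them.

```python
# Illustrative sketch of the NLU-event update flow; not the authors' code.
# Assumptions: the IS is a plain dict, and a confidence value lying exactly
# on a threshold (30 or 70) falls into the higher band.

REJECT_BELOW = 30    # ASR confidence (%) below which the input is rejected
CLARIFY_BELOW = 70   # LCFlex input below this confidence must be clarified


def route_nlu_event(info_state, confidence, analysis, semantics):
    """Update the information state for one NLU event and return it."""
    tasks = info_state.setdefault("task_list", [])
    if confidence < REJECT_BELOW:
        tasks.append(("reject", semantics))
    elif analysis == "timbl":
        # Shallow (TiMBL) analyses that are not rejected are always clarified.
        tasks.append(("clarify_timbl", semantics))
    elif analysis == "lcflex" and confidence < CLARIFY_BELOW:
        tasks.append(("clarify", semantics))
    elif analysis == "lcflex":
        # High-confidence deep analysis goes directly to input_clarified.
        info_state["input_clarified"] = semantics
    else:
        raise ValueError(f"unknown analysis type: {analysis!r}")
    return info_state


if __name__ == "__main__":
    state = route_nlu_event({}, 50, "lcflex", "mary lives in spain")
    print(state["task_list"])         # a clarify task was queued
    state = route_nlu_event({}, 75, "lcflex", "hi")
    print(state["input_clarified"])   # accepted without clarification
```

Keeping the two thresholds as named constants makes it easy to experiment with other confidence bands without touching the branching logic.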


[Figure 2: flow diagram. An NLU event leads to task reject (Confidence < 30%), task clarify_timbl (Confidence > 30%, analysis = timbl), task clarify (30% < Confidence < 70%, analysis = lcflex) or directly to input_clarified (Confidence > 70%, analysis = lcflex). From input_clarified, the flow goes to input_resolved (resolved) or to task clarify_pronouns (not resolved).]

Fig. 2. Update flow of NLU event.

3.3 Dialogue Manager Control Algorithm

The dialogue manager control is performed by Algorithm 1. After receiving the events, the procedure perform_state_update() is called, which applies all the update rules whose conditions are satisfied. Note that actions are performed while the system has the turn and actually has an action to perform. The IS field turn is updated when the user or the system performs a speech act.

Since tasks in task_list are sorted by priority, the control algorithm uses the first one to select the next action by calling the procedure perform_action_selection(Task). The procedure act(NextAction) executes the action and updates the history.

Note that execution ends when the program_state field has the value stop.
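The control cycle just described (Algorithm 1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the information state is a plain dict, events come from a pre-filled queue, and the three procedures are passed in as callables. The toy callbacks in `run_demo` are hypothetical and exist only to make the loop runnable.

```python
# Minimal sketch of the control loop of Algorithm 1; not the authors' code.
# Assumptions: the IS is a dict, events come from a pre-filled queue, and
# the update, selection and act procedures are supplied by the caller.
from collections import deque


def control(info_state, events, state_update, select_action, act):
    """Run the receive-update-select-act cycle until program_state is 'stop'."""
    while True:
        # Receive events (here: take one pending event, if any).
        event = events.popleft() if events else None
        state_update(info_state, event)        # apply the update rules
        while True:
            tasks = info_state["task_list"]    # assumed sorted by priority
            task = tasks[0] if tasks else None
            action = select_action(info_state, task)
            if action is not None:
                act(info_state, action)        # execute and update history
            if info_state["turn"] == "user" or action is None:
                break
        if info_state["program_state"] == "stop":
            break


def run_demo():
    """Drive the loop with toy callbacks (hypothetical, for illustration)."""
    def state_update(is_, event):
        if event is None:
            is_["program_state"] = "stop"      # toy rule: stop when idle
        else:
            is_["task_list"].append(event)
            is_["turn"] = "system"

    def select_action(is_, task):
        return None if task is None else ("say", task)

    def act(is_, action):
        is_["action_history"].append(action)
        is_["task_list"].pop(0)
        is_["turn"] = "user"                   # the speech act releases the turn

    is_ = {"task_list": [], "turn": "user", "program_state": "run",
           "action_history": []}
    control(is_, deque(["greet", "ask_name"]), state_update, select_action, act)
    return is_["action_history"]


if __name__ == "__main__":
    print(run_demo())
```

Passing the procedures as arguments keeps the loop itself independent of any particular rule set or agent platform.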

The rule updateReferentList (1) is an example of an update rule. It is applied whenever there is a valid value in the IS field input_resolved. Its main effect is to update the IS field referent_list by calling the procedure update_referent_list with the current referent list and the semantics just received.

Rule: updateReferentList
Conditions:
    valid_IS_value(input_resolved)
Effects:
    InputValue ← get_IS_value(input_resolved)
    ReferentListValue ← get_IS_value(referent_list)
    NewReferentList ← update_referent_list(ReferentListValue, InputValue)
    set_IS_value(referent_list, NewReferentList)
(1)
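One way to picture such a rule is as a pair of functions over the information state: a condition and an effect. The following Python sketch illustrates the idea; it is not the authors' implementation. The `UpdateRule` class, the dict-based information state, the convention that a field is valid when it is not None, and the placeholder effect (which simply prepends the new input) are all assumptions made for illustration.

```python
# Sketch of an IS update rule as a (condition, effect) pair; illustrative only.
# Assumptions: the IS is a dict and a field is "valid" when it is not None.
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpdateRule:
    name: str
    condition: Callable[[dict], bool]
    effect: Callable[[dict], None]


def perform_state_update(info_state, rules):
    """Apply every rule whose condition holds; return the names applied."""
    applied = []
    for rule in rules:
        if rule.condition(info_state):
            rule.effect(info_state)
            applied.append(rule.name)
    return applied


def _update_referent_list_effect(info_state):
    # Placeholder for update_referent_list: simply prepend the new input.
    info_state["referent_list"] = (
        [info_state["input_resolved"]] + info_state["referent_list"]
    )


update_referent_list_rule = UpdateRule(
    name="updateReferentList",
    condition=lambda s: s.get("input_resolved") is not None,
    effect=_update_referent_list_effect,
)

if __name__ == "__main__":
    state = {"input_resolved": "peter", "referent_list": ["jim"]}
    print(perform_state_update(state, [update_referent_list_rule]))
    print(state["referent_list"])
```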

The procedure update_referent_list is a simplified version of the algorithm described in [5], because the robot is designed to handle sentences much simpler than the ones addressed by the original algorithm. The main difference is the use of fewer salience factor types and filters. To support reference resolution, a list of the referents mentioned in the dialogue is updated by Algorithm 2.

Algorithm 1: control().
begin
    repeat
        receive_events()
        perform_state_update()
        repeat
            Task ← get_IS_list_head(task_list)
            NextAction ← perform_action_selection(Task)
            if NextAction ≠ null then
                act(NextAction)
            TurnValue ← get_IS_value(turn)
        until TurnValue = user ∨ NextAction = null
        ProgramStateValue ← get_IS_value(program_state)
    until ProgramStateValue = stop
end

Algorithm 2: update_referent_list(ReferentList, Semantics).
begin
    foreach referent[i] in ReferentList do
        if referent[i].age = 3 then
            delete(ReferentList, referent[i])
        else
            referent[i].age = referent[i].age + 1
            referent[i].salience = referent[i].salience/2
    TempList ← extract_referents(Semantics)
    NewReferentList ← merge(ReferentList, TempList)
    NewReferentList ← sort_by(NewReferentList, salience)
    return NewReferentList
end

Each referent has an associated salience value, which is used to sort the list. These values are halved every time a new sentence is evaluated, which gives priority to recent referents. In addition, the list only keeps the referents mentioned in the last four sentences.
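The ageing and decay behavior of Algorithm 2 can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: new referents are assumed to start with age 0 and a salience of 100 (a hypothetical value), referents are identified by name only, and a referent that is mentioned again simply replaces its older entry, because the text does not specify the merge policy.

```python
# Sketch of Algorithm 2 in Python; not the authors' implementation.
# Assumptions: new referents start with age 0 and salience 100, and a
# referent mentioned again replaces its older entry (the merge policy is
# not specified in the text).
from dataclasses import dataclass

MAX_AGE = 3                # referents with this age are dropped on update
INITIAL_SALIENCE = 100.0   # hypothetical starting salience


@dataclass
class Referent:
    name: str
    salience: float = INITIAL_SALIENCE
    age: int = 0


def update_referent_list(referent_list, new_names):
    """Age and decay the existing referents, then merge in the new ones."""
    kept = []
    for ref in referent_list:
        if ref.age == MAX_AGE:
            continue   # last mentioned four sentences ago: drop it
        kept.append(Referent(ref.name, ref.salience / 2, ref.age + 1))
    fresh = [Referent(name) for name in new_names]
    fresh_names = {ref.name for ref in fresh}
    merged = [ref for ref in kept if ref.name not in fresh_names] + fresh
    # Most salient (that is, most recent) referents come first.
    return sorted(merged, key=lambda ref: ref.salience, reverse=True)


if __name__ == "__main__":
    refs = update_referent_list([], ["peter"])
    refs = update_referent_list(refs, ["mary", "spain"])
    print([(r.name, r.salience, r.age) for r in refs])
```

With these assumptions, a referent survives exactly four updates after its last mention, which matches the "last four sentences" window described above.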

As a plan example, consider the store_info plan (2). Basically, it stores information acquired from the user by calling the procedure kb_update of the KAM module. Since this module supports contradictory information (details in [11]), it needs the user name to associate with every piece of information it keeps, so the task get_user_name is the first step in this plan. After the information is stored, another procedure from the KAM module is called, kb_comment_info. This one tries to generate a comment based on the semantics given and on the current state of the knowledge base. If it succeeds, Status is set to ok; otherwise, it is set to not_ok. If there is a comment, the plan is to use pronouns where possible (by calling replace_names_by_pronouns) and then send a message to the NLG agent with the semantics of the comment. Otherwise, the message is just an acknowledgement that the information was stored.

There is one task that generates questions to the user based on the current knowledge acquired. This task is added by an IS update rule when the system has the initiative in the dialogue.

Plan: store_info(+RecConf, +Semantics)
Operations:
    task(get_user_name(UserName)),
    action(kb_update(UserName, RecConf, Semantics)),
    action(kb_comment_info(Semantics, Comment, CommentConfidence, Status)),
    then_else(
        Status = ok,
        [action(replace_names_by_pronouns(UserName, ReferentList, Comment, PronounComment)),
         oaa_action(nlg(comment, release-turn, _, _, PronounComment, CommentConfidence))],
        [oaa_action(nlg(ack, release-turn, _, _, _, _))]
    )
(2)
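The branching in this plan (store, try to comment, otherwise acknowledge) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. The class `ToyKAM` is a hypothetical stand-in for the KAM module with simplified signatures, facts are assumed to be (subject, relation, object) triples, `send_nlg` stands in for the OAA message to the NLG agent, and pronoun replacement and the comment confidence are omitted for brevity.

```python
# Sketch of the store_info plan as straight-line Python; illustrative only.
# Assumptions: facts are (subject, relation, object) triples, ToyKAM is a
# hypothetical stand-in for the KAM module, and send_nlg stands in for the
# OAA message to the NLG agent.


class ToyKAM:
    """Hypothetical stand-in for the KAM module, for demonstration only."""

    def __init__(self):
        self.facts = []   # entries: (user, confidence, fact)

    def kb_update(self, user, rec_conf, fact):
        self.facts.append((user, rec_conf, fact))

    def kb_comment_info(self, fact):
        """Return (comment, status); comment is a related fact, if any."""
        subject, relation, obj = fact
        for _, _, (s, r, o) in self.facts:
            # A different subject with the same relation and object.
            if (r, o) == (relation, obj) and s != subject:
                return (s, r, o), "ok"
        return None, "not_ok"


def store_info(kam, user_name, rec_conf, fact, send_nlg):
    """Store a fact, then send either a comment or an acknowledgement."""
    kam.kb_update(user_name, rec_conf, fact)
    comment, status = kam.kb_comment_info(fact)
    if status == "ok":
        send_nlg("comment", "release-turn", comment)
    else:
        send_nlg("ack", "release-turn", None)


if __name__ == "__main__":
    kam = ToyKAM()
    kam.kb_update("system", 100, ("daniel", "live", "spain"))
    sent = []
    store_info(kam, "jim", 50, ("mary", "live", "spain"),
               lambda *msg: sent.append(msg))
    print(sent)
```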

4 First Results

A scenario was built to check whether a dialogue like the one shown in Table 1 could really be carried out by the dialogue manager. In that dialogue, the robot shows that it has already acquired some knowledge, namely that: 1) Peter is in the department, 2) Peter likes Carl, 3) Daniel lives in Spain. So for the test, these facts were previously added to the knowledge base using the KAM module.

As the dialogue manager has only just been developed and is not yet fully integrated in the robot system, the output of the spoken language understanding (SLU) was simulated in this test. For each user utterance in the target dialogue, a semantic representation was manually created. An ASR confidence of 75% and an lcflex analysis label were used, except for the “Mary lives in Spain” representation, for which a 50% confidence was used in order to force a clarification case.

Figure 3 shows the messages exchanged by the “simulated SLU”, the dialogue manager and the NLG. One can see that the dialogue was fully accomplished. To make those messages easier to follow, the target dialogue is shown again in Table 2, this time with the respective agent and speech acts associated with each utterance.

[Figure 3: sequence diagram with three lanes: Simulated SLU, Dialogue Manager and NLG. The messages, in order:]

nlu(75, lcflex, sem(greet, _, _, _))
nlg(greet, keep-turn, _, _, _, _)
nlg(introduce, keep-turn, _, _, _, _)
nlg(ask_name, release-turn, _, _, _, _)
nlu(75, lcflex, sem(reply, _, _, [name_(_, jim)]))
nlg(offer_help, release-turn, _, _, _, _)
nlu(75, lcflex, sem(ask_if, _, _, [pronoun_(X, he), association_(be_in, X, Y), instance_(Y, institute)]))
nlg(ask_pronoun, release-turn, who, _, [pronoun_(_, he)], _)
nlu(75, lcflex, sem(reply, _, _, [name_(_, peter)]))
nlg(tell, keep-turn, _, _, no, _)
nlg(tell, release-turn, _, _, [pronoun_(1, he), association_(be_in, 1, 2), instance_(2, department)], 0.42)
nlu(75, lcflex, sem(ask_if, _, _, [pronoun_(X, he), association_(like, X, Y), pronoun_(Y, you)]))
nlg(tell, release-turn, _, _, yes, 0.42)
nlu(50, lcflex, sem(tell, _, _, [name_(X, mary), association_(live, X, Y), name_(Y, spain)]))
nlg(clarify_tell, release-turn, _, _, [name_(X, mary), association_(live, X, Y), name_(Y, spain)], _)
nlu(75, lcflex, sem(reply, _, _, [instance_(_, yes)]))
nlg(comment, release-turn, _, _, [name_(4, daniel), association_(live, 4, 5), name_(5, spain)], 0.42)

Fig. 3. Dialogue Manager Test.

Table 2. Fictitious dialogue with robot Carl.

Speaker  Utterance                              Agent  Speech Act
User0:   Hi!                                    NLU    greet
Carl0:   Hi! I’m the robot Carl.                NLG    greet, introduce
Carl1:   What’s your name?                      NLG    ask_name
User1:   Jim.                                   NLU    tell
Carl2:   What can I do for you?                 NLG    offer_help
User2:   Is he in the institute?                NLU    ask
Carl3:   Who do you mean?                       NLG    ask_pronoun
User3:   Peter.                                 NLU    tell
Carl4:   No, he is in the department.           NLG    tell, tell
User4:   Does he like you?                      NLU    ask
Carl5:   Yes.                                   NLG    tell
User5:   Mary lives in Spain.                   NLU    tell
Carl6:   Did you say that Mary lives in Spain?  NLG    clarify_tell
User6:   Yes.                                   NLU    tell
Carl7:   Really? Daniel lives in Spain too!     NLG    comment

5 Conclusion and Future Work

The development of a dialogue manager for the intelligent mobile robot Carl was presented. It uses the Information State (IS) approach to dialogue systems and is based on a Knowledge Acquisition and Management (KAM) module that integrates information obtained from various interlocutors, even if it is contradictory. This mixed-initiative dialogue manager handles pronoun resolution, can ask different kinds of clarification questions, and can comment on new information based on the knowledge acquired so far.


Although the dialogue manager is not yet fully integrated in the robot system, a preliminary evaluation has shown that it is indeed capable of carrying out complex dialogues. It also addresses all the requirements identified for a dialogue manager to be used on the robot Carl.

Current work includes the adjustment of the NLU and NLG agents to support the speech acts introduced by the new dialogue manager, which will allow the integration of the dialogue manager itself into the robot system.

References

1. H. Asoh et al. Jijo-2: an office robot that communicates and learns. IEEE Intell. Systems, 16(5):46–55, 2001.

2. Michael E. Bratman, David Israel, and Martha Pollack. Plans and resource-bounded practical reasoning. In Robert Cummins and John L. Pollock, editors, Philosophy and AI: Essays at the Interface, pages 1–22. The MIT Press, Cambridge, Massachusetts, 1991.

3. Giuseppe Di Fabbrizio and Charles Lewis. Florence: a dialogue manager framework for spoken dialogue systems. In ICSLP 2004, 8th International Conference on Spoken Language Processing, Jeju, Jeju Island, Korea, October 4-8 2004.

4. D. Jurafsky and J. H. Martin. Speech and Language Processing. To be published, 2nd edition, 2007.

5. S. Lappin and H. J. Leass. An algorithm for pronominal anaphora resolution. Comp. Ling., 20:535–561, 1994.

6. D. Martin, A. Cheyer, and D. Moran. The Open Agent Architecture: a Framework for Building Distributed Software Systems. App. Artif. Intelligence, 13:91–128, 1999.

7. M. Nakano, N. Kanda, et al. A Two-Layer Model for Behavior and Dialogue Planning in Conversational Service Robots. In IROS, pages 1542–1548, 2005.

8. M. Rodrigues, A. Teixeira, and L. Seabra Lopes. An Hybrid Approach for Spoken Natural Language Understanding Applied to a Mobile Intelligent Robot. In NLUCS, pages 145–150, Porto, Portugal, 2004.

9. L. Seabra Lopes. Carl: from situated activity to language level interaction and learning. In IROS, volume 1, pages 890–896, Lausanne, Switzerland, 2002.

10. L. Seabra Lopes and J. H. Connell. Guest editors’ introduction: Semisentient robots – routes to integrated intelligence. IEEE Intelligent Systems, 16(5):10–14, 2001.

11. L. Seabra Lopes, A. Teixeira, and M. Quinderé. A Knowledge Representation and Reasoning Module for a Dialog System in a Mobile Robot. In NLUCS, pages 172–177, Miami, USA, 2005.

12. D. Traum and S. Larsson. The Information State Approach to Dialogue Management. In Jan van Kuppevelt and Ronnie Smith, editors, Current and New Directions in Discourse and Dialogue. Kluwer, 2003.
