Interactive Narration Requires Interaction and Emotion

A. Pauchet

, F. Rioult

, E. Chanoni

, Z. Ales

and O. S¸erban

INSA Rouen, LITIS, Rouen, France

Universit

e de Caen, Greyc, Caen, France

Universit

e de Rouen, Psy-NCA, Rouen, France

Keywords:

Dialogue Modelling, Knowledge Extraction, Narrative Embodied Conversational Agent.

Abstract:

This paper shows how interaction is essential for storytelling with a child. A corpus of narrative dialogues

between parents and their children was coded with a mentalist grid. The results of two modelling methods

were analysed by an expert in parent-child dialogue analysis. The extraction of dialogue patterns reveals

regularities explaining the character’s emotion. Results showed that the most efﬁcient models contain at least

one request for attention and/or emotion.

1 INTRODUCTION

Designing a dialogue model is a difﬁcult and of-

ten multidisciplinary task. It involves many algo-

rithms: multi-modal inputs, natural language under-

standing and generation, dialogue management, emo-

tion, ... In particular, multi-modal and affective di-

alogue management remains inefﬁcient in Embodied

Conversational Agents (ECA) (Cassell et al., 2000),

even though this aspect is essential for interaction

(Swartout et al., 2006). With the emergence of partic-

ipative digital storytelling systems, child - humanoid

agent interaction situations are increasing. The dia-

logue model embedded into an ECA, when dedicated

to interactive storytelling, should be designed accord-

ing to the child’s socio-cognitive and language skills.

We aim at designing a method for modelling the

dialogue between parents and children and tools to

ease the extraction of regularities from interactive di-

alogues. We therefore propose hints to guide an inter-

active narrative session as well as dialogue patterns

extracted from the corpora as a model of dialogue for

narrative ECAs.

2 METHOD AND MATERIAL

The method proposed to model the dialogue is de-

scribed in (Ales et al., 2012). It consists in: 1) col-

lecting and digitizing a corpus of dialogues; 2) the

transcription and coding step produces raw data with

various levels of details (speaking slots, utterances,

onomatopoeia, pauses, ...) depending on the phenom-

ena and characteristics which the model must exhibit;

3) knowledge and regularity extractions are applied,

through the utterances coded during the previous step.

This knowledge consists in the regularities of the dia-

logues and constitutes the model; 4) ﬁnally, the model

is exploited for interactive storytelling.

Corpus of Narrative Dialogues

Modelling storytelling dialogues requires knowledge

extraction from a corpus of non mediated parent-

child storytelling dialogues. In this study, 30 dia-

logues between children and parents (ages: 3, 4 and

5) were recorded during emotional story telling situa-

tions. These records have been transcribed and coded

with a mentalist grid (Chanoni, 2009) to capture in-

formation about the mental states (beliefs, emotions,

...) contained by the various utterances.

Matrix Representation of Dialogues

As outlined by Bunt, dialogue management involves

multilevel aspects (Bunt, 2011). To design a dia-

logue model that supports multidimensionality, a ma-

trix representation is chosen for annotations. Each

utterance is characterised by an annotation vector,

whose components match the different coding dimen-

sions. A dialogue is represented by a matrix: one row

by utterance, one column by coding dimension.

To illustrate this two-dimensional representation,

Table 1 presents an example of encoded parent-child

dialogue, extracted from the collected corpus. Each

utterance is characterised by a line number, a speaker

(P: parent, C: child), a transcription and its encoding

527

Pauchet A., Rioult F., Chanoni E., Ales Z. and ¸Serban O..

Interactive Narration Requires Interaction and Emotion.

DOI: 10.5220/0004259605270530

In Proceedings of the 5th International Conference on Agents and Artiﬁcial Intelligence (ICAART-2013), pages 527-530

ISBN: 978-989-8565-39-6

 2013 SCITEPRESS (Science and Technology Publications, Lda.)

Table 1: 2D annotations of a narrative dialogue between a parent and his/her child.

Line Speaker Utterance Annotations

31 P Who could have stolen the crown? Q - F - -

32 C The crown, it’s in! A - F - -

33 P Do you believe it? Q H K - -

34 C Yes A - F - -

35 P But Babar doesn’t know that it’s in A P N O J

36 P So he says that the crown is a bomb A P N C J

annotations.

The annotation grid corresponds to the follow-

ing coding scheme: Column 1: an (A)fﬁrmation, a

(Q)uery, a request for paying attention to the story

(D), or a demand for general attention (G). Column 2:

reference of the utterance. It can refer to the character

(P), the interlocutor (H) or the speaker (R). Column

3: an (E)motion, a (V)olition, an observable or a non-

observable cognition (B or N), an epistemic statement

(K), an assumption (Y) or a (S)urprise. The surprise is

distinguished from other emotions because of its link

with the incidental belief. Columns 4 and 5: expla-

nations with cause / (C)onsequence, (O)pposition or

empathy (M), which can be applied either to explain

the story (J), or to precise a situation with a personal

context (F).

3 KNOWLEDGE EXTRACTION

Dialogue Pattern Extraction

With our matrix representation, a dialogue pattern is

deﬁned as a set of annotations which occurs in several

dialogues. The method designed to extract signiﬁcant

dialogue patterns consists in a regularity extraction

step based on matrix alignment using dynamic pro-

gramming and a clustering step using machine learn-

ing heuristics to group and select the recurrent dia-

logue patterns. The clustering process is applied on

a similarity graph computed during the matrix align-

ment.

The method for extracting two-dimensional pat-

terns is a generalisation of the local string edition dis-

tance. The edit distance between two string S

and

corresponds to the minimal cost of the three ele-

mentary edit operations (insertion, deletion and sub-

stitution of characters) for converting S

to S

. Two-

dimensional pattern extraction corresponds to matrix

alignment. A local alignment of two matrices M

and

, of size m

× n

and m

× n

respectively, consists

in ﬁnding the portion of M

and M

which are the

most similar. To this end, a four-dimensional matrix

T of size (m

+ 1) × (n

+ 1) × (m

+ 1) × (n

+ 1)

is computed, such that T [i][ j][k][l] is equal to the lo-

cal edition distance between S

[0..i − 1][0.. j − 1] and

[0..k −1][0..l −1] for all i ∈ J1, m

−1K, j ∈ J1, n

−

1K, k ∈ J1, m

− 1K and l ∈ J1,n

− 1K. In our heuris-

tic, the calculation of T is obtained by the minimisa-

tion of a recurrence formula. Once T is computed, the

best local alignment is found from the position of the

maximal value in T , through a trace-back algorithm

to infer the characters which are part of the align-

ment. Figure 1, commented in Section 4, presents

an example of alignment extracted from the corpus.

Details about the two dimensional pattern extraction

algorithm can be found in (Lecroq et al., 2012).

The matrix alignment algorithm extracts the pat-

terns in pairs. To determine the importance of each

pattern, we group them using various standard cluster-

ing heuristics The idea is that large clusters of patterns

represent behaviours which are commonly adopted

by humans, whereas small clusters tend to contain

marginal patterns. A matrix of similarities between

patterns is computed through a global edition distance

applied on all pairs of selected patterns. This similar-

ity matrix is used as input for the clustering heuristics.

The method has been tested on the corpus of nar-

rative dialogues. During the extraction phase, 1740

dialogue patterns have been collected.

Predicting the Interaction of the Child

As our goal is to build a dialogue model dedicated to

narrative ECAs that stimulates child interaction, we

have to model the arising of the child’s interaction,

focusing on event prediction. In other words, we look

for sequences of dialogue events leading to child’s in-

teraction. We split the data over each turn of utter-

ance, in other words over each sequence of parents as-

sertion or question and child’s interaction. The prob-

lem consists therefore in predicting the end of each

turn.

The matrices that encode dialogues are considered

as sequences of features, each sequence ending with

the child’s interaction. For instance, the sequences

corresponding to Table 1, are: < (QF) >, < (QHK) >

, < (APNOJ)(APNOJ) >

The algorithm, without candidate enumeration,

mines the episodes with recursive projections, in a

greedy manner. The combinatorial explosion is lim-

ited with two anti-monotone constraints: the support

of the currently computed episode (the number of se-

ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence

528

quences it appears in) and the average distances (in

utterances) to the end of the sequences supporting it.

As the mining process is directed from the end of

the sequences to their beginning, not all the computed

episodes are relevant for the prediction of the end.

For instance, if each sequence begins and ends with

a (Q)uestion, the above algorithm will give < (Q) >

as an end predictor, while it is also a good predictor

for the beginning. To avoid these bad cases, the aver-

age length of each computed episode has to be taken

into account. If it is too small, the episode is not kept.

This process ensures that the mined regularities are

relevant for the prediction of the child’s interaction.

4 ANALYSIS OF THE MODELS

Patterns of Dialogue

Figure 1 presents an example of one pattern align-

ment appearing in two dialogues. This pattern shows

that parents ﬁrstly talked about the cause or the con-

sequence of the character’s behaviour (P, C, J), with-

out reference to any mental state. After assertions or

descriptive questions, parents insisted on the justiﬁ-

cation of the character’s behaviour (line 6), directly

with relation to the emotional state of the character

(line 7). Finally, the parent checked if the child under-

stood, with asking questions or by requesting his/her

attention (line 8).

Figure 1: Example of dialogue pattern alignment.

This pattern perfectly demonstrates that an emo-

tion cannot only be named to be explained. The pat-

tern described the link between the character’s be-

haviour and the mental state, the later explaining the

former.

Prediction of Interaction

Table 2 sums up the models for the child’s interac-

tions. For each age, the models are characterised by

the efﬁciency (average number of sentences between

the model and the child’s interaction. The more a se-

quence is efﬁcient, the fewer sentences before child’s

interaction) and the support (the percentage of times

the model appears).

We highlight the important following facts : 1)

Regardless of the age, sequences with justiﬁca-

tions were frequently associated with various indexes

(emotion, request for attention or question). The

child’s interaction came after 3.1 to 4.3 sentences af-

ter the model. 2) the quickness of the interaction de-

creased with the age, from 3.2 to 1.9 sentences. 3) the

number of sequences with emotion is merely equiv-

alent for all ages. Nevertheless, the older the child

is, the more different the sequences of emotion are.

Complex sequences (emotion and justiﬁcation: J-E or

E-J) appear only with the oldest children. 4) except

with request for attention, the most efﬁcient models

(in red in Table 2) contained emotion (E-Q or E-E).

5 DIALOGUE MODELLING

FOR NARRATIVE ECAS

ECAs are autonomous and anthropomorphic ani-

mated characters with multi-modal communication

skills (Cassell et al., 2000). Greta (Pelachaud, 2009)

and the European project SEMAINE (Schroder,

2010) provide good examples of current capabili-

ties of ECAs. With regard to dialogue systems and

models, which could be integrated into ECAs, sev-

eral approaches exist. The ﬁnite-state approach (ex:

(McTear, 2004)) which represents the structure of the

dialogue as a ﬁnite-state automaton where each ut-

terance leads to a new state. The frame-based ap-

proach represents the dialogue as a process of ﬁlling

in a form containing slots (ex: (Aust et al., 1995)).

The plan-based approach (Allen and Perrault, 1980)

combines plan recognition with the Speech Act the-

ory (Searle, 1969). The logic-based approach repre-

sents the dialogue and its context in some logical for-

malism (ex: (Hulstijn, 2000)). Finally, the machine

learning approach proposes techniques such as rein-

forcement learning (Frampton and Lemon, 2009) to

model the dialogue with Markov Decision Processes.

Most of the existing ECAs only integrates basic

dialogue management processes, such as a keyword

spotter within a ﬁnite-state or a frame-based approach

(ex: the SEMAINE project (Schroder, 2010)). They

uses regular structures that can only represent linear

interaction patterns, whereas dialogue management

involves multi-dimensional levels (Bunt, 2011). The

model we propose combines planning management

for the task resolution (prediction of the child interac-

tion) and a more reactive management when dealing

with dialogical conventions (dialogue patterns), all

along multidimensionality management through the

matrix coding of dialogues.

InteractiveNarrationRequiresInteractionandEmotion

529

Figure 2: Efﬁciency and quantity of all sequences for each age.

6 CONCLUSIONS

We have presented a methodology and tools designed

to improve dialogue models that could be integrated

in narrative ECAs. The proposed methodology con-

sists in extracting dialogue patterns, clustering to en-

code dialogical conventions and an event prediction

approach to plan the interactions of the listener. A

matrix representation of interactions is used to encode

the multidimensional aspects of dialogues. The algo-

rithms were applied to a corpus of child-parent inter-

actions during narration. Finally, we have shown why

interactive narration requires prominent interactions

and emotion.

REFERENCES

Ales, Z., Duplessis, G. D., ¸Serban, O., and Pauchet, A.

(2012). A methodology to design human-like embod-

ied conversational agents based on dialogue analysis.

In Proc. of HAIDM@AAMAS, pages 34–49, Valencia,

Spain.

Allen, J. and Perrault, C. (1980). Analyzing intention in

utterances. AI magazine, 15(3):143–178.

Aust, H., Oerder, M., Seide, F., and Steinbiss, V. (1995).

The philips automatic train timetable information sys-

tem. Speech Communication, 17(3-4):249–262.

Bunt, H. (2011). Multifunctionality in dialogue. Computer

Speech and Language, 25(2):222–245.

Cassell, J., Bickmore, T., Campbell, L., Vilhj

almsson, H.,

and Yan, H. (2000). Embodied conversational agents.

chapter Human conversation as a system framework:

designing embodied conversational agents, pages 29–

63. MIT Press.

Chanoni, E. (2009). Comment les m

eres racontent une his-

toire de fausses croyances

a leur enfant de 3

a 5 ans ?

Number 2, pages 181–189.

Frampton, M. and Lemon, O. (2009). Recent research

advances in reinforcement learning in spoken di-

alogue systems. Knowledge Engineering Review,

24(04):375–408.

Hulstijn, J. (2000). Dialogue games are recipes for joint

action. In Proc. of Gotalog’00.

Lecroq, T., Pauchet, A., and Solano, E. C. E. G. A. (2012).

Pattern discovery in annotated dialogues using dy-

namic programming. In IJIIDS, volume 6, pages 603–

618.

McTear, M. (2004). Spoken dialogue technology: toward

the conversational user interface. Springer-Verlag

New York Inc.

Pelachaud, C. (2009). Modelling multimodal expression of

emotion in a virtual agent. Philosophical Trans. of the

Royal Society B: Biological Sciences, 364(1535).

Schroder, M. (2010). The SEMAINE API: towards

a standards-based framework for building emotion-

oriented systems. Advances in HCI, 2010:2–2.

Searle, J. (1969). Speech Acts: An Essay in the Philosophy

of Language. Cambridge University.

Swartout, W. R., Gratch, J., Jr., R. W. H., Hovy, E. H.,

Marsella, S., Rickel, J., and Traum, D. R. (2006). To-

ward virtual humans. AI Magazine, 27(2):96–108.

ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence

530