A Survey of Socio-Emotional Strategies for Generation-Based Conversational Agents

Lorraine Vanel¹,², Alya Yacoubi² and Chloé Clavel¹
¹LTCI, Telecom-Paris, Institut Polytechnique de Paris, France
²Zaion, Paris, France
Keywords:
Task-Oriented Dialogue Systems, Social Dialogue, Emotional Dialogue, Persona, Empathetic Conversational
System, Conversational Dataset, Neural Generation Models, End-to-End Generation.
Abstract:
As dialogue systems are expected to display more and more “human” features, a subfield of conversational
artificial intelligence (AI) has emerged, aiming to make virtual agents socially competent. As the common
outlook firmly places emotion in the domain of chit-chatting, most of the studies tackle the problem of social
behaviour in open-domain applications. In this paper, we provide an overview of such approaches and consider
how they can be applied to a task-oriented setting. The angle we explore is the influence of data on socio-
emotional neural generation models. We focus on three aspects: dialogue strategies, emotional strategies and
persona.
1 INTRODUCTION
Despite the development of increasingly advanced di-
alogue generation systems, one main challenge re-
mains for conversational agents: consistency. It im-
plies different sub-problems, such as logical consis-
tency (relevance of the generated answer to the con-
versation history), persona consistency (relevance of
the bot’s behaviour to its past behaviour in the inter-
action), social consistency (relevance of the answer to
a set of accepted social etiquette and rules) and emo-
tional consistency (relevance to the user’s emotional
state).
While the logical consistency issue can be solved
by the power of recent Natural Language Processing
(NLP) engines (Shu et al., 2021), no unique defini-
tion or consensus has been reached in the literature on the
other aspects. However, social components have been
proven to improve user experience: (Hardy et al.,
2021) show that different dialogue strategies used by a
conversational system such as back-channelling, self-
disclosure and the use of statements strongly influ-
ence users’ behaviour, with a positive impact on user
engagement.
While several survey papers that study
emotionally-aware generation systems exist (Ma
et al., 2020; Wardhana et al., 2021; Pamungkas,
2019; Rashkin et al., 2018), they do not discuss
more general social behaviours achieved through
dialogue strategies or persona. This paper thus
constitutes the first survey that takes into account
those aspects of social dialogue generation. We look
into the recent methods used in the open domain
to see how they can be applied to task-oriented
bots. To that end, we address the following questions:
Q1. What are the studied socio-emotional strategies
and how are they relevant to task-oriented conversa-
tional systems? (Section 2)
Q2. How is socio-conversational data currently col-
lected and annotated and what could be improved?
(Section 3)
Afterwards, we present the remaining challenges
for the field of socio-emotional end-to-end generation
models (Section 4).
2 SOCIO-EMOTIONAL
STRATEGIES
In the literature of end-to-end generation models for
socio-emotional conversational systems, three dimen-
sions clearly emerge: emotional strategies (i.e., em-
pathy), dialogue strategies and persona.
Emotional Strategies. In the literature, one of the most
commonly studied emotional strategies is empathy.
Various forms of empathy exist, involving different
biological processes, but human psychology gener-
ally defines empathy as the capacity to understand
and feel what another person is experiencing. Be-
ing empathetic thus requires the ability to share the
other’s thoughts, to be aware of what they are going
through and to imagine what it would feel like to be
in that situation (Cuff et al., 2016). Multiple works in
the domain of Artificial Intelligence have been aim-
ing to reproduce this ability in machines (Fung et al.,
2018; Lin et al., 2020; Wang et al., 2021; Hosseini
and Caragea, 2021). In the case of NLP, it often trans-
lates into creating empathetic conversational systems.
However, as machines are not yet able to perfectly
understand deep, complex emotions or to infer the
causes behind them, it is difficult to speak of genuine
empathy. Hence the term “Artificial Empathy” has been
coined to refer to the coding of a “simplified” empathy
into AI and to the different mechanisms that aim
to model and mimic human empathy. Empathy is a
term that catches people’s attention, especially in the
domain of conversational systems designed to interact
with humans.
Dialogue Strategies. While empathy certainly does
have its merits, as emotionally-aware approaches
have proven to significantly improve user experience,
one issue arises from this “boom” of empathy: most
empathetic approaches seem to critically lack any
real interactive strategy. Even though it is over-represented
in the literature, empathy is far from being the
sole option for efficient social and emotionally-aware
interactions. Sometimes it is not even appropriate at
all. Strategies can go beyond the line drawn by em-
pathy/emotion, and step into the realm of “action”.
We call dialogue strategy a conversational plan of ac-
tion to engage and react in a dialogue (Galescu et al.,
2018; Santos Teixeira and Dragoni, 2022; Liu et al.,
2021). Table 1 summarises the different dialogue
strategies we have encountered in the literature.
Dialogue strategies as implemented in con-
versational systems represent the “realisation” of the
social and emotional behaviours at an utterance-level.
Some of them such as informing and questioning can
be linked to the concept of dialogue acts (see upper
section of Table 1).
Table 1: List of dialogue strategies.

Agreeing: Showing agreement with the interlocutor (Rashkin et al., 2018; Welivita et al., 2021)
Informing: Giving relevant information (Liu et al., 2021)
Questioning: Asking a direct question (Rashkin et al., 2018; Welivita et al., 2021; Liu et al., 2021)

Back-Channelling: Interjecting to signal attention (Hardy et al., 2021)
Consoling: Cheering someone up if they show negative sentiment (Rashkin et al., 2018; Welivita et al., 2021)
Encouraging: Showing enthusiasm and support (Rashkin et al., 2018; Welivita et al., 2021; Liu et al., 2021)
Providing Suggestions: Giving advice on how to react to a situation (Rashkin et al., 2018; Welivita et al., 2021; Liu et al., 2021)
Restating: Reformulating the interlocutor's point (Liu et al., 2021)
Self-Disclosing: Revealing information about oneself (Liu et al., 2021; Soni et al., 2021; Hardy et al., 2021)
Sympathising: Reflecting the interlocutor's feelings (Rashkin et al., 2018; Welivita et al., 2021; Liu et al., 2021)

Persona. While dialogue generation relies on variable
features, like the conversational behaviours described
previously, some “fixed” features can also be modelled,
such as persona, or personality traits. Many works
(Wolf et al., 2019; Madotto et al., 2019; Zheng et al.,
2020; Wu et al., 2019) have started to explore the idea
of granting bots a personality with likes and dislikes.
For multi-modal systems, persona
can be conveyed through visual and auditory means,
but for chatbots it heavily relies on textual content.
Response generation will be regulated by the persona
features, so that the system will have its own way to
react to certain situations in a cohesive manner, using
conversational strategies adapted to the interaction’s
context.
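To make persona conditioning concrete, below is a minimal sketch in the spirit of (Wolf et al., 2019): the persona sentences are simply prepended to the dialogue history so that generation is conditioned on the agent's “fixed” traits. The separator tokens and function name are illustrative assumptions, not those of any cited system.

```python
# Minimal sketch of persona conditioning (in the spirit of Wolf et al., 2019).
# Separator tokens are illustrative assumptions, not a cited system's vocabulary.

def build_model_input(persona: list[str], history: list[str]) -> str:
    """Flatten persona sentences and dialogue turns into one conditioning string."""
    persona_block = " ".join(persona)
    history_block = " <turn> ".join(history)
    return f"<persona> {persona_block} <history> {history_block} <bot>"

prompt = build_model_input(
    persona=["I am a towing-service assistant.", "I stay calm and reassuring."],
    history=["My car broke down on the highway!"],
)
# `prompt` would then be tokenised and fed to a generative language model.
```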
2.1 How Are Socio-Emotional
Strategies Relevant to
Task-Oriented Dialogue Systems?
Papers are usually not very specific when it comes
to the strategies their models adopt. They use adjec-
tives such as “empathetic” or “emotionally-aware” to
refer to the ability to detect users’ emotions, but are
not necessarily very descriptive of the system’s socio-
emotional approach. Empathy is often used as an um-
brella term to designate superficial behaviours where
the model mirrors the user’s emotion: the system will
detect an emotion in the user’s utterance using emo-
tion recognition components, then use it to produce
a response that matches the tone of the human inter-
locutor. If a mapping of the emotions is used (i.e., a
set of rules that decide what emotion the bot should
express in response to a detected user situation), it is
rarely detailed.
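As an illustration, the sketch below spells out this “detect then mirror” loop under stated assumptions: `classify_emotion` and `generate` stand in for an emotion recognition component and a conditioned generation model, and the hand-written mapping is hypothetical, precisely the part that papers rarely detail.

```python
# Hypothetical sketch of the "detect then mirror" pipeline described above.
# The mapping and both callables are assumptions, not a published system.

RESPONSE_EMOTION_MAP = {
    "sadness": "sympathetic",
    "anger": "apologetic",
    "joy": "enthusiastic",
}

def respond(user_utterance: str, classify_emotion, generate) -> str:
    detected = classify_emotion(user_utterance)            # e.g. "sadness"
    bot_emotion = RESPONSE_EMOTION_MAP.get(detected, "neutral")
    # Generation is conditioned on the target emotion, e.g. via a control token.
    return generate(user_utterance, condition=bot_emotion)

reply = respond(
    "I just lost my job...",
    classify_emotion=lambda text: "sadness",               # stand-in classifier
    generate=lambda text, condition: f"[{condition}] I am sorry to hear that.",
)
```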
Simulating an emotion involves more than only
the textual content of the response: it is also dis-
played by the agent’s behaviour. Psychology stud-
ies have formed theories on how action and emotion
are linked in the brain’s behavioural processes (Frijda
et al., 1986). For humans, events that can trigger such
reactions are numerous. They can be external, such as
a social interaction or an interaction with objects, or
internal, provoked by oneself. They
usually start a process that involves physiological and
behavioural responses leading to one’s awareness of
their emotional state.
However, for a conversational system, the only
stimulus is the external event triggered by the user's
situation. In addition, the agent does not actually
have feelings that can be influenced. When research
tries to grant the bot the ability to “feel”, it usually
relies on making it “say” the emotion instead of
“showing” it by merging the textual content with a
fitting behavioural response. This is why we need to
give dialogue systems the ability to adopt more gen-
eral and lasting dialogue strategies, and to teach them
how and when to shift their behaviour to better re-
spond to users (Dias and Paiva, 2005; Mascarenhas
et al., 2021). Having the bot respond to an emotional
input with an emotional output can be sub-optimal,
and even intrusive. We need to give them the ability
to take certain types of conversational actions as de-
scribed in Table 1. Those actions must be consistent
with both the emotion the agent has to display and the
dialogue strategy it has been using during the previ-
ous turns of the dialogue.
In task-oriented applications, the challenge is to
be able to provide a response that is both efficient
(i.e. meets the objective set by the task) and socially
and emotionally accurate with regard to the user’s sit-
uation. The goal of chit-chatting bots is the user’s
enjoyment and they naturally become the subject of
studies that aim to add emotion to generated dia-
logues. However, many do not consider how more ad-
vanced social behaviours could benefit task-oriented
chatbots. Chit-chat remains relevant to a task-oriented
system (Sun et al., 2020) but its use implies many
challenges, such as the timing, safety or relevance to
the task.
Though some strategies listed in Table 1 are
purely emotional like sympathising or encouraging,
they can be applied to a task-oriented conversation
for more efficient interaction. Some tasks require
the bot to adopt socio-emotionally-driven approaches.
In such cases, the social dimension (including emotions)
and the task should not be considered independently.
Consider the concrete example of a virtual
agent tasked with helping people who call a towing service.
This conversational agent needs to converse with cus-
tomers in distress or shock after a car accident. Hav-
ing a bot able to recognise such situations, act accord-
ingly and regulate the intensity of its emotional re-
sponse depending on the context (Ochs et al., 2008)
would be a great improvement over an inconsistent, sys-
tematic use of empathy, even if the action taken is to transfer the
conversation to a human agent better equipped to deal
with the situation.
3 DATA
After establishing the importance of those strate-
gies, our next step is to understand how they are il-
lustrated in existing conversational corpora. Table
2 presents the different “socio-emotional” conversa-
tional datasets, along with a summary of some of their
features. Our goal is to study how to effectively in-
tegrate the different socio-emotional dimensions and
labels in conversational data.
3.1 Data Collection
Firstly, we go over the three main collection ap-
proaches we have observed in the literature of con-
versational systems.
3.1.1 Crowd-Sourcing
When applied to data collection, crowd-sourcing is a
participatory method where a group of people con-
tributes to creating data samples. The crowd-sourced
data is typically Human-Human interaction (H-H).
The data is collected by having two crowd-workers
interact by following set guidelines: the speaker (or
seeker) sets up a situation and the listener (or helper)
has to answer accordingly. Dialogue systems are
trained to perform in the role of the listener. In (Rashkin
et al., 2018; Liu et al., 2021), workers taking the role
of speakers are asked to start the conversation follow-
ing an emotional prompt. The listeners have to adapt
their replies to the context presented by their inter-
locutor without being aware of the prompt.
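The sketch below shows what one such crowd-sourced sample might look like, loosely following the EmpatheticDialogues setting (Rashkin et al., 2018); the field names are our assumptions, not the dataset's actual schema.

```python
# Sketch of a crowd-sourced sample, loosely following the EmpatheticDialogues
# setting (Rashkin et al., 2018). Field names are assumptions, not the real schema.
from dataclasses import dataclass, field

@dataclass
class CrowdSourcedDialogue:
    emotion_prompt: str                 # label shown to the speaker only
    situation: str                      # context written by the speaker
    turns: list = field(default_factory=list)  # alternating speaker/listener utterances

sample = CrowdSourcedDialogue(
    emotion_prompt="anxious",
    situation="I have a job interview tomorrow and I can't stop worrying.",
    turns=[
        "I have a big interview tomorrow and I'm really nervous.",   # speaker
        "That sounds stressful. How are you preparing for it?",      # listener
    ],
)
```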
3.1.2 Crawled from Online Sources
Another common way to collect data is by extract-
ing information from online sources. In the case of
textual data, it is often posts and comments crawled
from social media and is thus natural Human-Human
speech (Zhong et al., 2020; Mazaré et al., 2018). It
can also come from other sources, such as OpenSub-
titles (Welivita et al., 2021) where the data is scripted.
The data extracted from those websites is usually not
labelled and annotation processes must be designed to
label the corpora.
3.1.3 Retrieved from Conversation Logs
Human-Agent. Most of the time, this approach ap-
plies to human-machine interactions, through conver-
sational models deployed for data-collection purposes
or as services. When such models are already de-
ployed, it is possible to retrieve the logs to compile
them into corpora. One such dataset is EmoContext
(Chatterjee et al., 2019), which can only be accessed
by directly requesting it from its authors.
Human-Human. This approach can also be used
to retrieve human-human conversation logs from call
center data, like in (Clavel et al., 2013).
Such data is less accessible, as this practice re-
quires having the means to deploy such services
or requesting data from a company with such re-
sources. Even then, the data is usually confidential
and thus cannot be shared as public datasets, unless
user consent was given and/or data has been properly
anonymised.
3.2 Data Labelling
There are different approaches to data annotation:
they differ depending on the point of view of the anno-
tator, the tools and resources required to carry out the
labelling tasks, or the information available directly
from the source of the data.
Manual Annotations. This approach can be used
on any kind of data as long as the dataset is small or
the research team has sufficient material means. Data
can be entirely annotated by human experts or annota-
tors that have been trained on that specific annotation
task. (Li et al., 2017) introduce a 13K dataset, Dai-
lyDialog, annotated by 3 experts with a good
understanding of dialogue and communication theory
who were taught the guidelines of the particular
task (i.e. emotion and dialogue act annotation).
Semi-Automatic Annotations. Manual annotation
can be paired up with algorithms to accelerate the
work and lighten the human judges’ workload. This
hybrid approach is called semi-automatic annotation
(Lu et al., 2021; Welivita et al., 2021). In general,
the first step is to have human judges annotate a small
fraction of the collected dialogues. To provide further
support to the human helpers, (Welivita et al., 2021)
train a BERT-based model on the EmpatheticDialogues
dataset (Rashkin et al., 2018) to output the top 3 most
likely labels for each dialogue. Instead of considering
all 42 available labels, the human judges can pick be-
tween 3, with the possibility of selecting one of the
others if needed. The second part of the process is to
use this manually annotated set as training data for a
classifier that will automatically annotate the rest of
the collected data.
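A minimal sketch of this two-step loop is given below, using scikit-learn-style components as placeholders; the feature extractor and classifier are our assumptions, not those of the cited works.

```python
# Minimal sketch of semi-automatic annotation: a small human-labelled seed set
# trains a classifier that then labels the rest. Components are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def semi_automatic_annotate(seed_texts, seed_labels, unlabelled_texts):
    """Step 1: fit on the human-annotated fraction; step 2: label the rest."""
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(seed_texts, seed_labels)
    return clf.predict(unlabelled_texts)
```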
This can also be used for data retrieved from con-
versation logs. For their EmoContext dataset, (Chat-
terjee et al., 2019) have 50 human annotators man-
ually annotate 300 dialogues for each of the 4 emo-
tional classes. Each dialogue is looked over by 7
judges. These annotated dialogues were embedded as
vectors, and used with cosine similarity thresholds to
find similar occurrences in the non-annotated pool of
data. Results would then be checked by human judges
who make the final ruling.
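The similarity-based propagation can be sketched as follows, assuming dialogues have already been embedded as vectors; the threshold value and the decision to defer low-similarity cases to human review are assumptions on our part.

```python
# Hedged sketch of similarity-based label propagation (as for EmoContext):
# candidates close enough to an annotated dialogue inherit its label,
# the rest are deferred to human review. The threshold is an assumed value.
import numpy as np

def propagate_labels(seed_vecs, seed_labels, cand_vecs, threshold=0.85):
    seed = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    cand = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sims = cand @ seed.T                       # pairwise cosine similarities
    best = sims.argmax(axis=1)
    return [seed_labels[j] if sims[i, j] >= threshold else None  # None -> human check
            for i, j in enumerate(best)]
```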
Derived Annotations. This is the main method of an-
notation for crowd-sourced data, where the emotion
and dialogue strategy labels associated with the data
can be directly derived from the instructions given
to the annotators in (Rashkin et al., 2018; Liu et al.,
2021). Additionally, (Liu et al., 2021) collect answers
to surveys submitted to the workers during the collec-
tion process on both the listener and the speaker side.
This allows for the collection of more data such as
empathy grading and utterance-level dialogue strate-
gies.
(Zhong et al., 2020) use the context in which web
data was posted and extract posts and comments from
two subreddits: happy and offmychest. The original
Reddit environment thus provides a label, and what
remains is to conduct a quality check by asking hu-
man annotators to annotate a small set of the con-
versations: 100 from the happy subreddit, 100 from the
offmychest subreddit and, for control, 100 from the
casualconversations subreddit.
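Derived annotation here reduces to a lookup on the posting context, as in the sketch below; the function shape is ours, only the subreddit-to-polarity mapping comes from the paper.

```python
# Sketch of derived annotation for PEC (Zhong et al., 2020): the label comes
# for free from the subreddit a conversation was posted in.
SUBREDDIT_POLARITY = {
    "happy": "positive",
    "offmychest": "negative",
    "casualconversations": "neutral",   # control sample
}

def derive_label(subreddit: str) -> str:
    return SUBREDDIT_POLARITY.get(subreddit.lower(), "unlabelled")
```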
For conversation logs, derived annotation can come
from user feedback, since some deployed bots
ask for customer satisfaction feedback either directly
or through surveys. This information can be used
to annotate certain conversations (Maslowski et al.,
2017).
In the case of derived annotation, the label comes
directly from the source of the data. The point of view
of the annotator is internal, which differs greatly from
the external viewpoint of the annotators in the other
methods described above, where they have to analyse
data that comes from an outside source. This differ-
ence is even more prominent when it comes to labels
as subjective as emotion.

Table 2: List of conversational emotional corpora in the literature. Volume refers to the number of conversations. The Dimension Labels column presents the dimensions covered by each dataset: ES for Emotional Strategy, DS for Dialogue Strategies and P for Persona; U marks utterance-level annotation and C conversation-level annotation; the number in parentheses is the number of labels in the category. For Collection, CS refers to Crowd-Sourced. In the Scripted/Natural column, S stands for “Scripted” and N for “Natural”.

Name | Interaction | Volume | Dimension Labels | Collection | Scripted/Natural | Annotation
Emotional Support Conversation Dataset (ESConv) (Liu et al., 2021) | H-H | 1300 | ES-C (7), DS-C (8) | CS | S | Emotion prompt
Empathetic Dialogues (Rashkin et al., 2018) | H-H | 25K | ES-C (32), DS-U (8) | CS | S | Emotion prompt
EmoWOZ (Feng et al., 2021) | H-H | 11K | ES-U (7) | CS | S | Manual annotation (Amazon Mechanical Turk)
Persona Chat (PC) (Zhang et al., 2018) | H-H | 10K | P (1155) | CS | S | No annotation
EmotionLines (Chen et al., 2018) | H-H | 2K | ES-U (7) | TV script | S | Manual annotation with 5 human annotators
DailyDialog (Li et al., 2017) | H-H | 13K | ES-U (6), DS-U (4) | Internet | S | Manual annotation by 3 human annotators
Emotional Dialogue in OpenSubtitles (EDOS) (Welivita et al., 2021) | H-H | 1M | ES-U (32), DS-U (8) | OpenSubtitles | S | Semi-automatic annotation with 3 human annotators
Persona-based Empathetic Conversation (PEC) (Zhong et al., 2020) | H-H | 11K | ES-C (2), P (100) | Reddit | N | Automatic annotation with human control check
PCR (Mazaré et al., 2018) | H-H | 5M | P (7.3M) | Reddit | N | Semi-automatic annotation with human annotators
Sentimental Douban Conversational Corpus (Lu et al., 2021) | H-H | 1M | ES-U (sentiment) | Douban | N | Semi-automatic annotation with 3 human annotators
EmoContext (Chatterjee et al., 2019) | H-M | 38K | ES-C (4) | Bot logs | N | Semi-automatic annotation with 50 human annotators

The internal annotator can
label accurately what emotion they are feeling and ex-
pressing in the data, but an external annotator has to
make a guess as to what emotion is being expressed.
This can be an easy guess, or be quite ambiguous and
perceived differently depending on one’s personal ex-
periences and sensibilities.
3.3 Annotation Dimensions
We now present three “dimensions” (Column 4 in Ta-
ble 2) that are annotated in data: emotional labels,
dialogue strategies and persona features. Another as-
pect we comment on is the annotation level: labels
can either be given on a conversation-level (one label
for the entire dialogue) or utterance-level (one label
per speaker turn).
Emotional States. As mentioned in Section 2,
empathy and emotions are the most represented
socio-emotional strategies. As a direct
consequence, most social cor-
pora are annotated with emotional states.
(Lu et al., 2021) use sentiment instead of emotion.
Sentiment analysis is an NLP task that consists of un-
derstanding the polarity of an utterance and classify-
ing it as positive or negative, sometimes with a third
neutral class, but
(Lu et al., 2021) only use binary labels. (Zhong et al.,
2020) do something similar to sentiment labelling, us-
ing the reddit context the data was extracted from to
annotate whole conversations: happy for a positive
polarity, offmychest for a negative polarity and casu-
alConversation for a neutral control sample.
Other studies have referred to various psychology
theories for emotion classification, but there is no con-
sensus on the way to define and classify emotions
in conversation analysis (Clavel and Callejas, 2016).
(Chatterjee et al., 2019) use a simplified classifica-
tion of emotions: angry, happy, sad and others. (Li
et al., 2017) base their annotations on Ekman’s model
(Ekman, 1999). On the other hand, (Liu et al., 2021;
Rashkin et al., 2018; Welivita et al., 2021) use clas-
sical theories derived from biological responses (Ek-
man, 1999; Plutchik, 1984) as well as studies that
focused on larger sets of subtle, context-dependent
emotions (Skerry et al., 2015) reaching as many as
32 emotion labels.
(Feng et al., 2021) use a classification tailored to
task-oriented settings: neutral, fearful, dissatisfied,
apologetic, abusive, excited and satisfied.
Dialogue Strategies. Another dimension of anno-
tation is dialogue strategies, as some papers have carried out
a dual annotation, taking into account both emotions
and dialogue strategies. Table 1 presented in the first
section summarises the ones we have seen being used
in literature. As seen in Section 2, dialogue strate-
gies can be associated with dialogue acts and a large
number of existing corpora are annotated in dialogue
acts. For example, (Li et al., 2017) use inform, ques-
tion, directive, commissive as labels to develop socio-
emotional conversational systems. Such strategies are
included in the works of (Bunt, 2006) on the classifi-
cation and the dimensions of dialogue acts.
Persona. The last aspect we want to introduce in
this section is persona features. Corpora that in-
troduce persona characteristics (Zhong et al., 2020;
Mazaré et al., 2018) usually define a “persona” as a
set of sentences representing a given personality that
influences the agent’s wording and general behaviour.
Persona sentences are often collected from online user
profiles found in conversational data, with one user
representing one persona. The persona may be independent
from the conversation itself, which can be annotated
with other labels such as emotion.
3.4 Guidelines for the Collection and
Annotation of Socio-Emotional Data
For a model to provide good performance, the train-
ing data, as well as the label types, must be similar to
the real-time data the system will face when deployed
to users. Public corpora often compile conversations
from the open domain that exhibit different strategies,
formulations and goals. For good task-oriented re-
sults, it is a challenge to gather data that is the closest
possible to the task’s conversational context.
For the means of collection, crowdsourcing allows
for precise control over the content of the dialogues,
the gathering of numerous and diverse samples that
answer specific problems as defined by the research
team, and a label given directly by the source of the
data (the speaker). Nevertheless, the scripted na-
ture of the interactions induces a significant bias in
the corpus, as it differs from how one might spon-
taneously express themselves in a similar situation.
Open-domain data crawled from online sources poses an-
other issue, as it is not structured in the same way
task-oriented dialogues are. Ultimately, the optimal
solution for service providers that already have agents
deployed in production is to retrieve and use con-
versation logs. Data retrieved from deployed con-
versational agents falls squarely into the Human-
Machine interaction scope. While the acceptability
of such bots can influence the conversation and the
way the user conveys their intent, the interaction re-
mains spontaneous and authentic. However, biases
that are intrinsic to the machine’s behaviour and de-
velopment will be preserved in the corpus. Even so,
this approach offers the best compromise between
task-oriented utterances that match the intended use
of the bot and spontaneous conversations with genuine
reactions, even if those reactions might be influenced
by users' acceptance of technology and AI in customer
service.
Once the data has been collected, we believe that
a very important step must precede the annotation
phase. A labelling process must be designed that es-
tablishes what strategies and emotions need to be an-
notated. This can be done by studying patterns in the
data and extracting the features that will be taught to
the agent. Once this process is clearly defined, a semi-
automatic annotation allows for a good optimisation of
resources, while guaranteeing good consistency with
quality control by human annotators. It is important
to note that annotating emotional states is a complex
task, even for humans, because of how subjective it is.
Two people might not agree on an emotion expressed
in the same sample, because of the personal and internal
biases that colour our perception of others’ emotional
states.
If we consider our towing service example again:
how can we build a socio-emotional dataset relevant
to the response generation task? The towing company
provided us with conversation files between their hu-
man agents and their customers. For the collection
of relevant annotations in dialogue strategies, we look
at the different sets of socio-emotional labels used in
the literature (see Section 3.3). By observing conversa-
tions from the company’s logs, we are able to narrow
down a list of relevant strategies and emotion labels.
We noticed that some might not be as common in the
context of our task, like ‘disgust’ for emotions. Fur-
thermore, some new labels emerged and were added
to our list, such as ‘Repair/Self-Repair’ for the act of
correcting oneself or one’s interlocutor.
4 REMAINING CHALLENGES
The growing availability of efficient and pre-trained
language models (Devlin et al., 2019; Radford et al.,
2019) has spurred the field of end-to-end NLP mod-
els. These data-driven approaches applied to socio-
emotional dialogue have raised many challenges.
The first one is related to the architecture: the ab-
sence of causality modelling creates a lack of con-
trol and explainability of the process (Yacoubi and
Sabouret, 2018). Nonetheless, some approaches have
attempted to address this issue, offering better visi-
bility on the problem decomposition while making
the most of the computational benefits of end-to-end
training (Gao et al., 2021; Soni et al., 2021).
Secondly, these approaches also need a huge
amount of annotated data, which implies specific data
collection and annotation processes.
Finally, papers tend to contrast automatic metrics
designed to evaluate text generation systems (Pap-
ineni et al., 2002; Lin, 2004) with human evaluation
(Finch and Choi, 2020). Empathetic and social be-
haviour is better appraised by humans, who are both
the target users of these systems and possess the social
intelligence that AI systems try to replicate.
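The toy example below illustrates why n-gram metrics fall short here: BLEU (Papineni et al., 2002) scores surface overlap with a reference, so an empathetically appropriate paraphrase can score near zero. We use NLTK's implementation; the smoothing choice is an assumption.

```python
# Why n-gram overlap misses socio-emotional quality: a fitting paraphrase of
# an empathetic reference shares almost no n-grams with it, so BLEU is near 0.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["i", "am", "so", "sorry", "to", "hear", "that"]]
paraphrase = ["that", "sounds", "really", "hard", "stay", "strong"]

smooth = SmoothingFunction().method1
print(sentence_bleu(reference, paraphrase, smoothing_function=smooth))  # ~0
```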
5 CONCLUSION
This survey paper provides an overview of the re-
cent works that tackle the challenging goal of adding
a socio-emotional dimension to end-to-end dialogue
generation. As most of those studies focus on open-
domain applications, we consider how to apply
these methods to a task-oriented context. In order to
create conversational systems with efficient social be-
haviour, it is important to combine dialogue strategies
and emotions, as well as possibly adding persona
features for better consistency in the responses.
Our future work involves designing a general, ef-
ficient and comprehensive labelling process based on
the observations made in this paper. This also in-
cludes verifying the annotation process and the qual-
ity of the resulting labelling.
REFERENCES
Bunt, H. (2006). Dimensions in dialogue act annotation.
In Proceedings of the Fifth International Conference
on Language Resources and Evaluation (LREC’06),
Genoa, Italy. European Language Resources Associa-
tion (ELRA).
Chatterjee, A., Narahari, K. N., Joshi, M., and Agrawal, P.
(2019). SemEval-2019 task 3: EmoContext contex-
tual emotion detection in text. In Proceedings of the
13th International Workshop on Semantic Evaluation,
pages 39–48, Minneapolis, Minnesota, USA. Associ-
ation for Computational Linguistics.
Chen, S.-Y., Hsu, C.-C., Kuo, C.-C., Huang, T.-H., and
Ku, L.-W. (2018). Emotionlines: An emotion corpus
of multi-party conversations.
Clavel, C., Adda, G., Cailliau, F., Garnier-Rizet, M., Cavet,
A., Chapuis, G., Courcinous, S., Danesi, C., Daquo,
A.-L., Deldossi, M., Guillemin-Lanne, S., Seizou, M.,
and Suignard, P. (2013). Spontaneous speech and
opinion detection: mining call-centre transcripts. Lan-
guage Resources and Evaluation, 47(4):1089–1125.
Clavel, C. and Callejas, Z. (2016). Sentiment analy-
sis: From opinion mining to human-agent interaction.
IEEE Transactions on Affective Computing, 7(1):74–
93.
Cuff, B., Brown, S., Taylor, L., and Howat, D. (2016). Em-
pathy: A review of the concept. Emotion Review,
8:144–153.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). BERT: Pre-training of deep bidirectional
transformers for language understanding. In Pro-
ceedings of the 2019 Conference of the North Amer-
ican Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume
1 (Long and Short Papers), pages 4171–4186, Min-
neapolis, Minnesota. Association for Computational
Linguistics.
Dias, J. and Paiva, A. (2005). Feeling and reasoning: A
computational model for emotional characters. In
Portuguese conference on artificial intelligence, pages
127–140. Springer.
Ekman, P. (1999). Basic emotions. Handbook of cognition
and emotion, 98(45-60):16.
Feng, S., Lubis, N., Geishauser, C., Lin, H.-c., Heck, M.,
van Niekerk, C., and Gašić, M. (2021). Emowoz: A
large-scale corpus and labelling scheme for emotion
recognition in task-oriented dialogue systems.
Finch, S. E. and Choi, J. D. (2020). Towards unified di-
alogue system evaluation: A comprehensive analy-
sis of current evaluation protocols. arXiv preprint
arXiv:2006.06110.
Frijda, N., Manstead, A., and Oatley, K.
(1986). The Emotions. Studies in Emotion and So-
cial Interaction. Cambridge University Press.
Fung, P., Bertero, D., Xu, P., Park, J. H., Wu, C.-S., and
Madotto, A. (2018). Empathetic dialog systems. In
The international conference on language resources
and evaluation. European Language Resources Asso-
ciation.
Galescu, L., Teng, C. M., Allen, J., and Perera, I. (2018).
Cogent: A generic dialogue system shell based on a
collaborative problem solving model. In Proceedings
of the 19th Annual SIGdial Meeting on Discourse and
Dialogue, pages 400–409.
Gao, J., Liu, Y., Deng, H., Wang, W., Cao, Y., Du, J., and
Xu, R. (2021). Improving empathetic response gener-
ation by recognizing emotion cause in conversations.
In Findings of the Association for Computational Lin-
guistics: EMNLP 2021, pages 807–819.
Hardy, A., Paranjape, A., and Manning, C. D. (2021). Ef-
fective social chatbot strategies for increasing user ini-
tiative. In Proceedings of the 22nd Annual Meeting
of the Special Interest Group on Discourse and Dia-
logue, pages 99–110.
Hosseini, M. and Caragea, C. (2021). It takes two to em-
pathize: One to seek and one to provide. In Proceed-
ings of the AAAI Conference on Artificial Intelligence.
To appear.
Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S. (2017).
Dailydialog: A manually labelled multi-turn dialogue
dataset.
Lin, C.-Y. (2004). ROUGE: A package for automatic evalu-
ation of summaries. In Text Summarization Branches
Out, pages 74–81, Barcelona, Spain. Association for
Computational Linguistics.
Lin, Z., Xu, P., Winata, G. I., Siddique, F. B., Liu, Z., Shin,
J., and Fung, P. (2020). Caire: An end-to-end empa-
thetic chatbot. In Proceedings of the AAAI Conference
on Artificial Intelligence, volume 34, pages 13622–
13623.
Liu, S., Zheng, C., Demasi, O., Sabour, S., Li, Y., Yu, Z.,
Jiang, Y., and Huang, M. (2021). Towards emotional
support dialog systems. ArXiv, abs/2106.01144.
Lu, X., Tian, Y., Zhao, Y., and Qin, B. (2021). Re-
trieve, discriminate and rewrite: A simple and ef-
fective framework for obtaining affective response in
retrieval-based chatbots. In Findings of the Associ-
ation for Computational Linguistics: EMNLP 2021,
pages 1956–1969.
Ma, Y., Nguyen, K. L., Xing, F. Z., and Cambria, E. (2020).
A survey on empathetic dialogue systems. Informa-
tion Fusion, 64:50–70.
Madotto, A., Lin, Z., Wu, C.-S., and Fung, P. (2019). Per-
sonalizing dialogue agents via meta-learning. In Pro-
ceedings of the 57th Annual Meeting of the Associa-
tion for Computational Linguistics, pages 5454–5459.
Mascarenhas, S., Guimarães, M., Santos, P. A., Dias, J.,
Prada, R., and Paiva, A. (2021). Fatima toolkit–toward
an effective and accessible tool for the development
of intelligent virtual agents and social robots. arXiv
preprint arXiv:2103.03020.
Maslowski, I., Lagarde, D., and Clavel, C. (2017). In-the-
wild chatbot corpus: from opinion analysis to interac-
tion problem detection. In ICNLSSP 2017, pages 115–
120, Casablanca, Morocco. ISGA, Institut Supérieur
d'Ingénierie et des Affaires.
Mazaré, P.-E., Humeau, S., Raison, M., and Bordes, A.
(2018). Training millions of personalized dialogue
agents. In Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Processing,
pages 2775–2779, Brussels, Belgium. Association for
Computational Linguistics.
Ochs, M., Sabouret, N., and Corruble, V. (2008). Model-
ing the dynamics of non-player characters’ social re-
lations in video game. In Proceedings of the AAAI
Conference on Artificial Intelligence and Interactive
Digital Entertainment, volume 4, pages 90–95.
Pamungkas, E. W. (2019). Emotionally-aware chatbots: A
survey. arXiv preprint arXiv:1906.09774.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
Bleu: a method for automatic evaluation of machine
translation. In Proceedings of the 40th Annual Meet-
ing of the Association for Computational Linguistics,
pages 311–318, Philadelphia, Pennsylvania, USA.
Association for Computational Linguistics.
Plutchik, R. (1984). Emotions: a general psychoevolution-
ary theory.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and
Sutskever, I. (2019). Language models are unsuper-
vised multitask learners.
Rashkin, H., Smith, E. M., Li, M., and Boureau, Y.-L.
(2018). Towards empathetic open-domain conversa-
tion models: A new benchmark and dataset. arXiv
preprint arXiv:1811.00207.
Santos Teixeira, M. and Dragoni, M. (2022). A review
of plan-based approaches for dialogue management.
Cognitive Computation, pages 1–20.
Shu, C., Zhang, Y., Dong, X., Shi, P., Yu, T., and Zhang,
R. (2021). Logic-consistency text generation from
semantic parses. In Findings of the Association
for Computational Linguistics: ACL-IJCNLP 2021,
pages 4414–4426, Online. Association for Computa-
tional Linguistics.
Skerry, A. E. and Saxe, R.
(2015). Neural representations of emotion are orga-
nized around abstract event features. Curr. Biol.
Soni, M., Cowan, B., and Wade, V. (2021). Enhancing
self-disclosure in neural dialog models by candidate
re-ranking. arXiv preprint arXiv:2109.05090.
Sun, K., Moon, S., Crook, P., Roller, S., Silvert, B., Liu,
B., Wang, Z., Liu, H., Cho, E., and Cardie, C. (2020).
Adding chit-chat to enhance task-oriented dialogues.
Wang, Y.-H., Hsu, J.-H., Wu, C.-H., and Yang, T.-H.
(2021). Transformer-based empathetic response gen-
eration using dialogue situation and advanced-level
definition of empathy. In 2021 12th International
Symposium on Chinese Spoken Language Processing
(ISCSLP), pages 1–5.
Wardhana, A. K., Ferdiana, R., and Hidayah, I. (2021).
Empathetic chatbot enhancement and development: A
literature review. In 2021 International Conference
on Artificial Intelligence and Mechatronics Systems
(AIMS), pages 1–6.
Welivita, A., Xie, Y., and Pu, P. (2021). A large-scale
dataset for empathetic response generation. In Pro-
ceedings of the 2021 Conference on Empirical Meth-
ods in Natural Language Processing, pages 1251–
1264.
Wolf, T., Sanh, V., Chaumond, J., and Delangue, C. (2019).
Transfertransfo: A transfer learning approach for
neural network based conversational agents. arXiv
preprint arXiv:1901.08149.
Wu, B., Li, M., Wang, Z., Chen, Y., Wong, D., Feng, Q.,
Huang, J., and Wang, B. (2019). Guiding variational
response generator to exploit persona.
Yacoubi, A. and Sabouret, N. (2018). Teatime: A formal
model of action tendencies in conversational agents.
In ICAART (2), pages 143–153.
Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D.,
and Weston, J. (2018). Personalizing dialogue agents:
I have a dog, do you have pets too? In Proceed-
ings of the 56th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers),
pages 2204–2213, Melbourne, Australia. Association
for Computational Linguistics.
Zheng, Y., Zhang, R., Huang, M., and Mao, X. (2020).
A pre-training based personalized dialogue generation
model with persona-sparse data. In Proceedings of
the AAAI Conference on Artificial Intelligence, vol-
ume 34, pages 9693–9700.
Zhong, P., Zhang, C., Wang, H., Liu, Y., and Miao, C.
(2020). Towards persona-based empathetic conversa-
tional models. arXiv preprint arXiv:2004.12316.