Emotion-Cause Pair Extraction as Question Answering

Huu-Hiep Nguyen

and Minh-Tien Nguyen

Cinnamon AI, Vietnam

Faculty of Information Technology, Hung Yen University of Technology and Education, Vietnam

Keywords:

Sentiment Analysis, Emotion-Cause Pair Extraction, Question Answering, BERT.

Abstract:

The task of Emotion-Cause Pair Extraction (ECPE) aims to extract all potential emotion-cause pairs of a document without any annotation of emotion or cause clauses. Previous approaches to ECPE have tried to improve conventional two-step processing schemes by using complex architectures for modeling emotion-cause interaction. In this paper, we cast the ECPE task as a question answering (QA) problem and propose simple yet effective BERT-based solutions to tackle it. Given a document, our Guided-QA model first predicts the best emotion clause using a fixed question. Then the predicted emotion is used as a question to predict the most likely cause for the emotion. We evaluate our model on a standard ECPE corpus. The experimental results show that despite its simplicity, our Guided-QA achieves promising results and is easy to reproduce. The code of Guided-QA is also provided.

1 INTRODUCTION

Emotion Cause Extraction (ECE) is the task of detecting the cause behind an emotion, given the emotion annotation (Lee et al., 2010; Gui et al., 2016); see Figure 1 (Top). The text is divided into clauses, and the task is to detect the clause containing the cause, given the clause containing the emotion. However, the applicability of ECE is limited because emotion annotations are required at test time. Recently, Xia and Ding (2019) introduced the more challenging Emotion-Cause Pair Extraction (ECPE) task: extracting all possible emotion-cause clause pairs in a document without annotations. Figure 1 (Bottom) shows an example of the ECPE task. The input is a document of six clauses. Clauses c4 and c5 contain emotions, with the emotion expressions "happy" and "worried". The emotion in c4 has two causes, c2 and c3, and the emotion in c5 has one cause, c6, so the expected output is {(c4,c2), (c4,c3), (c5,c6)}.

Figure 1: Illustration of ECE and ECPE tasks. Both panels use the same six-clause document:
c1: Yesterday morning
c2: a policeman visited the old man with the lost money
c3: and told him that the thief was caught.
c4: The old man was very happy
c5: but he still feels worried,
c6: as he doesn't know how to keep so much money.
Top, Emotion-Cause Extraction (ECE). Input: a document and the annotation of emotion clause c5. Output: cause clause(s) {c6}.
Bottom, Emotion-Cause Pair Extraction (ECPE). Input: a document. Output: emotion-cause pairs {(c4,c2), (c4,c3), (c5,c6)}, with emotion clauses c4 and c5 and cause clauses c2, c3 and c6.

Why emotion-cause pair extraction? We argue that extracting causes and emotions independently may be ineffective. For a given document, a model may predict the correct cause but an incorrect emotion. This makes the output incomplete and subsequent processing steps less reliable (Ding et al., 2020; Wei et al., 2020; Chen et al., 2020; Yan et al., 2021). Consider a toy example of two models applied to the document in Figure 1. Model-1 predicts (c4,c1) and (c6,c3) as emotion-cause pairs; its emotion, cause, and pair accuracy scores are 0.5, 0.33, and 0.0. Model-2 predicts (c4,c2) and (c6,c1) as emotion-cause pairs; its emotion, cause, and pair accuracy scores are 0.5, 0.33, and 0.33. Although the two models tie on emotion and cause extraction, Model-2 is better from the perspective of the pair extraction task.

Nguyen, H. and Nguyen, M.
Emotion-Cause Pair Extraction as Question Answering.
DOI: 10.5220/0011883100003393
In Proceedings of the 15th International Conference on Agents and Artificial Intelligence (ICAART 2023) - Volume 3, pages 988-995
ISBN: 978-989-758-623-1; ISSN: 2184-433X
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
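The scores in this toy example can be checked with a few lines of Python. This is a minimal sketch under one assumption of ours: each "accuracy" score is read as the fraction of ground-truth items (emotion clauses, cause clauses, or pairs) that a model recovers, i.e., recall. The names `GOLD_PAIRS`, `recovered_fraction`, and `score` are hypothetical.

```python
# Gold emotion-cause pairs for the Figure 1 document.
GOLD_PAIRS = {("c4", "c2"), ("c4", "c3"), ("c5", "c6")}


def recovered_fraction(pred, gold):
    """Fraction of gold items that also appear in the prediction (recall)."""
    return len(set(pred) & set(gold)) / len(set(gold))


def score(pred_pairs):
    """Return (emotion, cause, pair) scores, rounded to two decimals."""
    gold_emotions = {e for e, _ in GOLD_PAIRS}
    gold_causes = {c for _, c in GOLD_PAIRS}
    pred_emotions = {e for e, _ in pred_pairs}
    pred_causes = {c for _, c in pred_pairs}
    return (
        round(recovered_fraction(pred_emotions, gold_emotions), 2),
        round(recovered_fraction(pred_causes, gold_causes), 2),
        round(recovered_fraction(pred_pairs, GOLD_PAIRS), 2),
    )


model_1 = [("c4", "c1"), ("c6", "c3")]
model_2 = [("c4", "c2"), ("c6", "c1")]
print(score(model_1))  # (0.5, 0.33, 0.0)
print(score(model_2))  # (0.5, 0.33, 0.33)
```

Under this reading, both models recover one of the two emotions and one of the three causes, but only Model-2 recovers a complete pair, which is exactly the difference the pair-level score captures.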

Previous studies addressed the ECPE task by using sequence labeling (Lee et al., 2010; Cheng et al., 2021), clause-level classification (Gui et al., 2016; Ding et al., 2020; Chen et al., 2020), ranking (Wei et al., 2020), or recurrent synchronization (Chen et al., 2022). These methods achieved promising results, yet how to exploit the interaction between emotion and cause clauses is still an open question. For example, c4 and c2 share the tokens "the old man", which is referred to as "him" in c3; and c5 and c6 share "he", which refers to "the old man" in c2 and c4.

Based on this observation, we introduce a paradigm shift (Sun et al., 2022) for ECPE by using span extraction. To the best of our knowledge, Gui et al. (2017) were the first to use question answering for emotion-cause detection. However, their work addresses only the ECE task, which requires the annotation of emotions for cause prediction. In contrast, we apply the paradigm shift to the ECPE task, which is more challenging and does not require emotion annotation for cause prediction. The paradigm is based on two hypotheses. First, information from emotion clauses can be used to infer cause clauses. Second, emotion and cause clauses share an implicit interaction. The design of our model follows these two hypotheses. For the first hypothesis, we form questions based on emotional information, which are used to predict emotion clauses. For the second hypothesis, we use the predicted emotion as the guided question for cause prediction. The model is trained with the BERT-QA architecture (Devlin et al., 2018) in the form of the SQuAD task (Rajpurkar et al., 2016).

Our paper makes three main contributions.

• We formulate the ECPE task as a QA problem and propose a Guided-QA model to implicitly capture the relationship between emotion and cause clauses, in which the predicted emotion is used as a guided question for cause prediction. The model captures the implicit interaction between emotions and causes with a simple but effective architecture. To the best of our knowledge, we are the first to address the ECPE task with a QA formulation.

• We evaluate our model on the standard ECPE corpus (Xia and Ding, 2019; Fan et al., 2020). Experimental results show that our approach achieves promising results compared to previous methods.

• We promote reproducibility (Houghton et al., 2020) by providing the source code of our methods as well as rerunning the publicly available source code of the compared methods.

2 RELATED WORK

ECE and ECPE Tasks. The ECE task was formulated as sequence labeling by Lee et al. (2010) and refined to the clause level by Gui et al. (2016). Recently, the more challenging ECPE task (Xia and Ding, 2019) has attracted many contributions, with several strong methods (Ding et al., 2020; Wei et al., 2020; Chen et al., 2020; Cheng et al., 2021; Chen et al., 2022). For example, Ding et al. (2020) introduced ECPE-MLL, which uses a sliding window in a multi-label learning scheme. ECPE-MLL extracts emotions and causes by using iterative synchronized multi-task learning. Chen et al. (2022) proposed a similar approach, the recurrent synchronization network (RSN), which explicitly models the interaction among different tasks. Wei et al. (2020) presented RankCP, a transition-based framework that transforms the ECPE problem into directed graph construction, from which emotions and the corresponding causes can be extracted simultaneously based on labeled edges. The PairGCN model (Chen et al., 2020) uses Graph Convolutional Networks to model three types of dependency relations among local neighborhood candidate pairs and facilitates the extraction of pair-level contextual information.

We share with prior studies the goal of addressing the ECE and ECPE tasks. However, instead of using classification or sequence labeling, we address the tasks with a paradigm shift to span extraction. This allows us to take into account the implicit interaction between emotion and cause clauses and to design a simple but effective BERT-based model for ECE and ECPE.

Bi and Liu (2020) derived a span-based dataset from the ECPE corpus (Xia and Ding, 2019) and formulated a new task, ECSP (Emotion Cause Span Prediction), but it has not attracted much attention, possibly because of the limited accessibility of the dataset and source code. We leave span-based ECSP evaluation as future work.

Paradigm Shift in Natural Language Processing. A paradigm is a general modeling framework or a family of methods to solve a class of tasks. For instance, sequence labeling is a mainstream paradigm for part-of-speech (POS) tagging and named entity recognition (NER). The sequence-to-sequence (Seq2Seq) paradigm is a popular tool for summarization and machine translation. Different paradigms usually require different input and output formats and therefore depend highly on how the tasks are annotated.

Figure 2: BERT-based extractive Question Answering. The input sequence [CLS] q_1 ... q_n [SEP] c_1 ... c_m is mapped to input embeddings (E_CLS, E_q_1, ..., E_c_m), encoded by BERT into contextual representations (T_CLS, T_q_1, ..., T_c_m), and used to predict the start/end positions of the answer span.

A paradigm shift means solving an NLP task in a new paradigm by reformulating the task and changing its input-output format. Paradigm shifts in NLP have been explored sporadically in recent years, and with the advent of pretrained language models they have become a rising trend (Li et al., 2019; Khashabi et al., 2020). Sun et al. (2022) provide an excellent survey of paradigm shifts in NLP. In this work, we realize such a paradigm shift for the ECPE task, i.e., we reformulate clause-based text classification as span extraction.

Span-Based Extractive Question Answering. Our formulation of the ECE and ECPE tasks relates to span-based extractive QA, which has been widely investigated (Khashabi et al., 2020). More precisely, we build our model on pretrained language models (PLMs) such as BERT (Devlin et al., 2018) or RoBERTa (Liu et al., 2019), because using PLMs as the backbone of QA systems has become standard practice. For details, please refer to Devlin et al. (2018).

Figure 2, reproduced from Devlin et al. (2018), shows how BERT is applied to the extractive QA task. The tokens of the question $q = (q_1, \ldots, q_n)$ and the context $C = (c_1, \ldots, c_m)$ are concatenated before being encoded by BERT. The contextual representations $T$ of the tokens are fed into a feed-forward layer followed by a softmax. Each candidate answer span is scored as the product of its start and end probabilities, and the maximum-scoring span is used as the prediction. The training objective is the log-likelihood of the correct start and end positions.
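The span scoring and training objective described above can be sketched without any deep-learning framework. This is a minimal, illustrative sketch: it assumes the per-token start and end logits have already been produced by the feed-forward layer on top of BERT, and the length cap `max_answer_len` is a hypothetical parameter (real implementations also restrict candidate spans to context tokens). Following the BERT description, the loss is the sum of the negative log-likelihoods of the gold start and end positions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return ((start, end), score) maximizing p_start[start] * p_end[end].

    Only spans with start <= end and length <= max_answer_len are considered.
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


def span_loss(start_logits, end_logits, gold_start, gold_end):
    """Negative log-likelihood of the gold start and end positions."""
    return -(math.log(softmax(start_logits)[gold_start])
             + math.log(softmax(end_logits)[gold_end]))


# Toy logits for a 5-token context (illustrative values only).
start_logits = [0.1, 2.0, 0.5, -1.0, 0.0]
end_logits = [0.0, 0.3, 2.5, 0.2, -0.5]
span, score = best_span(start_logits, end_logits)
print(span)  # (1, 2)
```

Because start and end are scored independently and combined by a product, the search is a simple double loop over valid (start, end) pairs; the `start <= end` constraint is what keeps the two independent softmaxes from proposing an inverted span.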

By casting ECPE as a QA problem, our work leverages the powerful models of the BERT family (Devlin et al., 2018) to detect clause-level emotions and causes as well as emotion-cause pairs.

3 METHOD

3.1 Problem Statement

Given a document of $n$ clauses $d = (c_1, c_2, \ldots, c_n)$, the goal of ECPE is to detect all potential emotion-cause pairs $P = \{\ldots, (c_e, c_c), \ldots\}$, where $c_e$ is an emotion clause and $c_c$ is the corresponding cause clause (Xia and Ding, 2019). We formulate the ECPE task as a QA problem. Given a set of questions $\{q_e, q_c\}$ ($q_e$ is for emotion and $q_c$ is for cause) and a context document $d$ with $n$ clauses, the model learns to predict the start and end positions of each $c_e$ and $c_c$, $(s_e, e_e) = f(d, q_e \mid \Theta)$ and $(s_c, e_c) = f(d, q_c \mid \Theta)$, to form $P$. $\Theta$ can be learned by using independent or guided extraction.

3.2 Independent Emotion, Cause Extraction

We first introduce a simple version of our model, Indep-QA, shown in Figure 3. Indep-QA receives a fixed question (for emotion or cause) and extracts the corresponding emotion or cause clauses independently.

Question Formulation. Because no emotion or cause information is provided beforehand, we have to detect emotions and causes first with generic questions. It is possible to use pre-defined questions for extraction (Mengge et al., 2020). However, we argue that defining such questions is time-consuming, requires domain knowledge, and does not guarantee a semantic relationship between the questions and the context documents. Instead, we use two short questions, "emotion" and "cause", as implicit indicators that provide additional information to the model. We leave the analysis of generic questions such as "What is the emotion?" and "What is the cause?" as future work.

Learning and Prediction. Given a document d and a question ("emotion" or "cause"), we concatenate all clauses of d and the question to form a single sequence C. The sequence is fed to a pretrained language model (PLM) to obtain hidden representations of its tokens, which are then fed into a feed-forward layer followed by a softmax layer. Each candidate span is scored as the product of its start and end probabilities, and the maximum-scoring span is used as the prediction.
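One way to turn an annotated ECPE document into SQuAD-style training examples is sketched below. This is an illustration under stated assumptions, not the authors' released code: the dictionary layout mirrors the `question`/`context`/`answers` (`text`, `answer_start`) fields of the SQuAD dataset format, the gold answer is the full clause text, and the clause separator `sep` is a hypothetical choice (for the Chinese corpus an empty separator may be more natural). The `guided` flag reflects Section 3.3, where the gold emotion clause serves as the cause question during training. Pairing the question with the context into one sequence is left to the tokenizer.

```python
from typing import Dict, List


def build_examples(clauses: List[str], emotion_idx: int, cause_idx: int,
                   guided: bool = False, sep: str = " ") -> List[Dict]:
    """Turn one annotated document into two SQuAD-style QA examples.

    The context is the clauses joined by `sep`; each answer is the full gold
    clause, located by its character offset in the context.
    """
    context = sep.join(clauses)
    offsets, pos = [], 0
    for clause in clauses:
        offsets.append(pos)
        pos += len(clause) + len(sep)
    # Indep-QA uses the fixed question "cause"; for Guided-QA training the
    # gold emotion clause itself serves as the cause question.
    cause_question = clauses[emotion_idx] if guided else "cause"
    return [
        {"question": question,
         "context": context,
         "answers": {"text": [clauses[idx]], "answer_start": [offsets[idx]]}}
        for question, idx in (("emotion", emotion_idx), (cause_question, cause_idx))
    ]


# The Figure 1 document (English translation), gold pair (c5, c6).
DOC = [
    "Yesterday morning",
    "a policeman visited the old man with the lost money",
    "and told him that the thief was caught.",
    "The old man was very happy",
    "but he still feels worried,",
    "as he doesn't know how to keep so much money.",
]
examples = build_examples(DOC, emotion_idx=4, cause_idx=5)
for ex in examples:
    start = ex["answers"]["answer_start"][0]
    text = ex["answers"]["text"][0]
    assert ex["context"][start:start + len(text)] == text
print([ex["question"] for ex in examples])  # ['emotion', 'cause']
```

Using whole clauses as answers keeps training aligned with the clause-level evaluation, while still letting the model learn token-level start and end positions.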

Mapping Predicted Answer Span to Clauses. The predicted answer span may overlap with one or several clauses. We apply a span-to-clause mapping rule to determine which clause is predicted: the clause that overlaps most with the predicted span is returned, and ties are broken arbitrarily. For instance, in Figure 3, the predicted span for "emotion" overlaps with clauses c2 and c3, of which c2 has the larger overlap. As a result, c2 is the predicted emotion.

Figure 3: Independent extraction Indep-QA. With question = "emotion", the predicted answer span (answer start to answer end) is mapped to emotion clause c2; with question = "cause", it is mapped to cause clause c4. The predicted EC pair is (c2,c4).

Figure 4: Guided pair extraction Guided-QA: Emotion is detected first (Left), Cause is detected first (Right). Left: question = "emotion" yields emotion clause c2, which is then used as the question to predict cause clause c3; the predicted EC pair is (c2,c3). Right: question = "cause" yields cause clause c3, which is then used as the question to predict emotion clause c4; the predicted EC pair is (c4,c3).
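The span-to-clause mapping rule can be sketched as follows. This is a minimal sketch assuming character offsets with exclusive end positions and clauses joined by a single separator; the helper names `clause_offsets` and `span_to_clause` are hypothetical. Ties go to the earliest clause, which is one valid instance of breaking ties arbitrarily.

```python
def clause_offsets(clauses, sep=" "):
    """Character (start, end) offsets, end exclusive, of each clause in sep.join(clauses)."""
    spans, pos = [], 0
    for clause in clauses:
        spans.append((pos, pos + len(clause)))
        pos += len(clause) + len(sep)
    return spans


def span_to_clause(answer_start, answer_end, spans):
    """Index of the clause with the largest overlap with [answer_start, answer_end).

    Ties go to the earliest clause; returns None if no clause overlaps.
    """
    best_idx, best_overlap = None, 0
    for idx, (start, end) in enumerate(spans):
        overlap = min(end, answer_end) - max(start, answer_start)
        if overlap > best_overlap:
            best_idx, best_overlap = idx, overlap
    return best_idx


spans = clause_offsets(["aaaa", "bbbbbbbb", "cc"])  # [(0, 4), (5, 13), (14, 16)]
print(span_to_clause(3, 15, spans))  # 1: overlaps are 1, 8 and 1 characters
```

Mapping back to whole clauses is what lets a token-level QA model be evaluated on the clause-level ECPE metrics, even when the predicted span straddles a clause boundary.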

EC Pair Prediction. Given the predicted emotion and cause clauses $c_e$ and $c_c$, Indep-QA simply predicts $(c_e, c_c)$ as an emotion-cause pair. As illustrated in Figure 3, (c2, c4) is the predicted emotion-cause pair.

3.3 Guided Emotion-Cause Pair Extraction

The Indep-QA model extracts emotion and cause clauses independently and therefore does not exploit the relationship between emotion and cause clauses, which plays an important role in the extraction of emotion-cause pairs (Ding et al., 2020; Wei et al., 2020; Chen et al., 2020; Cheng et al., 2021; Chen et al., 2022).

To better model this relationship, we introduce Guided-QA (Figure 4). The model receives an emotion question and predicts the corresponding emotion clause. The predicted emotion clause is then used as the question for cause extraction. Compared to Indep-QA, Guided-QA takes into account the implicit relationship between the emotion and the cause when predicting the cause.

The Guided-QA model shares the question formulation, hidden representation learning, and span-to-clause mapping of the Indep-QA model.

EC Pair Extraction. We use the predicted (noisy) emotion clause as the question for cause extraction; this is where the interaction between emotion and cause happens. The predicted emotion clause may or may not be the true one, but on average it carries much more information for the QA model than the generic question (i.e., "emotion"). Note that the predicted (noisy) emotion is used as the question on the test set only. On the training set, where the emotion and cause clauses are known, the true emotion clause is used as the question.

By swapping the roles, the model can detect cause clauses first and use the noisy causes as questions to predict the emotions. In Section 5, we compare Emotion-first and Cause-first, the two variants of Guided-QA, and show that the gaps are tiny. In other words, the two variants are almost equivalent on the tested datasets.
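The inference flow of Indep-QA and the two Guided-QA variants can be summarized in a short sketch. It assumes a hypothetical `QAModel` interface that takes a question string and the document's clauses and returns the index of the predicted clause (for example, a fine-tuned BERT-QA model followed by the span-to-clause mapping). The toy lookup table `TOY_ANSWERS` is contrived purely to show how the second question changes the output; it is not a trained model.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (question, clauses) -> index of the predicted clause.
QAModel = Callable[[str, List[str]], int]


def indep_qa(qa: QAModel, clauses: List[str]) -> Tuple[int, int]:
    """Indep-QA: answer the two fixed questions independently."""
    return qa("emotion", clauses), qa("cause", clauses)


def guided_qa(qa: QAModel, clauses: List[str],
              emotion_first: bool = True) -> Tuple[int, int]:
    """Guided-QA: the clause predicted first is reused as the second question.

    Returns an (emotion index, cause index) pair for both variants.
    """
    if emotion_first:
        emotion = qa("emotion", clauses)
        cause = qa(clauses[emotion], clauses)  # noisy emotion clause as question
    else:
        cause = qa("cause", clauses)
        emotion = qa(clauses[cause], clauses)  # noisy cause clause as question
    return emotion, cause


# Figure 1 document; indices are 0-based, so index 4 is c5 and index 5 is c6.
DOC = [
    "Yesterday morning",
    "a policeman visited the old man with the lost money",
    "and told him that the thief was caught.",
    "The old man was very happy",
    "but he still feels worried,",
    "as he doesn't know how to keep so much money.",
]
# Contrived stand-in for a trained model, for illustration only.
TOY_ANSWERS = {"emotion": 4, "cause": 1, DOC[4]: 5, DOC[1]: 3}


def toy_qa(question: str, clauses: List[str]) -> int:
    return TOY_ANSWERS[question]


print(indep_qa(toy_qa, DOC))                        # (4, 1), i.e. (c5, c2)
print(guided_qa(toy_qa, DOC))                       # (4, 5), i.e. (c5, c6)
print(guided_qa(toy_qa, DOC, emotion_first=False))  # (3, 1), i.e. (c4, c2)
```

In this toy run, the independent answers combine into (c5, c2), which is not a gold pair, whereas both guided variants condition the second question on the first answer and return gold pairs. This illustrates the intended benefit, not a measured result.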

As our QA models use the best answer span for each question, only one emotion, one cause, and one EC pair are predicted for each document. This suits the ECPE dataset, in which most documents contain a single pair (Table 1). We are also aware that, in general, multiple spans should be predicted, and we aim to address this limitation in future work by using multiple span extraction methods (Nguyen et al., 2021; Fu et al., 2021).

3.4 Discussion

Given a document of n clauses, existing schemes such as ECPE-MLL (Ding et al., 2020), RankCP (Wei et al., 2020), and PairGCN (Chen et al., 2020) attempt to reduce the $O(n^2)$ complexity of emotion-cause pair classification by using sliding-window or transition-graph techniques. However, these techniques may miss certain interactions between an emotion-cause pair and the full context of the document. BERT-based QA models, with full attention between the question and the context, mitigate this issue. Through QA models, the emotion-cause relationship between all clauses is learned implicitly, and we can leverage the power of existing QA methods.

4 EXPERIMENTAL SETTINGS

Datasets. We use the 10-split ECPE dataset provided by Xia and Ding (2019) and the 20-split TransECPE variant (Fan et al., 2020) to evaluate our methods. Each split is a random partition of the 1945 documents into train/dev/test sets with a ratio of 8:1:1, i.e., the train, dev, and test sets contain approximately 1556, 194, and 195 documents. On average, each document contains 14.8 clauses.
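The stated set sizes follow from an 8:1:1 partition of 1945 documents. The sketch below only illustrates that arithmetic: the experiments use the split files distributed with the cited datasets, and the seed and the rounding rule here (round the train share, then split the remainder as evenly as possible) are assumptions.

```python
import random


def split_8_1_1(doc_ids, seed=0):
    """Randomly partition document ids into train/dev/test with ratio 8:1:1."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(0.8 * len(ids))
    n_dev = (len(ids) - n_train) // 2  # the test set takes the remainder
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]


train, dev, test = split_8_1_1(range(1945))
print(len(train), len(dev), len(test))  # 1556 194 195
```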

Table 1 shows the distribution of documents with different numbers of emotion-cause pairs. Most documents (89.77%) contain only one emotion-cause pair. This fact makes the detection of emotion and cause clauses, as well as emotion-cause pairs, challenging.

Evaluation Metrics. We use precision, recall, and F1 score (Xia and Ding, 2019) as evaluation metrics for all three subtasks of ECPE: emotion extraction, cause extraction, and emotion-cause pair extraction. Let $T_e$ and $P_e$ be the sets of ground-truth and predicted emotion clauses, respectively. The precision, recall, and F1 score for emotion extraction are defined as follows:

$$\mathrm{Precision}_e = \frac{|T_e \cap P_e|}{|P_e|}, \qquad \mathrm{Recall}_e = \frac{|T_e \cap P_e|}{|T_e|}, \qquad F1_e = \frac{2 \cdot \mathrm{Precision}_e \cdot \mathrm{Recall}_e}{\mathrm{Precision}_e + \mathrm{Recall}_e}$$

Metrics for cause clauses and emotion-cause pairs are defined similarly.
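The set-based definitions above translate directly into code. In this sketch, pooling the counts $|T \cap P|$, $|P|$ and $|T|$ over all test documents before dividing is our assumption about how corpus-level scores are aggregated; the same function applies to emotion clauses, cause clauses, or (emotion, cause) pairs.

```python
from typing import Hashable, List, Set, Tuple


def prf1(gold_docs: List[Set[Hashable]],
         pred_docs: List[Set[Hashable]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 with counts pooled over all test documents.

    Each element of gold_docs / pred_docs is the set of items (emotion clause
    ids, cause clause ids, or (emotion, cause) pairs) for one document.
    """
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # |T ∩ P| for this document
        n_pred += len(pred)     # |P|
        n_gold += len(gold)     # |T|
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two toy documents; pairs are (emotion clause, cause clause) indices.
gold_pairs = [{(4, 2), (4, 3), (5, 6)}, {(2, 1)}]
pred_pairs = [{(4, 2), (6, 1)}, {(2, 1)}]
p, r, f1 = prf1(gold_pairs, pred_pairs)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```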

Implementation Details. Our model was implemented using the BERT classes provided by Hugging Face (Wolf et al., 2020). The model was trained for 5 epochs with a learning rate of 5e-5 and a batch size of 16. We used the Chinese versions of BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019). All models were trained on a Tesla P100 GPU.

5 RESULTS AND DISCUSSION

Guided-QA: Emotion-First vs. Cause-First. We first compare the two variants of the Guided-QA method, Emotion-first and Cause-first. Table 2 shows that the two variants perform almost equivalently on the tested datasets, except for the BERT-based results on 10-split ECPE (pair F1 of 0.729 vs. 0.722). Also, the RoBERTa-based results are consistently better than the BERT-based ones in pair F1, by 1.1 to 2.0 points. In the rest of this section, we use the Emotion-first scores to compare Guided-QA with other methods.

Guided-QA vs. Indep-QA. We now compare Guided-QA and Indep-QA. For 10-split ECPE in the upper part of Table 3, Guided-QA is consistently better than Indep-QA for pair extraction (pair F1 of 0.729 vs. 0.697 with BERT and 0.744 vs. 0.720 with RoBERTa). We attribute this to Guided-QA taking into account the implicit interaction between emotion and cause clauses. For emotion or cause extraction, Indep-QA is competitive with Guided-QA because the two models share the same formulation; the emotion scores are even identical, consistent with the fact that Emotion-first Guided-QA extracts emotions with the same fixed question as Indep-QA. The results in Table 4 show a similar pattern.

We also evaluate our models with RoBERTa for a more thorough analysis. The results are consistent with those using BERT: Guided-QA achieves better pair F-scores than Indep-QA. They also show that our model can be further improved by using stronger PLMs.

Guided-QA vs. Strong Baselines. We compare our model with five strong methods for ECPE: ECPE-MLL (Ding et al., 2020), RankCP (Wei et al., 2020), PairGCN (Chen et al., 2020), UTOS (Cheng et al., 2021), and RSN (Chen et al., 2022). For 10-split ECPE, our model using BERT ranks behind ECPE-MLL, RSN, and RankCP in pair F1. This shows that, with a simple architecture, our model can produce results competitive with those of complicated methods. For 20-split TransECPE in Table 4, the trend is consistent with Table 3: the Guided-QA model is competitive for both ECE and ECPE tasks.

BERT: https://huggingface.co/bert-base-chinese
RoBERTa: https://huggingface.co/hfl/chinese-roberta-wwm-ext
ECPE-MLL: https://github.com/NUSTM/ECPE-MLL
RankCP: https://github.com/Determined22/Rank-Emotion-Cause
PairGCN: https://github.com/chenying3176/PairGCN_ECPE

Table 1: Histogram of the number of emotion-cause pairs per document.

| Documents | Number | Percentage |
|---|---|---|
| With one emotion-cause pair | 1746 | 89.77% |
| With two emotion-cause pairs | 177 | 9.10% |
| With more than two emotion-cause pairs | 22 | 1.13% |
| All | 1945 | 100% |

Table 2: Guided-QA Emotion-first vs. Cause-first on the 10-split ECPE dataset and the 20-split TransECPE dataset (P = precision, R = recall).

| Dataset | Model | Emotion P | Emotion R | Emotion F1 | Cause P | Cause R | Cause F1 | Pair P | Pair R | Pair F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10-split ECPE | Emotion-first (BERT) | 0.847 | 0.908 | 0.876 | 0.719 | 0.792 | 0.754 | 0.771 | 0.692 | 0.729 |
| 10-split ECPE | Cause-first (BERT) | 0.831 | 0.891 | 0.860 | 0.714 | 0.787 | 0.749 | 0.763 | 0.685 | 0.722 |
| 10-split ECPE | Emotion-first (RoBERTa) | 0.854 | 0.916 | 0.884 | 0.732 | 0.806 | 0.767 | 0.786 | 0.706 | 0.744 |
| 10-split ECPE | Cause-first (RoBERTa) | 0.843 | 0.904 | 0.873 | 0.733 | 0.807 | 0.768 | 0.784 | 0.704 | 0.742 |
| 20-split TransECPE | Emotion-first (BERT) | 0.842 | 0.906 | 0.873 | 0.710 | 0.782 | 0.744 | 0.760 | 0.689 | 0.723 |
| 20-split TransECPE | Cause-first (BERT) | 0.833 | 0.897 | 0.864 | 0.713 | 0.785 | 0.747 | 0.761 | 0.690 | 0.724 |
| 20-split TransECPE | Emotion-first (RoBERTa) | 0.844 | 0.909 | 0.875 | 0.723 | 0.796 | 0.757 | 0.772 | 0.700 | 0.734 |
| 20-split TransECPE | Cause-first (RoBERTa) | 0.838 | 0.902 | 0.869 | 0.724 | 0.797 | 0.758 | 0.773 | 0.701 | 0.735 |

Moreover, across the compared methods, the gaps between the reported pair F1 scores on 10-split ECPE and 20-split TransECPE are 0.023 (= 0.745 - 0.722) for ECPE-MLL, 0.044 for RankCP, 0.029 for UTOS, 0.003 for Indep-QA, and 0.006 for Guided-QA, i.e., the largest gap is for RankCP and the smallest gaps are for our models. Across the two settings, our models seem more robust than the compared methods.
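The gaps can be recomputed directly from the pair F1 columns of Tables 3 and 4. In this sketch, the Indep-QA and Guided-QA entries use the BERT rows, which is our reading of which backbone the gaps above refer to (the RoBERTa rows give different values).

```python
# Pair F1 from Table 3 (10-split ECPE) and Table 4 (20-split TransECPE).
PAIR_F1 = {
    "ECPE-MLL": (0.745, 0.722),
    "RankCP": (0.736, 0.692),
    "UTOS": (0.720, 0.691),
    "Indep-QA (BERT)": (0.697, 0.694),
    "Guided-QA (BERT)": (0.729, 0.723),
}

# Round to three decimals to remove floating-point noise from the subtraction.
gaps = {name: round(ecpe - trans, 3) for name, (ecpe, trans) in PAIR_F1.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:.3f}")
```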

Reproducibility. For a fair comparison (Houghton et al., 2020), we also reran the publicly available source code of the baselines in their original settings. The reproduced results (marked with * in Tables 3 and 4) confirm that there are gaps between the reproduced and the original results. Compared to the reproduced baselines in Table 3, Guided-QA using BERT achieves higher F1 for emotion, cause, and EC pair extraction. This supports the hypotheses stated in Section 1. Compared to the results of the strong baselines reported in their papers, the F-scores of Guided-QA are still competitive, which shows that our simple model can produce promising results compared to complicated ECPE methods (Ding et al., 2020; Wei et al., 2020; Chen et al., 2020; Cheng et al., 2021; Chen et al., 2022). The results from the original papers are given for reference only, because there appear to be gaps between the reproduced and the original results: several researchers have tried to reproduce them without matching the reported numbers (see, e.g., https://github.com/Determined22/Rank-Emotion-Cause/issues/3).

For 20-split TransECPE (Table 4), the trend is consistent with Table 3, and Guided-QA using RoBERTa is still the best in pair F1. After rerunning the source code of the baselines, we found that PairGCN has the best reproducibility.

By adopting the standardized pipeline of BERT-based question answering, our models inherit its simplicity and reproducibility, whereas reproducibility may become an issue in more complex methods such as RankCP.

Runtime Comparison. We also measured the running time of our models and the baselines. In Table 5, PairGCN, which only uses BERT embeddings, has the shortest running time. The other models take longer to run because they fine-tune BERT. Our models are the second fastest and much faster than ECPE-MLL, which shows that they strike a balance between competitive accuracy and high speed.

Table 3: Experimental results of different models on the 10-split ECPE dataset. * indicates reproduced results.

| Model | Emotion P | Emotion R | Emotion F1 | Cause P | Cause R | Cause F1 | Pair P | Pair R | Pair F1 |
|---|---|---|---|---|---|---|---|---|---|
| Indep-QA (BERT) | 0.847 | 0.908 | 0.876 | 0.714 | 0.787 | 0.749 | 0.736 | 0.661 | 0.697 |
| Guided-QA (BERT) | 0.847 | 0.908 | 0.876 | 0.719 | 0.792 | 0.754 | 0.771 | 0.692 | 0.729 |
| Indep-QA (RoBERTa) | 0.854 | 0.916 | 0.884 | 0.733 | 0.807 | 0.768 | 0.761 | 0.683 | 0.720 |
| Guided-QA (RoBERTa) | 0.854 | 0.916 | 0.884 | 0.732 | 0.806 | 0.767 | 0.786 | 0.706 | 0.744 |
| ECPE-MLL (BERT) | 0.861 | 0.919 | 0.889 | 0.738 | 0.791 | 0.763 | 0.770 | 0.724 | 0.745 |
| RankCP (BERT) | 0.912 | 0.900 | 0.906 | 0.746 | 0.779 | 0.762 | 0.712 | 0.763 | 0.736 |
| PairGCN (BERT) | 0.886 | 0.796 | 0.838 | 0.791 | 0.693 | 0.738 | 0.769 | 0.679 | 0.720 |
| UTOS (BERT) | 0.882 | 0.832 | 0.856 | 0.767 | 0.732 | 0.747 | 0.739 | 0.706 | 0.720 |
| RSN (BERT) | 0.861 | 0.892 | 0.876 | 0.773 | 0.740 | 0.755 | 0.760 | 0.722 | 0.739 |
| ECPE-MLL (BERT)* | - | - | - | - | - | - | 0.688 | 0.752 | 0.718 |
| RankCP (BERT)* | 0.741 | 0.744 | 0.742 | 0.614 | 0.647 | 0.627 | 0.573 | 0.625 | 0.597 |
| PairGCN (BERT)* | 0.784 | 0.883 | 0.829 | 0.686 | 0.795 | 0.735 | 0.675 | 0.772 | 0.718 |

Table 4: Experimental results of different models on the 20-split TransECPE dataset. * indicates reproduced results. The authors of PairGCN and RSN did not test their models on TransECPE.

| Model | Emotion P | Emotion R | Emotion F1 | Cause P | Cause R | Cause F1 | Pair P | Pair R | Pair F1 |
|---|---|---|---|---|---|---|---|---|---|
| Indep-QA (BERT) | 0.842 | 0.906 | 0.873 | 0.713 | 0.785 | 0.747 | 0.730 | 0.662 | 0.694 |
| Guided-QA (BERT) | 0.842 | 0.906 | 0.873 | 0.710 | 0.782 | 0.744 | 0.760 | 0.689 | 0.723 |
| Indep-QA (RoBERTa) | 0.844 | 0.909 | 0.875 | 0.724 | 0.797 | 0.758 | 0.739 | 0.670 | 0.703 |
| Guided-QA (RoBERTa) | 0.844 | 0.909 | 0.875 | 0.723 | 0.796 | 0.757 | 0.772 | 0.700 | 0.734 |
| ECPE-MLL (BERT) | 0.847 | 0.899 | 0.872 | 0.705 | 0.770 | 0.736 | 0.749 | 0.698 | 0.722 |
| RankCP (BERT) | 0.894 | 0.895 | 0.894 | 0.694 | 0.747 | 0.719 | 0.658 | 0.731 | 0.692 |
| UTOS (BERT) | 0.865 | 0.829 | 0.849 | 0.742 | 0.708 | 0.728 | 0.710 | 0.681 | 0.691 |
| ECPE-MLL (BERT)* | - | - | - | - | - | - | 0.659 | 0.714 | 0.684 |
| RankCP (BERT)* | 0.896 | 0.897 | 0.896 | 0.694 | 0.749 | 0.720 | 0.657 | 0.731 | 0.691 |
| PairGCN (BERT)* | 0.804 | 0.878 | 0.839 | 0.689 | 0.770 | 0.727 | 0.677 | 0.746 | 0.709 |

Table 5: Running time (train and test) on a Tesla P100.

| Model | ECPE | TransECPE |
|---|---|---|
| ECPE-MLL | 8.5h | 17h |
| RankCP | 3h | 6h |
| PairGCN | 42min | 85min |
| Indep-QA | 2h30 | 5h |
| Guided-QA | 2h30 | 5h |

6 CONCLUSION

This paper introduces a paradigm shift for the ECPE task. Instead of using the conventional formulation, we formulate the extraction as a QA problem. Based on that, we design a model that takes into account the implicit interaction between emotion and cause clauses. Experimental results on a benchmark Chinese dataset show that using the implicit interaction of emotions and causes can achieve competitive accuracy compared to strong baselines. Future work will consider explicit interaction between emotion and cause clauses.

REFERENCES

Bi, H. and Liu, P. (2020). ECSP: A new task for emotion-cause span-pair extraction and classification. arXiv preprint arXiv:2003.03507.

Chen, F., Shi, Z., Yang, Z., and Huang, Y. (2022). Recurrent synchronization network for emotion-cause pair extraction. Knowledge-Based Systems, 238:107965.

Chen, Y., Hou, W., Li, S., Wu, C., and Zhang, X. (2020). End-to-end emotion-cause pair extraction with graph convolutional network. In Proceedings of the 28th International Conference on Computational Linguistics, pages 198–207.

Cheng, Z., Jiang, Z., Yin, Y., Li, N., and Gu, Q. (2021). A unified target-oriented sequence-to-sequence model for emotion-cause pair extraction. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:2779–2791.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Ding, Z., Xia, R., and Yu, J. (2020). End-to-end emotion-cause pair extraction based on sliding window multi-label learning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3574–3583.

Fan, C., Yuan, C., Du, J., Gui, L., Yang, M., and Xu, R. (2020). Transition-based directed graph construction for emotion-cause pair extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3707–3717.

Fu, J., Huang, X.-J., and Liu, P. (2021). SpanNER: Named entity re-/recognition as span prediction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7183–7195.

Gui, L., Hu, J., He, Y., Xu, R., Lu, Q., and Du, J. (2017). A question answering approach for emotion cause extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1593–1602.

Gui, L., Wu, D., Xu, R., Lu, Q., and Zhou, Y. (2016). Event-driven emotion cause extraction with corpus construction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1639–1649.

Houghton, B., Milani, S., Topin, N., Guss, W., Hofmann, K., Perez-Liebana, D., Veloso, M., and Salakhutdinov, R. (2020). Guaranteeing reproducibility in deep learning competitions. arXiv preprint arXiv:2005.06041.

Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., and Hajishirzi, H. (2020). UnifiedQA: Crossing format boundaries with a single QA system. arXiv preprint arXiv:2005.00700.

Lee, S. Y. M., Chen, Y., and Huang, C.-R. (2010). A text-driven rule-based system for emotion cause detection. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 45–53.

Li, X., Feng, J., Meng, Y., Han, Q., Wu, F., and Li, J. (2019). A unified MRC framework for named entity recognition. arXiv preprint arXiv:1910.11476.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Mengge, X., Yu, B., Zhang, Z., Liu, T., Zhang, Y., and Wang, B. (2020). Coarse-to-fine pre-training for named entity recognition. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6345–6354.

Nguyen, T.-A. D., Vu, H. M., Son, N. H., and Nguyen, M.-T. (2021). A span extraction approach for information extraction on visually-rich documents. In International Conference on Document Analysis and Recognition, pages 353–363. Springer.

Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.

Sun, T.-X., Liu, X.-Y., Qiu, X.-P., and Huang, X.-J. (2022). Paradigm shift in natural language processing. Machine Intelligence Research, 19(3):169–183.

Wei, P., Zhao, J., and Mao, W. (2020). Effective inter-clause modeling for end-to-end emotion-cause pair extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3171–3181.

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et al. (2020). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45.

Xia, R. and Ding, Z. (2019). Emotion-cause pair extraction: A new task to emotion analysis in texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1003–1012.

Yan, H., Gui, L., Pergola, G., and He, Y. (2021). Position bias mitigation: A knowledge-aware graph model for emotion cause extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3364–3375.
