FLE: A Fuzzy Logic Algorithm for Classification of Emotions
in Literary Corpora
Luis-Gil Moreno-Jiménez (1,a), Juan-Manuel Torres-Moreno (1,2,b), Hanifa Boucheneb (2,c) and Roseli S. Wedemann (3,d)

(1) Laboratoire Informatique d'Avignon, Université d'Avignon, 339 Chemin des Meinajaries, 84911 Avignon, Cédex 9, France
(2) Département de Génie Informatique et Génie Logiciel, Polytechnique Montréal, 2500, Chemin de Polytechnique Montréal, Québec, Canada
(3) Universidade do Estado do Rio de Janeiro, Rua São Francisco Xavier 524, 20550-900, Rio de Janeiro, RJ, Brazil
(a) https://orcid.org/0000-0001-7753-7349
(b) https://orcid.org/0000-0002-4392-1825
(c) https://orcid.org/0000-0001-9158-6374
(d) https://orcid.org/0000-0001-7532-3881
Keywords:
Emotion Classification, Natural Language Processing, Fuzzy Logic, Literary Sentences.
Abstract:
This paper presents the Fuzzy Logic Emotions (FLE) classifier, an algorithm based on fuzzy logic devised to identify emotions in corpora of literary texts. This algorithm evaluates a sentence to define the class(es) of emotions to which it belongs. For this purpose, it considers three types of linguistic variables (verb, noun and adjective) with associated linguistic values used to qualify the emotion they express. A numerical value is computed for each of these terms within a sentence, based on its term frequency and inverse document frequency
(TF-IDF). We have tested our FLE classifier with an evaluation protocol, using a literary corpus in Spanish
specially structured for working with the automatic detection of emotions in text. We present encouraging
performance results favoring our FLE classifier, when compared to other known algorithms established in the
literature used for the detection of emotions in text.
1 INTRODUCTION
Natural Language Processing (NLP) is a very ac-
tive area of interdisciplinary work, involving subjects
such as computer science, statistical physics, cog-
nitive neuroscience, linguistics and psychology (see
(Clark et al., 2018; Ke and Xiaojun, 2018; Torres-
Moreno, 2012; Wedemann and Plastino, 2016; Sid-
diqui et al., 2018) and references therein). In partic-
ular, there has been much effort in the NLP research
community in recent years, to develop automatic pro-
cedures to analyse emotions or sentiment in text (Pang
and Lee, 2008; Cambria, 2016; Iria et al., 2011). The
task of analysing sentiment and emotions in literary
texts is even more difficult, because the complexity
of this genre of text presents ambiguous characteris-
tics, which often make their understanding difficult,
even for humans. We have thus concentrated our ef-
forts on approaching this problem with fuzzy logic,
as this technique tries to find good solutions for prob-
lems through the analysis of subjective information,
simulating human analysis (Matiko et al., 2014).
Fuzzy logic was introduced in (Zadeh, 1965) to
solve problems with imprecise information. It allows
the specification and processing of imprecise infor-
mation, in order to derive useful information. An ex-
ample of a phrase with imprecise information is “The
temperature is high”, as it does not provide the precise
temperature value. In this example, fuzzy logic represents the imprecise information by a fuzzy set High and a real-valued membership function µ_High, which associates with each numerical temperature value t a membership value µ_High(t) ∈ [0,1]. The membership value µ_High(t) = 0 means that t does not belong to the fuzzy set High. Conversely, a membership value of 1 means that t belongs to the fuzzy set High. A membership value between 0 and 1 means that t partially belongs to the fuzzy set High. In this context, the variable temperature is called a linguistic variable and t represents its numerical value. The fuzzy set High is called a linguistic value of the linguistic variable temperature. It represents a range, with imprecise boundaries, of the numerical temperature values.
Usually, each linguistic variable can assume a lin-
guistic value, within a collection of fuzzy sets (a set
of possible linguistic values). For example, the fuzzy
sets High, Very High, Normal, Low and Very Low can
be used to define the linguistic values of the linguis-
tic variable temperature. Each of these fuzzy sets is
described by a membership function, defined over the
domain of the numerical temperature t.
A fuzzy logic thus consists of a set of linguistic
variables, their linguistic values (the fuzzy sets) and
a set of fuzzy rules. A fuzzy set G is described by a membership function µ_G : x → [0,1], defined over the domain of the numerical-valued variable x associated with the linguistic value G. Among the commonly used membership functions, we can cite the triangular function, the trapezoidal function, and the Gaussian function. A fuzzy rule is of the form

IF <condition> THEN <conclusion>,

where condition and conclusion are combinations of the classical logic connectors, with basic terms of the form

<linguistic variable> is G.
The condition can involve partially satisfying
membership to sets for the given input data and, con-
sequently, the conclusion may also involve partial
membership. The degree of matching between the
rule and the input data (its truth degree) depends on
the values of the membership functions for the input
data, for the fuzzy sets involved in the condition of
the rule. For example,
IF temperature is High OR
temperature is Very High
THEN
Fan Speed is Fast
is a fuzzy rule. For input data t = d, the condition that must be evaluated regarding the value of d is max(µ_High(d), µ_Very High(d)) > θ, where θ is a given threshold. When this condition is true for input d, the conclusion is Fan Speed is Fast.
Fuzzy rules are thus used to infer an output based
on the input data. In general, this process consists of
executing the following three steps.
1. Fuzzy Membership: Calculate the degree of
membership (the membership function) for each
of the fuzzy sets.
2. Inference: Execute the rules in the rulebase to ob-
tain the fuzzy conclusion.
3. Defuzzification: Convert the fuzzy conclusion ob-
tained in Step 2 into a crisp one.
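To make these steps concrete, the following minimal Python sketch runs the fan-speed example above end to end. The specific membership-function shapes, temperature ranges, threshold value and crisp outputs are illustrative assumptions, not part of any system described in this paper.

    def triangular(x, a, b, c):
        """Triangular membership function with feet at a and c and peak at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Illustrative fuzzy sets for the linguistic variable "temperature" (degrees Celsius).
    def mu_high(t):      return triangular(t, 25.0, 35.0, 45.0)
    def mu_very_high(t): return triangular(t, 35.0, 50.0, 65.0)

    def fan_speed(t, theta=0.3):
        """Step 1: fuzzify t; Step 2: evaluate the rule
        IF temperature is High OR temperature is Very High THEN Fan Speed is Fast;
        Step 3: defuzzify into a crisp decision (a deliberately simple one here)."""
        degree = max(mu_high(t), mu_very_high(t))   # truth degree of the condition
        return ("Fast" if degree > theta else "Slow"), degree

    print(fan_speed(40.0))   # ('Fast', 0.5): the rule fires with truth degree 0.5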
Fuzzy logic has been successfully used in many
fields such as control systems, image processing, op-
timization, robotics, and natural language processing.
In this work, we introduce a fuzzy logic model for de-
tecting the emotions expressed by specialized linguis-
tic data sets (corpora). The model has been validated
on a corpus consisting of literary texts in Spanish.
In Section 2, we review some basic literature re-
garding algorithms developed for the classification of
emotions, related to this work. We describe the cor-
pora employed to train and test our classifier in Sec-
tion 3. Section 4 describes our fuzzy logic classifier.
In Section 5, we present our experimental protocol.
Results and evaluations are shown in Section 6 and
finally, Section 7 has our concluding remarks.
2 RELATED WORK
Fuzzy logic techniques are commonly used to anal-
yse comments regarding products or services offered
on the internet. For example in (Tashtoush and Al
Aziz Orabi, 2019), a fuzzy logic model is proposed
for predicting the following emotions in tweets: Joy,
Sadness, Anger, Disgust, Trust, Fear, Surprise, and
Anticipation, with 48.96% accuracy. In (Indhuja and
Reghu, 2014), a method is proposed to classify prod-
uct reviews into the categories: negative, neutral and
positive. In (Dragoni et al., 2015), the authors use
fuzzy logic for modeling concept polarities, and test
their method by classifying product reviews in an
Amazon dataset according to their polarities.
Several works approach the problem of treating
emotions in cognitive agents with fuzzy logic. In
(Howells and Ertugan, 2017), social media comments
in tweets are treated with fuzzy logic and classified
into: very negative, negative, neutral, positive and
very positive, by analysing emojis, hashtags, and tex-
tual meaning. In (Arguedas et al., 2018), the au-
thors propose a fuzzy classifier to detect the emotional
states of students, by analysing their discourses in an
online learning environment. In (Matiko et al., 2014),
the authors present an algorithm for classifying pos-
itive and negative emotions from electroencephalo-
grams.
Finally, (Vashishtha and Susan, 2020) propose a supervised fuzzy rule-based model that classifies video reviews on social media into negative and positive categories, by analysing linguistic, acoustic and accent features, with 82.5% accuracy.
3 CORPORA EMPLOYED
We have used two specially structured corpora of lit-
erary sentences in Spanish to test our model. The
CitasIn corpus was used in the learning phase and
the LiSSS corpus for testing.
3.1 Learning Corpus
The CitasIn [1] corpus (Torres-Moreno and Moreno-Jiménez, 2020), composed of sentences recovered from the website https://citas.in, was used in the learning phase. It consists of a large number of documents belonging to different categories (friendship, lovers, beauty, success, happiness, laughter, enmity, deception, anger, fear, etc.). These documents were manually clustered into the five following classes:
1. Anger (A),
2. Fear (F),
3. Happiness (H),
4. Love (L) and
5. Sadness or Pain (S).
Table 1 shows the number of sentences and words for
each class, and the average value of the number of
words per sentence in each class.
Table 1: CitasIn corpus, sentences clustered in 5 emotions.

Class tag   # Sentences (S)   # Words (W)   # W / S
A               15 043          280 784       18.7
F               14 773          275 059       18.6
H               13 647          256 697       18.8
S               14 589          275 931       18.9
L               14 738          264 339       29.2
Total           72 790        1 352 810       18.6
3.2 Test Corpus
We employed the corpus of Emotions in Literary Sentences in Spanish (LiSSS) [2] (Torres-Moreno and Moreno-Jiménez, 2020) for testing the FLE classifier. The LiSSS corpus was especially created to test and validate algorithms for automatic emotion classification and analysis of literary texts. It consists of literary sentences and paragraphs from approximately 200 Spanish-speaking authors, and also sentences from non-Spanish-speaking authors (always using official or good-quality translations into Spanish).
The sentences of the LiSSS corpus were manually classified into the same five categories as the CitasIn corpus, {A, F, H, L, S}. Each sentence may be classified into more than one of the five categories. All sentences belonging to LiSSS were excluded from CitasIn, to avoid over-fitting and bias during the learning phase in our experiments. The main properties of LiSSS are depicted in Tables 2 and 3.

[1] A version of the CitasIn corpus with snippet sentences is available at juanmanuel.torres.free.fr/corpus/lisss/CitasIn.zip. The reader should have no problem reconstituting the CitasIn corpus using these snippets.
[2] LiSSS V0.500 is downloadable from the website: http://juanmanuel.torres.free.fr/corpus/lisss/
Table 2: LiSSS corpus of literary sentences.
Sentences Paragraphs Words
LiSSS 500 49 9 392
Table 3 represents the distribution of sentences
among the five classes of emotions, in a matrix lay-
out, calculated by dividing the number of sentences
tagged for each class by the number of annotators.
For example it shows, in position [A,A] (first line,
first column), that there are 74.1 sentences that have
been classified exclusively in the Anger category. The
other positions in the first row show that 12.1 sen-
tences were tagged with A and L, 7.9 with A and F,
2.3 with A and H, and 9 with A and S. In this corpus,
approximately 18% of the sentences were classified
as multi-emotional sentences. For each class we ob-
tain an overlapping degree defined in the following
way. Let mono represent the average number of sen-
tences classified in a unique class, and multi represent
the average number of sentences classified in multiple
classes. The proportion of sentences that were classi-
fied in a certain class and also in other classes, which
we call the overlap, Ov, can be calculated as Ov =
multi/(mono + multi). For example, in the case of
class A, we have Ov = 31.3/(74.1 + 31.3) = 29.7%,
so that 29.7% of the sentences that were classified in
A were also classified in other classes. The rest of the
table can be read in a similar manner. The two sets
with greatest overlap are Love and Sadness/Pain, cor-
responding to the emotions that are most correlated in
the sentences of LiSSS.
Table 3: Distribution of emotions in LiSSS, with correlations (Cl = classes, Ov = overlap).

Cl      A      L      F      H      S     Ov %
A     74.1   12.1    7.9    2.3    9.0    29.7
L     12.1   89.5    5.7   10.8   19.9    35.1
F      7.9    5.7  103.2    0.6   17.2    23.3
H      2.3   10.8    0.6   92.4   18.7    26.0
S      9.0   19.9   17.2   18.7  115.3    36.0
Due to the subjective nature of the emotional per-
ception of literary documents, the overlap of classi-
fication among different classes is comprehensible.
The algorithm we propose here, based on fuzzy logic,
could enable multi-classification by establishing a cri-
terion based on a measure of distance, between mem-
bership values of a sentence and the centroid of the
different classes.
4 FLE CLASSIFIER

In this section, we describe our Fuzzy Logic Emotions (FLE) classification algorithm. The algorithm consists of a few basic procedures: determination of the linguistic variables and their values, definition of the membership function and of the fuzzy rules, and defuzzification. Figure 2 shows a graph representation of the FLE scheme.
4.1 Linguistic Variables
Our algorithm evaluates a sentence, Sent, to define
the class(es) of emotions to which it belongs. For this
purpose, we consider the set of linguistic variables to
be the set consisting of action verbs, nouns and ad-
jectives in Sent. We define n as the number of ac-
tion verbs, adjectives and nouns in Sent, and thus n is
the number of linguistic variables in Sent. We have
used the FreeLing tool (Padró and Stanilovsky, 2012)
to identify the linguistic variables in Sent, as Free-
Ling returns the grammatical information that charac-
terizes a word, such as the Part of Speech (POS) label,
gender and other information.
As an example, in the sentence “Contempt must be the most mysterious of our feelings”, there are three linguistic variables:
contempt, a Noun,
mysterious, an Adjective, and
feelings, a Noun.
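As a sketch of this step, the fragment below reduces a POS-tagged sentence to its linguistic variables (nouns, adjectives and action verbs), dropping function verbs through a stop-list as described in Section 5. The (surface form, lemma, tag) tuples imitate a tagger output such as FreeLing's; the EAGLES-style tags and the stop-list contents are illustrative assumptions.

    def linguistic_variables(tagged_sentence,
                             function_verbs=frozenset({"be", "must", "have", "do"})):
        """Keep nouns (N...), adjectives (A...) and action verbs (V...) of a sentence,
        discarding function verbs listed in a stop-list. tagged_sentence is a list of
        (surface form, lemma, POS tag) tuples, as a tagger such as FreeLing would return."""
        keep = ("N", "A", "V")
        return [(lemma, tag[0]) for _, lemma, tag in tagged_sentence
                if tag[:1] in keep and lemma not in function_verbs]

    # Hypothetical tagging of the example sentence of Section 4.1.
    sent = [("Contempt", "contempt", "NC"), ("must", "must", "VM"), ("be", "be", "VS"),
            ("the", "the", "DA"), ("most", "most", "RG"), ("mysterious", "mysterious", "AQ"),
            ("of", "of", "SP"), ("our", "our", "DP"), ("feelings", "feeling", "NC")]
    print(linguistic_variables(sent))
    # [('contempt', 'N'), ('mysterious', 'A'), ('feeling', 'N')]: the three variables above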
4.2 Linguistic Values
The next step is to associate a linguistic value (LV)
to each linguistic variable of Sent. The LV is used to
simulate the process that a human follows to classify
an object according to its properties. For example, if it
is required to know if a machine needs to be cooled, a
human operator can consider temperature as a linguis-
tic variable and the possible LVs can be those in the
set {very hot, hot, warm, cold, very cold}. So know-
ing which LV applies to the observed machine, the op-
erator may be able to take a decision. Each linguistic variable in Sent is associated with five LVs, corresponding to the emotions which it expresses. Each of the
five LVs of a linguistic variable is related to a numer-
ical value, equal to its term frequency - inverse docu-
ment frequency (TF-IDF) (Jones, 1972). TF-IDF is a
measure that indicates the importance of a term with
respect to an analyzed set of documents. This well
known measure is very useful and it has been used
for a long time, in tasks of extraction or recovery of
information (information retrieval) and text mining,
among others. To calculate the TF-IDF scores, we
have used the CitasIn corpus divided into five sub-
corpora, one per class, as mentioned in Section 3.1.
Each document has been pre-processed with FreeL-
ing to extract only the lemmas of action verbs, adjec-
tives and nouns. We have thus generated a reduced
version of CitasIn with 5 classes of the resulting pre-
processed documents, one for each of the emotions.
We then calculate the TF-IDF values for each term
with respect to the documents in each class, using a
tool developed in PERL 5.0. So, for example, it is ex-
pected that given the word romance, its TF-IDF score
related to the class Love will be higher than with re-
spect to the other classes. To damp the inflated TF-IDF score of a negative adverb, the TF-IDF is, in these cases, replaced by its square root.
To represent the numerical value associated with each linguistic variable in Sent, we associate a matrix V with Sent, where each element V_{i,j} is the TF-IDF of the j-th linguistic variable for the i-th class.
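The fragment below sketches how the matrix V could be filled, assuming the five pre-processed class documents of CitasIn are available as lists of lemmas. The exact TF-IDF variant of the PERL tool used by the authors is not reproduced here, so the raw-count TF and smoothed IDF are assumptions, chosen only so that a term occurring in every class still receives a non-zero weight.

    import math
    from collections import Counter

    def tfidf_matrix(sentence_terms, class_documents):
        """Return V as a dict: V[c][j] is the TF-IDF of the j-th linguistic variable of
        the sentence with respect to class c. class_documents maps each emotion class to
        the list of lemmas of its pre-processed CitasIn sub-corpus."""
        counts = {c: Counter(doc) for c, doc in class_documents.items()}
        n_classes = len(class_documents)
        V = {}
        for c, cnt in counts.items():
            row = []
            for term in sentence_terms:
                tf = cnt[term]                                               # raw term frequency
                df = sum(1 for other in counts.values() if other[term] > 0)  # classes containing the term
                idf = math.log((1 + n_classes) / (1 + df)) + 1.0             # smoothed IDF (assumption)
                score = tf * idf
                # For a negative adverb, the paper replaces the score by its square root:
                # score = math.sqrt(score)
                row.append(score)
            V[c] = row
        return V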
4.3 Membership Function
The third step consists in selecting the membership function that will attribute a membership value, MD_{i,j}, to each V_{i,j}. There are various kinds of membership functions, and each of them returns a value in the interval [0,1], where 1 means the word belongs to the class, and 0 means it does not. We have chosen a triangular function given by Eq. (1) (see Fig. 1). This choice has been motivated by the fact that the relevance of a word with respect to the documents in a class should increase in proportion to its TF-IDF score, normalized in [0,1]. Thus, for each linguistic variable in Sent, and each class,

MD_{i,j} = µ(V_{i,j}) = (V_{i,j} − a_i)/(b_i − a_i),  if a_i ≤ V_{i,j} ≤ b_i,
MD_{i,j} = 0,                                         elsewhere,                (1)

where a_i = min{V_{i,k} : k ∈ class i} and b_i = max{V_{i,k} : k ∈ class i}. Note that to obtain a_i and b_i, one considers all the linguistic variables of CitasIn in class i (not only the n linguistic variables in Sent); this gives us a more realistic measure of the relevance of each term in a class.
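A direct reading of Eq. (1) in Python is sketched below; a_i and b_i are assumed to have been pre-computed over all the linguistic variables of class i in CitasIn.

    def membership(v_ij, a_i, b_i):
        """Triangular membership of Eq. (1): a linear ramp between the minimum (a_i)
        and maximum (b_i) TF-IDF scores of class i, and 0 outside that range."""
        if a_i <= v_ij <= b_i and b_i > a_i:
            return (v_ij - a_i) / (b_i - a_i)
        return 0.0

    # Reproducing the value of Section 4.6 for the variable "contempt" in the Love class:
    print(round(membership(250.02, 0.2097, 5798.91), 3))   # 0.043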
4.4 Fuzzy Rule
The fuzzy rules allow the determination of the true
membership value, according to a condition which
Figure 1: Triangular membership representation (the membership degree µ(x) rises linearly from 0 at the TF-IDF score a_i to µ(b_i) at b_i).
depends on the MD_{i,j} previously calculated. We introduce the following variables:

M_i = (1/n) Σ_{j=1}^{n} MD_{i,j},          (2)

MT = (1/5) Σ_{i=1}^{5} M_i.                (3)
With these, we have defined one rule:

IF M_i > MT THEN M'_i = M_i ELSE M'_i = 0.

This rule is applied to all the classes, so that we obtain an M'_i for each class.
According to (2) and (3), M_i is the average membership value for Sent for each class, and MT is the average of M_i over all classes. This rule thus compares the membership of Sent for each class with the average for all classes, to detect and ignore scores which are very low and should not be considered.
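In code, the rule step is a small filter over the per-class averages; the sketch below assumes MD is a dict mapping each class to the list of membership values of the n linguistic variables of Sent.

    def apply_fuzzy_rule(MD):
        """Compute M_i (Eq. 2) and MT (Eq. 3), then keep M_i only if it exceeds MT,
        returning the M'_i of the fuzzy rule for every class."""
        M = {c: sum(values) / len(values) for c, values in MD.items()}   # Eq. (2)
        MT = sum(M.values()) / len(M)                                    # Eq. (3)
        return {c: (m if m > MT else 0.0) for c, m in M.items()}

    # With the membership values of Table 5 (Section 4.6), only Happiness (0.086)
    # and Sadness (0.088) survive, as in the worked example.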
4.5 Defuzzification

With the M'_i obtained with the fuzzy rule, we proceed to the defuzzification process by first calculating the centroid over all classes, according to

centroid = [ Σ_{i=1}^{5} (M'_i × SumV_i) ] / [ Σ_{i=1}^{5} M'_i ],          (4)

where

SumV_i = (1/n) Σ_{j=1}^{n} V_{i,j}.                                         (5)

We again use the triangular membership function, η(x, e_i), with parameters c = 0 and e_i = SumV_i,

η(x, e_i) = (x − c)/(e_i − c),  for c ≤ x ≤ e_i,                             (6)

and η(x, e_i) = 0 if x > e_i or x < c. We then calculate the membership value of the centroid, MC_i = η(centroid, e_i), with respect to each class i.
Figure 2: Overall scheme of the FLE model: (1) detection of the linguistic variables (verbs, adjectives, nouns) of the sentence with FreeLing; (2) assignment of linguistic values (LV) via TF-IDF; (3) processing of the LVs with the triangular membership function µ, per class; (4) analysis of the per-class values with the fuzzy rule; (5) defuzzification, processing the centroid with the triangular membership function η.
The class for which MC_i is highest corresponds to the most predominant emotion expressed by Sent.
Defuzzification is important because in (5), we
consider only the vocabulary present in Sent and the
centroid reveals the proximity between Sent and each
class. This should allow us to consider the classifi-
cation of Sent in more than one emotion with FLE,
with the establishment of adequate criteria, although
we have not treated this situation in the present work.
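The defuzzification step of Eqs. (4)-(6) can be sketched as follows; M_prime holds the M'_i of the fuzzy rule and V the TF-IDF scores of the sentence. Note that the 1/n factor of Eq. (5) cancels in MC_i, which is why the worked example of Section 4.6 can quote un-normalized sums and still obtain the same memberships.

    def defuzzify(M_prime, V):
        """Return MC_i = eta(centroid, SumV_i) for every class, following Eqs. (4)-(6)."""
        n = len(next(iter(V.values())))
        sumv = {c: sum(scores) / n for c, scores in V.items()}          # Eq. (5)
        denominator = sum(M_prime.values())
        centroid = (sum(M_prime[c] * sumv[c] for c in V) / denominator
                    if denominator else 0.0)                            # Eq. (4)

        def eta(x, e):                                                  # Eq. (6), with c = 0
            return x / e if 0.0 <= x <= e else 0.0

        return {c: eta(centroid, sumv[c]) for c in V}

    # For the example of Section 4.6 this yields MC values of approximately 0.941 for
    # Happiness, 0.605 for Love and 0 elsewhere; the highest value selects Happiness.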
4.6 An FLE Example
In order to illustrate how the FLE model works, we
present an example of the computations, considering the sentence Sent = “Contempt is the most mysterious of our feelings”.
Linguistic Variables. There are three linguistic variables: contempt = N (noun), mysterious = A (adjective) and feelings = N (noun).
Linguistic Values. Table 4 shows the V_{i,j} values (TF-IDF scores) for each linguistic variable, per class. All TF-IDF scores were calculated over the CitasIn corpus.
Membership Function. For each score in Table 4, a
µ(V_{i,j}) value is computed using the membership function, Eq. (1). For example, the variable contempt, with TF-IDF = V_{4,1} = 250.02, for class 4 (Love), is processed as follows.

1. We obtain the minimum TF-IDF score considering all the text in the Love class in CitasIn, a_4 = min{V_{4,k} : k ∈ class 4} = 0.2097, which corresponds to a word that is not in Sent.

2. In the same way, b_4 = max{V_{4,k} : k ∈ class 4} = 5798.91, which also corresponds to a word not in Sent.

3. Using these two reference values, we use Eq. (1) to obtain MD_{4,1} = µ(V_{4,1}) = (V_{4,1} − a_4)/(b_4 − a_4) = 0.043.

The same procedure is executed for all the scores in Table 4, to obtain the µ(V_{i,j}) values shown in Table 5.
Table 4: TF-IDF scores, V_{i,j}, for each variable and class.

class            contempt   mysterious   feelings
Anger (A)          206.28        29.48     513.31
Fear (F)           134.15       113.78     418.05
Happiness (H)      198.07        47.08     603.09
Love (L)           250.02       127.45     941.91
Sadness (S)        141.00        59.64     548.86

Table 5: Membership values, MD_{i,j} = µ(V_{i,j}), for each variable and class.

class            contempt   mysterious   feelings
Anger (A)           0.060        0.008      0.149
Fear (F)            0.045        0.038      0.142
Happiness (H)       0.060        0.014      0.184
Love (L)            0.043        0.022      0.162
Sadness (S)         0.049        0.021      0.193
Fuzzy Rules. The MD_{i,j} = µ(V_{i,j}) are used to calculate the M_i with Eq. (2), and MT with Eq. (3). For Anger, the average of the membership values MD_{1,j} in Table 5 is M_1 = 0.072. In the same way, one can obtain the other M_i. The value MT = 0.079 is the average of the M_i, i.e. the average of the 15 membership values in Table 5. These M_i and MT are then used to obtain the M'_i, with the fuzzy rule presented in Section 4.4, as

A: (0.072 > 0.079) == False THEN M'_A = 0,
F: (0.075 > 0.079) == False THEN M'_F = 0,
H: (0.086 > 0.079) == True THEN M'_H = 0.086,
L: (0.076 > 0.079) == False THEN M'_L = 0,
S: (0.088 > 0.079) == True THEN M'_S = 0.088.
Figure 3: Position of the centroid (798.49) and the membership function η, Eq. (6), for the classes for which it is different from zero (Happiness and Love).
Defuzzification. The last procedure involves defuzzification by centroid calculation, using the new membership values M'_i obtained from the fuzzy rule step. Equations (4) and (5) are then used to obtain

centroid = (0 + 0 + 72.94 + 0 + 65.20) / (0 + 0 + 0.086 + 0 + 0.087) = 798.49.

Finally, the other membership function, MC_i = η(centroid, e_i), Eq. (6), is used with x_c = centroid = 798.49, c = 0 and e_i = SumV_i (Eq. (5)), to obtain:

MC_Anger     = η(x_c, e_i = 749.01)  = 0,
MC_Fear      = η(x_c, e_i = 665.99)  = 0,
MC_Happiness = η(x_c, e_i = 848.24)  = 0.941,
MC_Love      = η(x_c, e_i = 1319.39) = 0.605, and
MC_Sadness   = η(x_c, e_i = 749.52)  = 0.

We observe that the centroid is closest to the Happiness class, as it corresponds to the highest value of η. We thus consider that Sent predominantly expresses this emotion. These results are illustrated in Fig. 3.
5 EXPERIMENTAL SETUP
In Section 2, we reported other fuzzy logic based ap-
proaches focused on the classification of emotions.
However, most of these proposed models are for po-
larity detection or for classification based on the anal-
ysis of non-textual characteristics, such as the outputs
of sensors or images. Our proposal is different: we
deal with the classification of literary text consider-
ing psychological and emotional states. It is thus not
possible to compare our model objectively with most
of these works. Instead, we have compared our FLE
classifier against several classical baseline methods,
also tested in the evaluation proposed by the authors
in (Torres-Moreno and Moreno-Jiménez, 2020). The experiments were performed using the CitasIn corpus for training and the LiSSS corpus for testing.
FLE Classifiers and Baseline Algorithms
We have conducted tests for comparison with the three following classic machine learning classifiers, used as baseline algorithms: Support Vector Machine (SVM), Naïve Bayes Multinomial Text (NBM) and Naïve Bayes (NB). We experimented with several types of pre-processed data for the FLE classifier, using: the original form of verbs, adjectives and nouns; lemmatized verbs, adjectives and nouns; ultra-stemmed [3] verbs, adjectives and nouns; original form of verbs and adjectives, and lemmatized nouns (FLE_lemm); and original form of verbs and adjectives, and ultra-stemmed nouns (FLE_ultra). We present here results of experiments using the two pre-processing methods that provided the best results, FLE_ultra and FLE_lemm. The SVM and Naïve Bayes (multinomial and classical versions) algorithms were implemented using the Weka GUI [4] packages. Our FLE classifier was written in Python 3. The input sentences were all filtered of function words and function verbs pertaining to stop-lists, and were lemmatized or ultra-stemmed using Weka or Python libraries, respectively.
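For reference, the ultra-stemming pre-processing described in footnote 3 amounts to a simple truncation; the function below is an illustrative reading of that description.

    def ultra_stem(word, n=5):
        """Ultra-stemming (Torres-Moreno, 2012): keep at most the first n characters
        of a word (n = 5 in our experiments)."""
        return word[:n]

    print([ultra_stem(w) for w in ["misterioso", "sentimientos", "desprecio"]])
    # ['miste', 'senti', 'despr']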
6 RESULTS
Results of experiments using the FLE method, with
lemmatized and ultra-stemmed pre-processing, can be
observed in Table 6. This table shows observed values
of the quantities Precision, Recall and F-score [5]. The best value of each quantity is marked in bold font.
It is possible to observe that, in general, FLE_lemm gives better results for precision and recall (F-score = 60.1%) than FLE_ultra (F-score = 58.15%), except in the case of recall for the classes Happiness and Fear.
In Table 7, we show the comparison between our
FLE methods and the baseline algorithms. Average
values of the F-score for each of the five classes are
shown in the last column. The best performance for
each class is marked in bold font.
[3] The ultra-stemming algorithm keeps at most the first 5 characters of each word, decreasing the execution time of the algorithm while preserving the meaning of the words (Torres-Moreno, 2012).
[4] The Weka package may be downloaded from https://waikato.github.io/weka-wiki/
[5] F-score is a harmonic combination of Precision (P) and Recall (R): F-score = 2 × P · R / (P + R).
We can see that FLE_lemm obtained a better F-score in the Anger and Sadness/Pain classes, while the NBM algorithm obtained a better F-score for Fear, Happiness and Love. FLE_lemm obtained the best average F-score value, and even in the 3 classes where it did not obtain the best score, it obtained scores that were very close to the best. Although FLE_lemm outperforms FLE_ultra, FLE_ultra has the advantage of being language independent, because it does not require a lemmatization pre-processing procedure. Since we expect this initial proposal to have room for further improvement, these are relevant and encouraging results.
Table 6: Results obtained by FLE_lemm and FLE_ultra, in percentages.

        Precision             Recall                F-score
        FLE_lemm  FLE_ultra   FLE_lemm  FLE_ultra   FLE_lemm  FLE_ultra
A         44.9      44.6        73.5      69.6        55.76     54.41
F         58.7      54.5        68.3      69.2        63.11     61.02
H         81.0      79.0        71.0      72.8        75.70     75.78
L         71.4      70.5        60.6      55.6        65.57     62.15
S         78.6      76.9        27.2      24.7        40.37     37.38
Mean      66.9      65.1        60.1      58.3        60.1      58.15
Table 7: F-score values, in percentage, obtained with various algorithms, for each class of emotion.

Model        A       F       H       L       S      Mean
SVM        55.80   54.26   59.00   50.33   55.60    44.99
NB         45.86   60.48   56.41   60.19   16.47    47.88
NBM        53.51   64.73   78.73   66.13   35.42    59.70
FLE_ultra  55.43   61.41   76.79   62.07   40.43    58.15
FLE_lemm   56.73   63.48   76.71   65.56   43.75    60.10
Comparing our results with those of some similar methods reported in Section 2, it can be noticed that in (Tashtoush and Al Aziz Orabi, 2019), where the categories are Joy, Sadness, Anger, etc., the authors achieved a performance of 48.96%, well below the 60.1% F-score obtained by our FLE_lemm method, indicating the difficulty involved in this kind of task.
Furthermore, we have faced the challenge of deal-
ing with literary text, which presents a more com-
plex vocabulary than that used in tweets or product re-
views. We also note that the method proposed in (Vashishtha and Susan, 2020) achieved 82.5% accuracy, but it is mainly directed towards the task of polarity detection, which requires a less complex analysis.
7 CONCLUSIONS
We have proposed FLE, a method for classifying lit-
erary texts according to their emotional content. The
method is based on fuzzy logic and was tested and
validated with literary sentences, taken from specially
structured, literary corpora.
Our results show that this protocol is well suited
to the task of detection and classification of emotions
in literary manuscripts.
Literary works in various forms, such as para-
graphs, verses, sentences, or phrases, are difficult to
evaluate automatically, because they have a rich ex-
pressive diversity and present a high complexity in the
employment of language. Moreover, there is the issue
of multi-emotional sentences, with possible ambigu-
ity in meaning, that adds to the complexity of the task.
However, we have shown that fuzzy logic provides an
interesting and efficient approach to the problem of
analysing these types of corpora.
In its current version, FLE has been designed to
treat only text written in Spanish. However, the proto-
col is modular and simple to adapt to languages other
than Spanish. In particular, it should be possible to
adapt our FLE implementation to other romance lan-
guages, such as French, Portuguese, Italian, Catalan,
among others, with a reasonable amount of effort.
We plan to improve our model through the manip-
ulation of the linguistic values or by combining other
membership functions. We also consider the possibil-
ity of evaluating our protocol on artificial sentences,
generated by Natural Language Generation (NLG) al-
gorithms (Moreno-Jiménez et al., 2020; Ke and Xiaojun, 2018).
ACKNOWLEDGEMENTS
This work is funded by Consejo Nacional de Ciencia y Tecnología (Conacyt, Mexico), grant number 661101, and partially by the Université d'Avignon / Laboratoire Informatique d'Avignon (LIA), France.
REFERENCES

Arguedas, M., Xhafa, F., Casillas, L., Daradoumis, T., Peña, A., and Caballé, S. (2018). A model for providing emotion awareness and feedback using fuzzy logic in online learning. Soft Computing, volume 22, pages 963–977.

Cambria, E. (2016). Affective computing and sentiment analysis. IEEE Intelligent Systems, 31(2):102–107.

Clark, E., Ji, Y., and Smith, N. A. (2018). Neural text generation in stories using entity representations as context. In NAACL-HLT, volume 1, pages 2250–2260, New Orleans, Louisiana. ACL.

Dragoni, M., Tettamanzi, A. G. B., and da Costa Pereira, C. (2015). Propagating and aggregating fuzzy polarities for concept-level sentiment analysis. Cognitive Computation, 7:186–197.

Howells, K. and Ertugan, A. (2017). Applying fuzzy logic for sentiment analysis of social media network data in marketing. In 9th International Conference on Theory and Application of Soft Computing, Computing with Words and Perception, volume 120, pages 664–670. Elsevier.

Indhuja, K. and Reghu, R. P. C. (2014). Fuzzy logic based sentiment analysis of product review documents. In 1st ICCSC, pages 18–22, Trivandrum, India. IEEE.

Iria, d. C., Cabré, M. T., SanJuan, E., Sierra, G., Torres-Moreno, J., and Vivaldi, J. (2011). Automatic specialized vs. non-specialized sentence differentiation. In CICLing 2011, pages 266–276, Tokyo, Japan.

Jones, K. S. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28:11–21.

Ke, W. and Xiaojun, W. (2018). SentiGAN: Generating sentimental texts via mixture adversarial networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 4446–4452, Stockholm, Sweden. IJCAI-ECAI, AAAI Press.

Matiko, J. W., Beeby, S. P., and Tudor, J. (2014). Fuzzy logic based emotion classification. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4389–4393. IEEE.

Moreno-Jiménez, L. G., Torres-Moreno, J. M., Wedemann, R. S., and SanJuan, E. (2020). Generación automática de frases literarias. Linguamática, 12(1):15–30.

Padró, L. and Stanilovsky, E. (2012). FreeLing 3.0: Towards wider multilinguality. In 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 2473–2479, Istanbul, Turkey. ELRA.

Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135.

Siddiqui, M., Wedemann, R. S., and Jensen, H. J. (2018). Avalanches and generalized memory associativity in a network model for conscious and unconscious mental functioning. Physica A, 490:127–138.

Tashtoush, Y. M. and Al Aziz Orabi, D. A. (2019). Tweets emotion prediction by using fuzzy logic system. In 6th International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 83–90. IEEE.

Torres-Moreno, J.-M. (2012). Beyond stemming and lemmatization: Ultra-stemming to improve automatic text summarization. CoRR, abs/1209.3126.

Torres-Moreno, J.-M. and Moreno-Jiménez, L.-G. (2020). LiSSS: A toy corpus of literary Spanish sentences sentiment for emotions detection. arXiv, 2005.08223v1.

Vashishtha, S. and Susan, S. (2020). Inferring sentiments from supervised classification of text and speech cues using fuzzy rules. Procedia Computer Science, 167:1370–1379.

Wedemann, R. S. and Plastino, A. R. (2016). Física estadística, redes neuronales y Freud. Revista Núcleos, 3:4–10.

Zadeh, L. (1965). Fuzzy sets. Information and Control, 8(3):338–353.