Authors:
Giovanni Costantini; Valerio Cesarini and Daniele Casali
Affiliation:
Department of Electronic Engineering, University of Rome Tor Vergata, Italy
Keyword(s):
Emotions, Speech, Machine Learning, Arousal, Valence, Categorical, Dimensional.
Abstract:
In this paper, a selection of acoustic features for emotion recognition, derived from the literature and from experiments, is presented. Additionally, a new speech dataset is built by recording the free speech of six subjects in a retirement home, as part of a pilot project for the care of the elderly called E-Linus. This dataset is employed along with another widely used set (Emovo) to test the effectiveness of the selected features in automatic emotion recognition. To this end, two different machine learning algorithms, namely a multi-class SVM and Naïve Bayes, are used. Due to the unbalanced and preliminary nature of the retirement home dataset, a statistical method based on logical variables is also applied to it. The 24 selected features prove effective, yielding sufficient accuracy for the machine learning-based approach on the Emovo dataset. On the other hand, the proposed statistical method is the only one that yields sufficient accuracy and no noticeable bias when tested on the more unbalanced retirement home dataset.