Authors:
Stefan Glüge; Ronald Böck and Andreas Wendemuth
Affiliation:
Otto von Guericke University Magdeburg, Germany
Keyword(s):
Segmented-memory recurrent neural networks, Emotion recognition from speech.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Data Manipulation; Enterprise Information Systems; Health Engineering and Technology Applications; Human-Computer Interaction; Learning Paradigms and Algorithms; Methodologies and Methods; Neural Network Software and Applications; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Theory and Methods
Abstract:
Emotion recognition from speech means determining the emotional state of a speaker from his or her voice. The classifiers most commonly used in this field are Hidden Markov Models (HMMs) and Support Vector Machines. Neither architecture is designed to capture the full dynamic character of speech. HMMs can model the temporal characteristics of speech at the phoneme, word, or utterance level, but fail to learn the dynamics of the input signal on short time scales (e.g., at the frame rate). The use of dynamical features (first and second derivatives of the speech features) attenuates this problem. We propose the use of Segmented-Memory Recurrent Neural Networks to learn the full spectrum of speech dynamics. The dynamical features can therefore be removed from the input data. The resulting neural network classifier is compared to HMMs that use the reduced feature set as well as to HMMs that work with the full set of features. The networks perform comparably to the HMMs while using significantly fewer features.
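The dynamical features mentioned in the abstract are typically regression-based first and second derivatives (delta and delta-delta coefficients) computed over a short window of frames. A minimal illustrative sketch of this standard computation (not the authors' exact pipeline; window width and feature dimensions are assumptions):

```python
import numpy as np

def delta(features, width=2):
    """Regression-based first derivative of frame-level features.

    features: (n_frames, n_dims) array of static features (e.g., MFCCs).
    width: number of frames on each side used for the regression.
    """
    # Pad by repeating edge frames so output length matches input.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        out[t] = sum(
            k * (padded[t + width + k] - padded[t + width - k])
            for k in range(1, width + 1)
        ) / denom
    return out

# Static features plus their first and second derivatives:
# the kind of "full" feature set an HMM baseline would use,
# versus the reduced (static-only) set fed to the recurrent network.
static = np.random.randn(100, 13)                 # e.g., 13 MFCCs per frame
d1 = delta(static)                                # delta coefficients
d2 = delta(d1)                                    # delta-delta coefficients
full = np.concatenate([static, d1, d2], axis=1)   # shape (100, 39)
```

Removing `d1` and `d2` from the input, as proposed, shrinks each frame from 39 to 13 dimensions while the recurrent network itself models the short-time dynamics.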