Authors:
Bakir Hadžić¹; Julia Ohse²; Mohamad Eyad Alkostantini¹; Nicolina Peperkorn²; Akihiro Yorita³; Thomas Weber¹; Naoyuki Kubota⁴; Youssef Shiban² and Matthias Rätsch¹
Affiliations:
¹ Reutlingen University, Reutlingen, Germany; ² Private University of Applied Sciences Göttingen, Göttingen, Germany; ³ Daiichi Institute of Technology, Kagoshima, Japan; ⁴ Tokyo Metropolitan University, Tokyo, Japan
Keyword(s):
Speech Emotion Recognition, Artificial Intelligence, Mental Health, Depression, Emotional Dynamics.
Abstract:
The goal of this study was to utilize a state-of-the-art Speech Emotion Recognition (SER) model to explore the dynamics of basic emotions in semi-structured clinical interviews about depression. Segments of N = 217 interviews from the general population were evaluated using the emotion2vec+ large model and compared with the results of a depressive symptom questionnaire. A direct comparison of depressed and non-depressed subgroups revealed significant differences in the frequency of happy and sad emotions, with participants with higher depression scores exhibiting more sad and fewer happy emotions. A multiple linear regression model including the seven most frequently predicted emotions plus the duration of the interview as predictors explained 23.7% of the variance in depression scores, with happiness, neutrality, and interview duration emerging as significant predictors. Higher depression scores were associated with lower happiness and neutrality, as well as longer interview duration. The study demonstrates the potential of SER models to advance research methodology by providing a novel, objective tool for exploring emotional dynamics in mental health assessment processes. The model's capacity for depression screening was tested in a realistic sample from the general population, revealing its potential to supplement future screening systems with an objective emotion measurement.
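To illustrate the regression step described above, the following is a minimal pure-Python sketch of fitting depression scores from per-interview emotion frequencies via ordinary least squares (normal equations). It is not the authors' code: the predictor set, feature values, and questionnaire scores are synthetic, chosen only to mirror the reported directions (less happiness and longer interviews accompanying higher scores).

```python
# Sketch of a multiple linear regression like the one in the abstract.
# All data below are synthetic/hypothetical, for illustration only.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y; returns (coefficients, R^2)."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    beta = solve(XtX, Xty)
    yhat = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot

# Hypothetical per-interview features: [intercept, happy_freq, sad_freq, duration_min]
X = [
    [1, 0.40, 0.10, 22],
    [1, 0.35, 0.12, 25],
    [1, 0.22, 0.30, 31],
    [1, 0.15, 0.33, 34],
    [1, 0.28, 0.18, 27],
    [1, 0.10, 0.42, 38],
]
y = [5, 6, 15, 17, 9, 21]  # synthetic depression questionnaire scores

beta, r2 = ols(X, y)
print("coefficients:", beta)
print("R^2:", r2)
```

In the study itself the design matrix would hold the frequencies of the seven most frequently predicted emotions plus interview duration for all 217 interviews; the explained variance reported there was 23.7%.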