Authors:
Amany H. AbouEl-Naga¹; May Hussien¹; Wolfgang Minker²; Mohammed Salem³ and Nada Sharaf¹
Affiliations:
¹ Faculty of Informatics and Computer Science, German International University, New Capital, Egypt; ² Institute of Communications Engineering, Ulm University, Ulm, Germany; ³ Faculty of Media Engineering and Technology, German University in Cairo, New Cairo, Egypt
Keyword(s):
Emotion Recognition, Multimodal Emotion Recognition, Dataset Generation, Artificial Intelligence, Affective Computing, Deep Learning, Machine Learning, Multimodality.
Abstract:
Human communication relies deeply on the emotional states of the individuals involved. The process of identifying and processing emotions in the human brain is inherently multimodal. With recent advancements in artificial intelligence and deep learning, fields like affective computing and human-computer interaction have witnessed tremendous progress. This has shifted the focus from unimodal emotion recognition systems to multimodal systems that comprehend and analyze emotions across multiple channels, such as facial expressions, speech, text, and physiological signals, to enhance emotion classification accuracy. Despite these advancements, the availability of datasets combining two or more modalities remains limited. Furthermore, very few datasets have been introduced for the Arabic language, despite its widespread use (Safwat et al., 2023; Akila et al., 2015). In this paper, MODALINK, an automated workflow for generating the first Egyptian-Arabic dialect dataset integrating visual, audio, and text modalities, is proposed. Preliminary testing of the proposed workflow demonstrates its ability to generate synchronized modalities efficiently and in a timely manner.