Authors: Aicha Nouisser¹, Nouha Khediri², Monji Kherallah³ and Faiza Charfi³

Affiliations:
¹ National School of Electronics and Telecommunications of Sfax, Tunisia
² Faculty of Computing and Information Technology, Northern Border University, Rafha, K.S.A.
³ Faculty of Sciences of Sfax, University of Sfax, Tunisia
Keyword(s):
Sentiment Analysis, Bimodality, Transformer, BERT Model, Audio and Text, CNN.
Abstract:
The diversity of human expression and the complexity of emotion pose specific challenges for sentiment analysis from text and speech data. Models must consider not only the text but also the nuances of intonation and the emotions conveyed by the voice. To address these challenges, we created a bimodal sentiment analysis model named ATFSC that classifies emotions based on textual and audio information. It fuses textual and audio information from conversations, providing a more robust analysis of sentiment, whether negative, neutral, or positive. Key features include transfer learning with a pre-trained BERT model for text processing, a CNN-based feature extractor for audio processing, and flexible preprocessing that supports different dataset formats. An attention mechanism performs a bimodal fusion of audio and text features, leading to a notable performance gain. As a result, we observed improved accuracy: 64.61% on IEMOCAP, 69% on SLUE, 72% on MELD, and 81.36% on CMU-MOSI.
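To make the described architecture concrete, below is a minimal PyTorch sketch of an ATFSC-style bimodal classifier: a pre-trained BERT text encoder, a small 1-D CNN over frame-level audio features, and an attention mechanism fusing the two streams before a three-way sentiment head. The module name, layer sizes, and the use of MFCC-like 40-dimensional audio frames are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BimodalSentimentClassifier(nn.Module):
    """Hypothetical sketch of the ATFSC-style text+audio fusion model."""

    def __init__(self, audio_dim=40, hidden=768, num_classes=3):
        super().__init__()
        # Text branch: transfer learning from a pre-trained BERT encoder.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Audio branch: CNN feature extractor over frame-level inputs
        # (e.g. 40 MFCC coefficients per frame; an assumed choice).
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(audio_dim, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        # Attention-based bimodal fusion: text tokens attend over audio frames.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # Head over pooled text and fused features: negative/neutral/positive.
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask, audio_feats):
        # audio_feats: (batch, frames, audio_dim)
        text = self.bert(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        # Conv1d expects (batch, channels, frames); transpose in and out.
        audio = self.audio_cnn(audio_feats.transpose(1, 2)).transpose(1, 2)
        fused, _ = self.cross_attn(query=text, key=audio, value=audio)
        # Mean-pool both streams and classify the utterance-level sentiment.
        pooled = torch.cat([text.mean(dim=1), fused.mean(dim=1)], dim=-1)
        return self.classifier(pooled)
```

In this sketch the fusion is cross-modal attention with the text tokens as queries; other fusion variants (audio-as-query, or self-attention over concatenated streams) fit the same skeleton.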