Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks

Atitaya Yakaew, Matthew Dailey, Teeradaj Racharak

Abstract

Real-time sentiment analysis on video streams involves classifying a subject’s emotional expressions over time based on visual and/or audio information in the data stream. Sentiment can be analyzed using various modalities such as speech, mouth motion, and facial expression. This paper proposes a deep learning approach based on multiple modalities in which features extracted from an audiovisual data stream are fused in real time for sentiment classification. The proposed system comprises four small deep neural network models that analyze visual and audio features concurrently. We fuse the visual and audio sentiment features into a single stream and accumulate evidence over time using an exponentially weighted moving average to make a final prediction. Our work provides a promising solution to the problem of building real-time sentiment analysis systems under constrained software or hardware capabilities. Experiments on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) show that deep audiovisual feature fusion yields substantial improvements over analysis of either modality alone. We obtain an accuracy of 90.74% on a challenging test dataset, compared with baseline accuracies of 11.11% to 31.48%.
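
The evidence-accumulation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ewma_accumulate` and `predict`, the smoothing factor `alpha`, the choice to initialize the average with the first frame, and the toy two-class probabilities are all assumptions made for illustration. The sketch only shows how an exponentially weighted moving average over per-frame fused class probabilities could produce a final prediction.

```python
def ewma_accumulate(frame_probs, alpha=0.3):
    """Smooth a sequence of per-frame class-probability vectors with an
    exponentially weighted moving average (EWMA).

    frame_probs: iterable of equal-length lists of class probabilities
                 (assumed to be the fused audiovisual output per frame).
    alpha:       weight given to the newest frame (hypothetical value).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for probs in frame_probs:
        if state is None:
            # Assumption: start the average from the first observation.
            state = list(probs)
        else:
            # EWMA update: new = alpha * observation + (1 - alpha) * old
            state = [alpha * p + (1.0 - alpha) * s
                     for p, s in zip(probs, state)]
        smoothed.append(list(state))
    return smoothed


def predict(frame_probs, alpha=0.3):
    """Return the index of the class with the highest smoothed score
    after the last frame."""
    final = ewma_accumulate(frame_probs, alpha)[-1]
    return max(range(len(final)), key=final.__getitem__)


# Toy stream of three frames over two hypothetical classes.
stream = [[0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
smoothed = ewma_accumulate(stream, alpha=0.5)
label = predict(stream, alpha=0.5)
```

A larger `alpha` makes the running estimate react faster to new frames, while a smaller one favors stability over the history of the stream.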


Paper Citation


in Harvard Style

Yakaew A., Dailey M. and Racharak T. (2021). Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-486-2, pages 442-451. DOI: 10.5220/0010304404420451


in Bibtex Style

@conference{icpram21,
author={Atitaya Yakaew and Matthew Dailey and Teeradaj Racharak},
title={Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks},
booktitle={Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2021},
pages={442-451},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010304404420451},
isbn={978-989-758-486-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks
SN - 978-989-758-486-2
AU - Yakaew A.
AU - Dailey M.
AU - Racharak T.
PY - 2021
SP - 442
EP - 451
DO - 10.5220/0010304404420451