is verified through typical scenarios such as smart
assistants, VR/AR, and smart driving. The study also
pays special attention to the core challenges currently
faced, such as data synchronization, missing
modalities, and noise interference.
Research has found that multimodal affective
computing significantly improves the accuracy and
robustness of emotion recognition by fusing data
such as speech, facial expressions, and physiological
signals. Deep learning models have shown great
potential in feature extraction and cross-modal
association modeling. However, existing techniques
still face clear limitations, including the
synchronization, missing-modality, and noise issues
noted above.
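To make the fusion idea concrete, the following is a minimal sketch of decision-level (late) fusion, one common way to integrate speech, facial, and physiological predictions. It is not drawn from any of the surveyed systems; the modality probabilities and weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-modality emotion probabilities over three
# classes: [negative, neutral, positive]. In practice these would
# come from separate per-modality classifiers (assumed here).
speech_probs = np.array([0.6, 0.3, 0.1])   # speech model (assumed)
face_probs   = np.array([0.5, 0.2, 0.3])   # facial-expression model (assumed)
physio_probs = np.array([0.4, 0.4, 0.2])   # physiological-signal model (assumed)

def late_fusion(prob_list, weights):
    """Decision-level fusion: weighted average of per-modality
    class probabilities, renormalized to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = sum(w * p for w, p in zip(weights, prob_list))
    return fused / fused.sum()

fused = late_fusion([speech_probs, face_probs, physio_probs],
                    weights=[0.4, 0.4, 0.2])  # illustrative weights
print(fused.argmax())  # index of the predicted emotion class → 0
```

Feature-level (early) fusion would instead concatenate per-modality feature vectors before a single classifier; the weighted late fusion shown here is simpler and degrades gracefully when one modality is missing (its weight can be set to zero).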
Future research on multimodal affective
computing should focus on the following directions:
building richer multimodal datasets; developing
lightweight fusion algorithms; exploring cross-modal
self-supervised learning; and promoting cross-
disciplinary innovation between affective computing
and other cutting-edge technologies. With
breakthroughs in these key technologies, multimodal
affective computing is expected to see deep
application in areas such as smart healthcare and
emotional AI assistants. This study provides a
systematic reference for related fields, but the
technology is still developing rapidly and requires
continued collaborative innovation between
academia and industry.