network model is used for emotion feature extraction from expression videos. Experimental results show that, compared with unimodal emotion recognition, the multimodal recognition results obtained by fusing EEG, peripheral physiological, and facial expression signals are greatly improved. In the MAHNOB-HCI database, the dimensional emotion labels were divided into three categories for the experiments, and the same trend was observed. Both methods show good generalization ability and robustness (Zhu, 2021).
2.3 Emotion Recognition based on
EEG Signals and Facial Images
Emotion recognition based on EEG signals and facial images combines internal brain activity with external facial expressions, offering higher accuracy and better real-time response. In contrast, heart rate variability is influenced by many physiological factors, has lower emotional specificity, and is less complementary to facial images. The fusion of EEG and facial images is therefore more robust in complex environments, reflects an individual's true emotional state more comprehensively, and yields better overall recognition performance than the combination of facial images and heart rate variability.
A multi-level spatiotemporal feature adaptive integration and unique-shared feature fusion model, as well as a multi-granularity attention and feature distribution calibration model, are proposed to address the problems of multimodal emotion recognition combining EEG signals and facial images. Loss functions are used to constrain the similarity or difference between features, ensuring that the model captures both the unique emotional semantic information of each modality and the emotional semantic information shared between modalities (a sketch of this kind of constraint is given after this paragraph). In cross-validation experiments on the DEAP and MAHNOB-HCI datasets, the multi-level spatiotemporal feature adaptive integration and unique-shared feature fusion model achieved 82.60%/79.99%, 83.09%/78.60%, and 67.50%/62.42% (DEAP/MAHNOB-HCI) on the Valence, Arousal, and Emotion indicators, respectively. Under 5-fold cross-validation it reached 98.21%/97.02%, 98.59%/97.36%, and 90.56%/88.77% on the same indicators, achieving competitive results and demonstrating the feasibility and effectiveness of the proposed model. The multi-granularity attention and feature distribution calibration model was evaluated in cross-validation experiments on the DEAP and MAHNOB-HCI datasets, achieving 82.56%/81.63%, 82.44%/88.81%, and 66.51%/65.28% on the Valence, Arousal, and Emotion metrics, respectively. Under 5-fold cross-validation the results were 97.48%/98.83%, 97.96%/99.26%, and 90.04%/91.89% on the same metrics, demonstrating the feasibility and effectiveness of the proposed model relative to other existing methods (Chen, 2024).
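The following is a minimal sketch of the kind of unique/shared feature constraint described above, written in PyTorch: an orthogonality term pushes each modality's unique (private) features away from its shared features, and a similarity term pulls the EEG and facial shared features together. The function names, feature tensors, and loss weights are illustrative assumptions, not the implementation in Chen (2024).

```python
# Hedged sketch of "unique vs. shared" feature constraints; names such as
# eeg_shared and face_private are illustrative assumptions.
import torch
import torch.nn.functional as F

def orthogonality_loss(private_feat, shared_feat):
    """Encourage a modality's private (unique) features to be uncorrelated
    with its shared features: squared Frobenius norm of their correlation."""
    p = F.normalize(private_feat, dim=1)   # (batch, d)
    s = F.normalize(shared_feat, dim=1)    # (batch, d)
    return (p.t() @ s).pow(2).sum()

def shared_similarity_loss(eeg_shared, face_shared):
    """Pull the shared representations of the two modalities together."""
    return (1.0 - F.cosine_similarity(eeg_shared, face_shared, dim=1)).mean()

def total_loss(task_loss, eeg_shared, eeg_private, face_shared, face_private,
               w_orth=0.1, w_sim=0.1):
    # Combined objective; the weights are placeholders, not tuned values.
    l_orth = (orthogonality_loss(eeg_private, eeg_shared)
              + orthogonality_loss(face_private, face_shared))
    l_sim = shared_similarity_loss(eeg_shared, face_shared)
    return task_loss + w_orth * l_orth + w_sim * l_sim
```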
Although current research on single-modal emotion recognition has achieved significant results, further improving its accuracy is difficult, so multimodal emotion recognition has gradually attracted researchers' attention. To recognize emotions from EEG signals more quickly and accurately, an emotion recognition algorithm combining the discrete wavelet transform (DWT) and empirical mode decomposition (EMD) is proposed; a sketch of such sub-band feature extraction follows.
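As a rough illustration of how DWT and EMD can be combined for EEG feature extraction, the sketch below decomposes a single channel with the discrete wavelet transform and with EMD, then computes simple per-component features (differential entropy of the wavelet sub-bands, log energy of the first few IMFs). It assumes the PyWavelets (pywt) and PyEMD packages; the feature choices are illustrative, not the exact algorithm of the cited work.

```python
# Hedged sketch of DWT + EMD feature extraction for one EEG channel.
import numpy as np
import pywt
from PyEMD import EMD

def dwt_emd_features(eeg, wavelet="db4", level=4, n_imfs=3):
    feats = []
    # DWT: approximation + detail coefficients for each decomposition level.
    coeffs = pywt.wavedec(eeg, wavelet, level=level)
    for c in coeffs:
        var = np.var(c) + 1e-8
        # Differential entropy under a Gaussian assumption: 0.5*ln(2*pi*e*var)
        feats.append(0.5 * np.log(2 * np.pi * np.e * var))
    # EMD: intrinsic mode functions; keep the first few.
    imfs = EMD().emd(eeg)
    for imf in imfs[:n_imfs]:
        feats.append(np.log(np.sum(imf ** 2) + 1e-8))  # log energy per IMF
    return np.array(feats)

# Example: one minute of single-channel EEG sampled at 128 Hz (synthetic data)
# feats = dwt_emd_features(np.random.randn(128 * 60))
```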
To address the difficulty of further improving single-modal emotion recognition accuracy, this paper establishes a dual-modality database combining EEG signals and facial micro-expression signals and extends the emotion categories to five (excitement, happiness, neutrality, fear, sadness). Two experimental paradigms are used to collect signals, and the database contains data from 24 subjects. Deployed on this self-built database, the above algorithm achieved an accuracy of 46.43% on the five-class emotion recognition task. Extracting facial micro-expression and facial-feature-tracking characteristics and fusing them with the differential-entropy features of the EEG signals raised the five-class accuracy to 52.26%. Compared with single-modal features, the multimodal features improve the five-class accuracy by more than 6%; a minimal sketch of this style of feature-level fusion is given below.
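The sketch below shows feature-level (early) fusion of the kind described above, assuming precomputed differential-entropy EEG features and facial features stored as NumPy arrays and a generic SVM classifier; the file names and classifier choice are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch of feature-level fusion: concatenate EEG and facial features,
# then train a generic classifier on the fused vectors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrices, one row per trial (file names are hypothetical).
eeg_de    = np.load("eeg_differential_entropy.npy")  # (n_trials, n_eeg_feats)
face_feat = np.load("facial_features.npy")           # (n_trials, n_face_feats)
labels    = np.load("labels.npy")                    # five emotion classes

fused = np.concatenate([eeg_de, face_feat], axis=1)  # simple concatenation

X_tr, X_te, y_tr, y_te = train_test_split(
    fused, labels, test_size=0.2, stratify=labels, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print("five-class accuracy:", clf.score(X_te, y_te))
```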
To address the high computational complexity and long testing time of neural networks, this paper uses an FPGA for hardware acceleration of the network. To this end,
a convolutional neural network consisting of two
convolutional layers was trained on the publicly
available database SEED. Using 10% of the data in the database as the test set, the trained network achieved a test accuracy of 78.88%, with a testing time of 0.35 seconds per minute of EEG data; a sketch of such a network and of simple 8-bit weight quantization is given below.
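For illustration, the sketch below defines a small CNN with two convolutional layers of the kind described and applies a simple symmetric 8-bit post-training weight quantization of the sort discussed next. The input shape, layer sizes, and quantization scheme are assumptions, not the exact network or FPGA design used in the cited work.

```python
# Hedged sketch: two-convolutional-layer CNN for EEG emotion classification
# plus symmetric per-tensor int8 post-training weight quantization.
import torch
import torch.nn as nn

class SmallEEGCNN(nn.Module):
    def __init__(self, n_channels=62, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 5), padding=(0, 2)), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1)), nn.ReLU(),  # spatial conv over electrodes
            nn.AdaptiveAvgPool2d((1, 8)),
        )
        self.classifier = nn.Linear(16 * 8, n_classes)

    def forward(self, x):            # x: (batch, 1, n_channels, n_samples)
        return self.classifier(self.features(x).flatten(1))

def quantize_int8(weight):
    """Symmetric 8-bit quantization: weight is approximated as scale * q."""
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

model = SmallEEGCNN()
# Post-training quantization of each weight tensor; dequantized here only to
# illustrate the accuracy loss that would be measured in software simulation.
for p in model.parameters():
    q, s = quantize_int8(p.data)
    p.data = q.float() * s
```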
At the same time, an 8-bit quantization method is adopted to quantize the parameters of the trained network; software simulation shows an accuracy of 73.34% on the test set, with a testing time of 0.29 seconds per minute of EEG data. In