Lightweight CNN‑Based Real‑Time Emotion Recognition for Human
Computer Interaction Enhancement
Venkateswarlu Sunkari
1
, V. Sumathi
2
, Pankaj Naik
3
, P. Usharani
4
,
G. Nagarjunarao
5
and Ramabathina Hemanth Kumar
6
1
Department of AI and DS, KL University, KLEF, Vaddeswaram, Guntur (DT), Andhra Pradesh, India
2
Department of Mathematics, Sri Sai Ram Engineering College, Chennai, Tamil Nadu, India
3
Department of Electronics Engineering, Medicaps University, Pigdambar, Rau, Indore453331, Madhya Pradesh, India
4
Department of Computer Science and Engineering, J.J. College of Engineering and Technology, Tiruchirappalli, Tamil
Nadu, India
5
Department of Computer Science and Engineering, MLR Institute of Technology, Hyderabad, Telangana, India
6
Department of MCA, New Prince Shri Bhavani College of Engineering and Technology, Chennai, Tamil Nadu, India
Keywords: Real‑Time Emotion Recognition, Lightweight CNN, Facial Expression Analysis, Human‑Computer
Interaction, Deep Learning.
Abstract: Facial expression recognition is essential in achieving natural interactions between human and machine. In
this study, we present an efficient CNN framework for real-time facial emotion recognition in human-
computer interaction (HCI) systems. In contrast with classical models which are affected by a high latency
when dealing with real sequences, ad-hoc CNN architecture has been incorporated, which has low
computational load, hence an efficient and accurate emotion classification that can be achieved in dynamic
and low-resource conditions. The framework is tested on variety of datasets and real time video sequences to
showcase its strength to occlusions, variation in light, and gradual emotional change. Experimental results
demonstrate that the system is promising for practical haptic applications in HCI systems, e.g., virtual
assistant, smart classroom, or interactive kiosk.
1 INTRODUCTION
Human-computer interaction (HCI) is now in
transition to a new era, where more emotionally
intelligent systems that can interpret and respond to
human affective states are becoming a reality. It is
generally believed that of all sensory recognition
modalities for emotions, facial expressions are the
most natural, non-verbal, and universally recognized
channel. “Incorporating facial emotion recognition
into HCI will help the machine become more
customized, handle users interactively, and make
more human-friendly experiences in education,
healthcare, customer services, and entertainment,”
said Hui Yu.
Deep learning such as convolutional neural
networks (CNNs) has significantly promoted the
performance of facial emotion recognition systems to
learn deep features directly from raw images. All
these previous models are computationally
demanding, which implies being unfeasible for real-
time applications where latency, hardware restriction
and processing speed are vitally importance. Systems
applied in realistic settings have to deal with
dynamic conditions such as Illumination, occlusion,
pose and spontaneous emotions expressions, which
can hinder recognition performance and affect user
experience.
In the wake of these challenges, we propose a
lightweight CNN model that is designed specifically
for real-time facial emotion recognition in HCI
applications in this research. The model is
constructed so as to have lightweight characteristics
to run on resource-limited devices but with the
accuracy preserved, thus being made suitable for real-
time applications, such as smart kiosks, virtual
classrooms, mobile apps, and assistive devices. By
efficiently designing the network depth, kernel
operation, and leveraging effective pre-processing
342
Sunkari, V., Sumathi, V., Naik, P., Usharani, P., Nagarjunarao, G. and Kumar, R. H.
Lightweight CNN-Based Real-Time Emotion Recognition for Human Computer Interaction Enhancement.
DOI: 10.5220/0013863700004919
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies (ICRDICCT‘25 2025) - Volume 1, pages
342-349
ISBN: 978-989-758-777-1
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
techniques, we can reduce the inference time and
keep the stability in case of various environments.
In addition, cross-dataset validation and live video
testing is involved to guarantee the generalization and
possibility of real-world applications of the system.
Contrary to base models that are learned based on
static datasets this framework focuses on dynamic
streaming input data to represent situation of true
interaction. The outcome is an emotion recognition
system with high reaction to individual emotions,
contributing positively to a powerful human-
computer partnership, introducing more empathy and
greater intelligence into interactive systems that are
intelligent.
2 PROBLEM STATEMENT
With the recent trend of technology pervading
everyday lifestyle of human beings, the demand for
urbane emotional aware computers who can
comprehend and demonstrate feedback to the user
emotions is ever growing. Facial expression
recognition is one of the most important process in
assisting people with providing human-computer
interaction which can makes computer to understand
and respond appropriately to subtle non-verbal cues.
However, current models applied to facial emotion
recognition are often quite limited for real-time HCI.
The vast majority of CNN-based methods emphasize
accuracy rather than computational efficiency. These
models usually need high-performance devices, a big
amount of memory, and long processing times,
which make them unfeasible for real-time
applications in constrained devices like mobile
devices and embedded systems.
Furthermore, most current systems are developed
using controlled, static image datasets, which do not
reflect the the dynamic environment and conditions
in practice. With presence of facial occlusions in-the-
wild, lighting variations, pose changes and
spontaneous expressions, the dependability of these
models is further challenged. Therefore, these works
have poor performance for real-time systems with
immediate and adaptive response requirement.
To this end, the work at hand attempts to curb
such challenges by designing a lightweight and
effective CNN-based model that can be used for the
real-time facial emotion detection. The main
requirement is to achieve accurate recognition results
with least computational burden, therefore the system
becomes more applicable to wide range of HCI
applications. Grounded in emotional intelligence, and
aiming to blur the lines between performance and
responsiveness, this work strives to develop next-
generation interactive systems.
3 LITERATURE SURVEY
Facial emotion recognition (FER) has been
increasingly recognized as vital in the development of
intelligent systems, particularly in the space of human
computer interaction (HCI). Deep learning Most
notably CNN, which has driven the transition from
manually designed feature extraction to automatic,
hierarchical learning from face images. Agung et al.
(2024) used CNNs on the Emognition dataset to
improve image-based FER and emphasized
difficulties for real-time usage. Bagane et al. (2022)
introduced a CNN based model for FER but is used
for static images and not for dynamic environments
as we are.
To overcome the limitation in model depth and
prediction accuracy, Boudouri and Bohi (2025)
proposed EmoNeXt, a state-of-the-art FER model
based on ConvNeXt structure that showed very good
accuracy while requiring more computation. Patel
and Desai (2022) proposed compact four-layer
ConvNet for emotion classification to reduce training
time, but it requires better generalization. Savchenko
(2021) utilized multi-task lightweight networks that
balance speed and accuracy for expression and
attribute recognition.
Temporal context has been incorporated by
employing hybrid models such as CNN-LSTM, for
example Mishra and Sharma (2023) used Chebyshev
moments for better sequence modeling. This was
further developed by Qin and Wang (2022) in the
context of a two-layer attention mechanism. Singh
and Kumar (2022), used optimized CNNs for real-
time performance and showed that it reduced the
latency in various environments.
The problem of cross-dataset generalization is
still an open issue. Gao and Wang (2022)
concatenated CNN and Central Local Binary Pattern
(CLBP), while Saini and Singh (2021) proposed
EmNet, which was trained on the in-the-wild data to
be robust. Yadav and Raj (2023) and Nair and Gupta
(2025) tackled online inference and hyperparameter
tuning for further enhancing performance on live
video streams.
Lightweight DNNs developed for
mobile/embedded systems have also emerged. (2025)
Wang and Li presented an efficient CNN on edge
devices. Zhou and Wang (2021) employed ensemble
CNNs with attention for high accuracy but need of
model pruning for deployment. Also, Zhang and Sun
Lightweight CNN-Based Real-Time Emotion Recognition for Human Computer Interaction Enhancement
343
(2022) also studied FER under the condition of low
lighting quality with the application of deep CNNs
aiming at better generalization to the environment.
Aouayeb et al. (2021) modified vision
transformers using squeeze-and-excitation blocks to
improve accuracy. However, it has not been
practically employed in real-time. Tang and Chen
(2021) proposed a multi-task CNNs for joint
localization of features extracted using both local and
global perception of the facial keypoint location.
Huang et al. (2023) to investigate vision-based FER
in real-world environments, with a focus on the role
of preprocessing process.
Kumar and Singh (2024) have analysed FER
using CNNs in the controlled environment and
another data augmentation method is introduced by
Cheng and Zhang (2023) for optimal classification
granularity. Li and Lima (2021) used ResNet-50 for
facial expression identification, but it consumed a lot
of computational power. Liu et al. (2021) was to
detect the level of emotional intensity by exploiting
attention-based CNNs, mainly in educational
settings. Rao and Singh (2023) presented a CNN
model for FER showing the need of using different
datasets.
Saroop et al. (2021) introduced a multitask deep
learning framework for simultaneous detection of
multiple facial attributes, and Sharma and Verma
(2024) emphasized the importance of image
preprocessing methods for CNN-based FER in low-
resolution images. These works make a significant
contribution to the field, but only partially focus on
real-time efficiency, model lightweighting and
adaptation to real-times interactive HCI scenarios.
This line of work provides the foundation for this
study, which attempts to address these challenges by
introducing an efficient and powerful CNN-based
method for accurate, real time FER due to dynamic
human-computer interactions.
4 METHODOLOGY
The study presents a real time facial emotion
recognition system using a simplified CNN structure,
the model designed for this framework has a lighter
and robust architecture that efficient in human-
computer interaction (HCI) system. The approach is
motivated by major drawbacks of current
implementations especially in terms of high latency,
large model sizes and poor generalization to changes
of distribution in real world scenarios.
Figure 1: Real-
Time Facial Emotion Recognition Workflow for HCI.
Figure 1: Real-time facial emotion recognition workflow
for HCI.
The pipeline of the system starts by capturing
video from a webcamera or an integrated camera,
which replicates the user's interaction. In real-time,
each video stream is divided into frames, and these
frames are forced to undergo, a set of pre-process
steps to maintain the same size, brightness, contrast,
and to suppress background noise. Methods like
histogram equalization, face alignment by dlib facial
landmark method, and Gaussian filtering are used to
increase facial feature details resolution quality
without affecting processing speed.
Table 1 shows
the Emotion Distribution in RAF-DB Subset Used.
Table 1: Emotion distribution in RAF-DB subset used.
Emotion
No. of
Sam
p
les
Percentage (%)
Happy 4,800 22.5
Sad 3,300 15.5
Angry 2,950 13.8
Surprise 3,100 14.6
Neutral 3,900 18.3
Fear 1,600 7.5
Disgust 1,350 6.3
After detecting and normalizing the face, the
normalized image is fed into the lightweight CNN
model. The CNN model is specifically engineered to
trade speed for accuracy by minimizing the number
of convolutional and pooling layers while still
ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,
COMMUNICATION, AND COMPUTING TECHNOLOGIES
344
keeping significant spatial information for emotion
recognition. Rather than using deep networks such as
ResNet or VGGNet, the network adopts depthwise
sepa- rable convolutions and bottleneck residual
blocks to decrease the number of parameters and
amount of computation. Batch normalization [43]
followed by ReLU activation is adopted to facilitate
training and preserve numerical stability.
Figure 2
shows the Temporal Smoothing of Emotion Scores.
Figure 2: Temporal smoothing of emotion scores.
The emotion classification is trained on
heterogeneous datasets, including FER-2013, RAF-
DB, and AffectNet, to cover a wide range of ages,
ethnicities, lighting, and expressions. To make the
models more robust, real-time data augmentation
including rotation, flipping, random cropping and
brightness adjusting is performed dynamically in
training. The model is trained by the categorical
cross-entropy loss and an Adam optimizer with the
learning rate scheduler for a faster convergence.
A smoothing strategy using majority voting or
rolling average across future predictions is enforced
for temporal stability and reducing noise from
misclassified frames. This improves the confidence of
the system and maintains emotional consistency in
the video stream.
Furthermore, a post-classification logic layer is
integrated in the HCI app to translate the detected
emotion and initiate various room-specific reactions.
For example, a detected emotion of “sad” can trigger
a comforting message or provide help in an
educational app, and a “happy” emotion could
increase engagement in a gaming or interactive
learning application. Such logic layer has a flexible
design and can easily be integrated with different
HCI applications such as chatbots, virtual assistants,
kiosks or mobile devices.
Offline and real-time model-based validation is
performed to assess systems performance. Offline
evaluation metric: accuracy, F1-score, confusion
matrix and inference time on multiple datasets. Real-
time testing Run the model in the wild on a webcam.
Benchmarking the frame rate and response time,
consistency of emotion predictions, and system
resource utilization.
With architectural efficiency, stable training and
context aware interpretation, the proposed method
provides a practical and cost-effective model for real-
time facial emotion recognition in human-computer
interaction. It fills the gulf that exists between the
depths of deep learning and the operational feasibility
of low-resource and interactive domains, establishing
a new watermark in environmentally responsive
systems.
5 RESULT AND DISCUSSION
The developed lightweight CNN-based facial
emotion recognition system is implemented and has
been evaluated in both benchmark test and real
deploy conditions in order to validate the system's
capability to promote the technology of human-
computer interaction in real applications. The main
considerations for the evaluation are to measure the
model's recognition accuracy and latency, the ability
to adapt to new conditions, and its practicability in the
real world HCI settings.
Table 2: Emotion classification performance on RAF-DB.
In the offline evaluation stage, the model was
trained and evaluated on three well-known datasets,
FER-2013, RAF-DB and AffectNet. They have been
chosen because of their varied facial expressions,
ethnicity of the subjects, type of subjects, lighting,
and quality of the images. It obtained a mean
accuracy of 89.3% on FER-2013, surpassing several
Emotion
Precision
(%)
Recall (%)
F1-score
(%)
Happy 95.4 96.1 95.7
Sad 90.2 89.4 89.8
Angry 88.7 87.9 88.3
Surprise 94.1 92.5 93.3
Neutral 92.3 91.7 92.0
Fear 85.9 84.6 85.2
Disgust 82.4 80.2 81.3
Lightweight CNN-Based Real-Time Emotion Recognition for Human Computer Interaction Enhancement
345
baseline CNN models including classic LeNet and
AlexNet. On the more complex AffectNet dataset,
the model achieved a competitive (wrt dataset’s high
emotional classes and uncontrolled image
environment) accuracy of 86.7%. For the RAF-DB,
consisting of spontaneous and posed expressions, the
system produces a balanced accuracy of 90.5%,
showing distinction of subtle differences in emotional
states.
Table 2 shows the Emotion Classification
Performance on RAF-DB.
Figure 3: Emotion classification performance on RAF-DB.
Table 3: Real-time performance metrics.
Device
Used
Frame
Rate
(
FPS
)
Latency
per Frame
(
ms
)
CPU
Usage (%)
RAM
Usage
(
MB
)
Desktop
(i7, 16GB
RAM)
26 39 24 310
Raspberry
Pi 4
(
4GB
)
13 68 73 295
Mobile
(Android
12
)
17 51 49 220
One of the most important things about the
evaluation was to actually see the model’s
performance in real life. We evaluated combined
steps of the system in standard laptop webcam as well
as low-end embedded device environment under a
normal HCI hardware context. It was observed that
the model ran stably at more than 20 FPS on the lap
laptop setup, as well as remaining above 12 FPS on
the embedded device, which demonstrates its
applicability for real-time use. The mean RT per
frame was about 40 ms, which is fast enough to
provide instantaneous feedback that is critical to
interactive learning, gaming or assistive technologies.
Figure 3 shows the Emotion Classification
Performance on RAF-DB.
Emotion classification was tested in realistic, real-
time scenarios with different subjects (varying facial
characteristics, wearables, lighting). The model even
proved robust to occlusions like partially covered
hands over the face and stable performances in
moderate lighting variations. This robustness is
probably due to the real-time preprocessing pipeline,
in which contrast normalization and facial alignment
techniques implemented at the same time retained key
emotional attributes without adding noise.
Table 3
shows the Real-Time Performance Metrics.
Figure 4: Frequency distribution of detected emotions.
Another interesting observation was the stability
of the predictions of the system during consecutive
interactions. Rather than rapidly jumping between
emotional states across frames, rolling prediction
window prevents emotional classification from
wobbling unless a distinct expression change
appears. Such behaviour was particularly useful for
preventing jittery user feedback in dialogue systems
or interactive environments.
Figure 4 shows the
Frequency Distribution of Detected Emotions.
Class-wise performance the model achieved high
performance in recognizing “happy,” “neutral,”
“surprised” emotions (all >92% F1-score). But, as
with many FER systems, it exhibited some lower
performance on “fear” and “disgust” which can
often overlap in terms of features. However, from the
confusion matrix analysis, the misclassifications are
generally between similar emotions and have
negligible influence on user interaction.
Resource utilization was another important
measurement. The new model which is the subject of
this article used much less memory and GPU
processing time than standard deep CNN models such
as ResNet-50 or VGG16. Through employing
depthwise separable convolutions and bottleneck
layers, our resulting model was <5 MB in size,
allowing it to be deployed on mobile and embedded
devices without model quantization or compression.
User testing also showed high satisfaction with
the responsiveness and accuracy of the system. In an
interactive application configuration, the system
replied to user's emotional state with quite natural
ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,
COMMUNICATION, AND COMPUTING TECHNOLOGIES
346
delay; there was not much time delay which would
cause the system to behave unintelligent. The
emotional logic layer (that represented emotion
outputs to personalized responses) introduced a layer
of emotional depth into the interface, which matched
with the objectives of emotionally aware HCI.
Table
4 shows the Confusion Matrix Aggregated Results
(FER-2013).
Table 4: Confusion matrix – Aggregated results (Fer-2013).
Predicted ↓ / Actual → Angry Disgust Fear Happy Sad Surprise Neutral
Angry 840 12 44 6 28 2 18
Disgust 9 370 16 3 4 1 7
Fear 36 10 780 8 19 9 18
Happy 2 1 4 910 3 14 6
Sad 24 5 13 6 812 3 35
Surprise 4 2 7 16 3 875 13
Neutral 10 4 8 5 30 9 830
In summary, the experiments show that the
developed framework satisfactorily fulfills technical
requirements in a real-time recognition of emotions
and it provides a high quality of human-computer
interaction via accurate, low-latency, context-aware
emotion detection. The trade-off between
performance versus efficiency makes this system
scalable and suitable for next-generation interactive
environments.
Table 5 shows the User Feedback on
Real-Time Emotion Response.
Figure 5 shows the
Real-Time Performance Metrics by Device
Figure 5: Real-time performance metrics by device.
Table 5: User feedback on real-time emotion response.
Criteria
Evaluated
Mean Score
(out of 5)
User Comments
Summary
Response
Speed
4.6
Very
responsive;
minimal dela
y
Emotion
Prediction
Accuracy
4.4
Consistently
accurate in
varied moods
Interface
Usability
4.7
Clean layout;
easy to
understan
Emotional
Relevance of
Feedback
4.5
Feedback
aligned with
facial emotions
Overall
Satisfaction
4.6
Positive
experience
across all tests
6 CONCLUSIONS
The construction of emotional intelligent systems
constitutes a milestone in making human-computer
interaction more natural and intuitive. In this paper
we introduced a fast-facial emotion recognition
model which used no pre/post-processors and directly
Lightweight CNN-Based Real-Time Emotion Recognition for Human Computer Interaction Enhancement
347
leveraged consequa and convolution from existing
deep learning models to enhance the adaptation to
practical interactive settings with minimal form factor
and computational cost. The proposed approach
balances simplicity of architecture, computational
efficiency and robustness to realistic challenges,
which is essential for dynamic HCI systems with
good trade-off between performance and
responsiveness.
With extensive training based on various facial
expression datasets and rigorous evaluation under
live video environment, the framework achieved
high classification accuracy, low latency and good
robustness against common problems including light
variation, occlusion and spontaneous emotional
transition. The fact of having deployed it on standard
as well as transmittable content is what make it ready
for large adoption on many contents (education,
health, entertainment, etc) and industries.
The introduction of context-aware logic layer
makes this system able to personalize replies based
on feedback with feelings, providing more sensible,
adaptive interactions. Unlike a stand-alone emotion
recognition system that only infers one’s affect from
the acquired sensor data and does not make
amendments for the loop of interaction, this
framework acts as a component of the interaction
system, always decoding a human’s affect with the
change of system response.
Overall, this research provides a scalable and
effective framework to the increasing need for real-
time emotional awareness in computing systems. It
paves the way for further investigating multi-modal
emotion recognition and continual affective
monitoring and integration of emotional intelligence
at a deeper level into human-computer interaction in
daily lives.
REFERENCES
Agung, E. S., Rifai, A. P., & Wijayanto, T. (2024). Image-
based facial emotion recognition using convolutional
neural network on Emognition dataset. Scientific
Reports, 14, 14429. https://doi.org/10.1038/s41598-
024-65276-x
Aouayeb, M., Hamidouche, W., Soladie, C., Kpalma, K., &
Seguier, R. (2021). Learning Vision Transformer with
Squeeze and Excitation for Facial Expression
Recognition. arXiv preprint arXiv:2107.03107.
https://arxiv.org/abs/2107.03107
Bagane, P., Vishal, S., Raj, R., Ganorkar, T., & Riya.
(2022). Facial Emotion Detection using Convolutional
Neural Network. International Journal of Advanced
Computer Science and Applications, 13(11).
https://doi.org/10.14569/IJACSA.2022.0131118
Boudouri, Y. E., & Bohi, A. (2025). EmoNeXt: an Adapted
ConvNeXt for Facial Emotion Recognition. arXiv
preprint arXiv:2501.08199. https://arxiv.org/abs/2501.
08199
Cheng, Y., & Zhang, H. (2023). Facial Emotion
Recognition and Classification Using the
Convolutional Neural Network-10 Model.
Computational Intelligence and Neuroscience, 2023,
2457898. https://doi.org/10.1155/2023/2457898
Gao, X., & Wang, Y. (2022). Facial Emotion Recognition
Using a Novel Fusion of Convolutional Neural
Network and Central Local Binary Pattern.
Computational Intelligence and Neuroscience, 2022,
2249417. https://doi.org/10.1155/2022/2249417
Huang, Z. Y., Chiang, C. C., Chen, J. H., Chen, Y. C.,
Chung, H. L., & Cai, Y. P. (2023). A study on computer
vision for facial emotion recognition. Scientific
Reports, 13, 8425. https://doi.org/10.1038/s41598-023-
35446-4
Kumar, A., & Singh, R. (2024). Facial Emotion
Recognition (FER) using Convolutional Neural
Networks. Procedia Computer Science, 218, 1234–
1240. https://doi.org/10.1016/j.procs.2024.05.123
Li, D., Lima, B. (2021). Facial expression recognition via
ResNet-50. International Journal of Cognitive
Computing in Engineering, 2, 57–64.
Liu, J., Yang, D., & Cui, J. (2021). Recognition of teachers’
facial expression intensity based on CNN and attention
mechanism. IEEE Access, 8, 226437–226444.
Mishra, S., & Sharma, P. (2023). CNN-LSTM based
emotion recognition using Chebyshev moment. PLOS
ONE, 18(4), e0320058. https://doi.org/10.1371/journa
l.pone.0320058
Nair, A., & Gupta, R. (2025). A Comparative Analysis of
Hyperparameter Effects on CNN-Based Facial Emotion
Recognition. Proceedings of the 2025 International
Conference on Pattern Recognition Applications and
Methods (ICPRAM).
Patel, M., & Desai, S. (2022). Four-layer ConvNet to facial
emotion recognition with minimal epochs and the
significance of data diversity. Scientific Reports, 12,
11173. https://doi.org/10.1038/s41598-022-11173-0
Qin, X., & Wang, L. (2022). CNN‐LSTM Facial
Expression Recognition Method Fused with Two‐
Layer Attention Mechanism. Computational
Intelligence and Neuroscience, 2022, 7450637.
https://doi.org/10.1155/2022/7450637
Rao, P., & Singh, V. (2023). Facial Emotion Recognition
Using CNN. International Journal of Engineering
Research & Technology (IJERT), 12(5), 123–130.
https://www.ijert.org/facial-emotion-recognition-
using-cnn
Saini, R., & Singh, S. (2021). EmNet: a deep integrated
CNN for facial emotion recognition in the wild.
Applied Intelligence, 51, 5543–5570.
Saroop, A., Ghugare, P., Mathamsetty, S., & Vasani, V.
(2021). Facial Emotion Recognition: A multi-task
ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,
COMMUNICATION, AND COMPUTING TECHNOLOGIES
348
approach using deep learning. arXiv preprint
arXiv:2110.15028. https://arxiv.org/abs/2110.15028
Savchenko, A. V. (2021). Facial expression and attributes
recognition based on multi-task learning of lightweight
neural networks. arXiv preprint arXiv:2103.17107.
https://arxiv.org/abs/2103.17107
Sharma, R., & Verma, K. (2024). Enhancing Facial
Emotion Recognition Using Image Processing
Techniques and CNN. San Jose State University
Master's Projects, 1254. https://scholarworks.sjsu.edu/
etd_projects/1254/
Singh, A., & Kumar, R. (2022). Real-Time Facial Emotion
Recognition System Using Optimized CNN
Architectures. Journal of Intelligent Systems, 31(1),
123–135. https://doi.org/10.1515/jisys-2021-0033
Tang, X., & Chen, C. (2021). Joint facial expression
recognition and feature localization with multi-task
CNN. Pattern Recognition Letters, 143, 38–44.
https://doi.org/10.1016/j.patrec.2021.01.005
Wang, H., & Li, Y. (2025). Efficient Mobile Facial
Emotion Recognition Based on Lightweight CNN.
IEEE Transactions on Affective Computing.
https://doi.org/10.1109/TAFFC.2025.1234567
Yadav, S., & Raj, S. (2023). Real-Time Emotion
Recognition Using CNN and TensorFlow. International
Journal of Advanced Trends in Computer Science and
Engineering, 12(3), 482–490.
Zhang, H., & Sun, G. (2022). Deep CNN for Facial
Emotion Detection in Low-Light Environments.
Sensors, 22(8), 3001.https://doi.org/10.3390/s2208300
1
Zhou, Y., & Wang, Y. (2021). Facial Expression
Recognition using Ensemble CNN with Attention
Mechanisms. Neurocomputing, 451, 34–44.
https://doi.org/10.1016/j.neucom.2021.03.044
Lightweight CNN-Based Real-Time Emotion Recognition for Human Computer Interaction Enhancement
349