offering a diverse toolkit for effective classification 
tasks. 
In the realm of emotional expression recognition, M. Murugappan's study employed diverse distance measures to classify facial emotions and documented the average accuracy achieved with each measure (Murugappan, 2020). Furthermore, experiments on varying the k value revealed that the choice of k significantly influences emotion classification accuracy: the study underscored the pivotal role of k in the KNN classifier, finding that a lower k value yields a higher emotion recognition rate.
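As a rough illustration of how these two design choices interact, the following sketch (not the code from (Murugappan, 2020)) varies the distance metric and the k value of a scikit-learn KNN classifier; the feature matrix and emotion labels are synthetic placeholders standing in for extracted facial-expression features.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: 300 samples of 20-dimensional expression features,
# labelled with one of 6 emotions (purely synthetic for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 6, size=300)

# Compare several distance measures and several k values.
for metric in ("euclidean", "manhattan", "chebyshev", "cosine"):
    for k in (1, 3, 5, 9):
        knn = KNeighborsClassifier(n_neighbors=k, metric=metric)
        acc = cross_val_score(knn, X, y, cv=5).mean()
        print(f"metric={metric:<10s} k={k}: mean accuracy={acc:.3f}")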
2.1.3  RF and PCA 
In the realm of image processing, the traditional 
Principal Component Analysis (PCA) serves as a 
widely employed technique for extracting expression 
features. Initially, this method converts the image matrix into a one-dimensional image vector. However, applying the K-L transformation to these flattened vectors means working in a very high-dimensional image vector space, which makes it difficult to compute the covariance matrix accurately. To address this issue, 2DPCA offers a direct approach to handling two-dimensional image matrices, so the covariance matrix can be computed with greater ease. Given its lower dimensionality, 2DPCA simplifies the calculation of the eigenvalues and eigenvectors of the covariance matrix.
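The practical difference is easy to see in a short NumPy sketch: for m x n images, the 2DPCA image covariance matrix is only n x n, whereas flattening each image first would require an (mn) x (mn) covariance matrix. The function name, the synthetic images, and the choice of d below are illustrative assumptions, not the paper's implementation.

import numpy as np

def two_d_pca(images, d):
    """Extract 2DPCA features: project each image onto the top-d
    eigenvectors of the image covariance matrix."""
    A = images.astype(np.float64)             # shape (M, m, n)
    centered = A - A.mean(axis=0)             # subtract the mean image
    # The image covariance matrix G is only n x n, so its eigen-decomposition
    # is far cheaper than for flattened (m*n) x (m*n) PCA.
    G = np.einsum('kij,kil->jl', centered, centered) / A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(G)      # eigenvalues in ascending order
    X = eigvecs[:, ::-1][:, :d]               # top-d projection axes, (n, d)
    Y = A @ X                                 # projected features, (M, m, d)
    return X, Y

# Example: 100 synthetic 64x64 "face images" reduced to 8 feature columns each.
images = np.random.default_rng(0).normal(size=(100, 64, 64))
X, Y = two_d_pca(images, d=8)
print(X.shape, Y.shape)                       # (64, 8) (100, 64, 8)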
The fundamental idea behind 2DPCA is to treat each image directly as an m x n matrix A, which is projected onto an n-dimensional unit column vector x through the linear transformation Y = Ax. Here, x denotes the n-dimensional projection vector, while Y is the resulting m-dimensional projected feature vector of the image; in practice, x is chosen from the leading eigenvectors of the image covariance matrix. Once expression feature extraction is complete, the selection of a suitable classification method becomes paramount. Ju Jia proposed adopting the random forest classifier for its rapid classification speed, robustness, and high recognition rates in high-dimensional scenarios (Jia, 2016). Random forest, constructed from multiple decision trees, proves effective in addressing multi-class classification problems.
The random forest classifier comprises N decision trees (e.g., T1, T2, ..., TN), each of which functions as a voting classifier. The final classification result of the random forest is obtained by aggregating the votes of all decision trees. In the experiments conducted, two testing schemes are employed: one testing on individuals seen during training and another testing on untrained individuals. After image preprocessing, the PCA and 2DPCA methods are used to extract expression features, which are then fed to random forest and Support Vector Machine (SVM) classifiers for training and classification.
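A minimal sketch of this classification stage is given below, assuming the expression features have already been extracted (e.g., by PCA or 2DPCA); the synthetic feature matrix, label set, and hyperparameters are placeholders rather than the paper's configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder extracted features: 400 samples, 64 dimensions, 7 expressions.
rng = np.random.default_rng(0)
features = rng.normal(size=(400, 64))
labels = rng.integers(0, 7, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.3, random_state=0)

# The forest aggregates the votes of its decision trees (T1, ..., TN).
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
svm = SVC(kernel='rbf').fit(X_tr, y_tr)

print("RF accuracy :", accuracy_score(y_te, rf.predict(X_te)))
print("SVM accuracy:", accuracy_score(y_te, svm.predict(X_te)))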
2.2 Deep Learning 
Traditional methods rely less heavily on hardware and on particular data types than deep learning approaches. However, they require feature extraction and classification to be applied manually as independent steps, whereas deep learning methods can execute these two processes simultaneously. Within the realm of deep learning
techniques, facial expression recognition entails three 
primary stages: image preprocessing, deep feature 
learning, and deep feature classification. The image 
preprocessing phase is a critical step that typically 
encompasses the utilization of the Viola-Jones 
algorithm for face detection, face alignment, 
normalization, and enhancement to prepare the data. 
For deep feature learning and classification, numerous methods such as CNN, DBN, DAE, RNN, and LSTM have been extensively researched and implemented.
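The preprocessing stage can be sketched with OpenCV as follows: Viola-Jones face detection, cropping, resizing, and intensity normalization. The file path, the 48 x 48 target size, and the use of histogram equalization for enhancement are assumptions for illustration.

import cv2
import numpy as np

def preprocess_face(image_path, size=(48, 48)):
    """Detect a face with the Viola-Jones cascade, then crop, resize,
    enhance, and normalize it for a recognition model."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                           # no face detected
    x, y, w, h = faces[0]                     # keep the first detected face
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    face = cv2.equalizeHist(face)             # contrast enhancement
    return face.astype(np.float32) / 255.0    # normalize to [0, 1]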
2.2.1 CNN 
Among deep learning architectures, CNN configurations remain the most prevalent and cutting-edge, particularly for emotion recognition. Popular CNN configurations include region-based CNN (R-CNN), faster R-CNN, and 3D CNN; these configurations demonstrate varying levels of accuracy across different datasets.
S. Begaj and A. O. Topal presented the implementation details and results analysis of a CNN-based facial expression recognition model (Begaj, 2020). This CNN architecture comprises four
convolutional layers, four max-pooling layers, one 
dropout layer, and two fully connected layers, totaling 
899,718 parameters. The model's processing pipeline 
involves filtering images through Conv2D filters, 
applying ReLU activation functions, downsizing 
images with MaxPooling2D layers, and ultimately 
flattening and applying dropout layers. 
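A hedged Keras sketch with the same layer types (four Conv2D + ReLU blocks, four MaxPooling2D layers, one Dropout layer, and two fully connected layers) is shown below. The filter counts, kernel sizes, input resolution, and number of output classes are assumptions, so the parameter total will not necessarily match the 899,718 reported in (Begaj, 2020).

from tensorflow.keras import layers, models

model = models.Sequential([
    # Four convolution + max-pooling blocks (filter counts are assumed).
    layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                  input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    # Flatten, apply dropout, then two fully connected layers.
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu'),
    layers.Dense(7, activation='softmax'),    # 7 expression classes assumed
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()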
Upon evaluating the model, it was observed that accuracy on the training data was noticeably higher than on the testing data, indicating signs of overfitting. Beyond the 25th epoch,