(ISL) dataset and 100% for the ASL dataset,
surpassing VGG-11 and VGG-16. The system's
robustness is confirmed on augmented data, demonstrating invariance to rotation and scaling. The methodology, however, addresses only static gestures, neglecting dynamic gestures and continuous sign language recognition (Sharma, Singh, et al. 2021).
Romala Sri Lakshmi Murali et al. (2022)
presented HSV color detection and computer vision techniques to segment hand gestures for the recognition of 10 ASL alphabets. The system acquires hand gesture images through a camera, processes them with grayscale conversion, dilation, and masking, and extracts binary pixel features for classification. A CNN is employed for training, attaining an accuracy exceeding 90%. The results show reliable gesture recognition with negligible ambiguity. However, the system identifies only 10 static ASL alphabets and lacks the capability for dynamic gestures or full-alphabet recognition (Murali, Ramayya, et al. 2020).
Muneer Al-Hammadi et al. (2020) introduced
various deep learning architectures for dynamic hand gesture recognition, addressing hand segmentation, local hand shape representation, global body configuration, and gesture sequence modeling.
The evaluation is conducted on a demanding dataset
of 40 dynamic hand gestures executed by 40
individuals in uncontrolled environments. The model
surpasses leading methodologies, demonstrating
enhanced recognition accuracy. The results demonstrate effective gesture recognition in uncontrolled settings.
The model's efficacy may diminish in highly cluttered
or dimly lit settings, where hand segmentation is
more difficult (Al-Hammadi, Muhammad, et al.
2020).
Ghulam Muhammad et al. (2020) created a deep CNN utilizing transfer learning for hand gesture recognition, tackling the challenge of spatiotemporal feature extraction in sign language recognition. It was
evaluated on three datasets comprising 40, 23, and 10
gesture categories. The system attained recognition
rates of 98.12%, 100%, and 76.67% in signer-
dependent mode, and 84.38%, 34.9%, and 70% in
signer-independent mode. Results demonstrate
significant precision in signer-dependent scenarios. A
constraint is the diminished efficacy in signer-
independent mode, particularly for datasets with a
limited number of gesture types (Al-Hammadi,
Muhammad, et al. 2020).
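As a sketch of the transfer-learning idea underlying such systems, the snippet below reuses a pretrained backbone and retrains only the classification head; the ResNet-18 backbone, input size, and class count are assumptions for illustration and do not reproduce the authors' architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_transfer_model(num_classes: int = 40) -> nn.Module:
    """Generic transfer learning: keep pretrained convolutional features
    frozen and train only a new classification head."""
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    for param in model.parameters():      # freeze the pretrained backbone
        param.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new gesture head
    return model

model = build_transfer_model(num_classes=40)   # e.g. a 40-gesture dataset
logits = model(torch.randn(1, 3, 224, 224))    # dummy forward pass
```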
Abul Abbas Barbhuiya et al. (2020) introduced a
deep learning-based CNN for robust hand gesture recognition (HGR) of alphabets and numerals in
ASL, utilizing modified AlexNet and VGG16 for
feature extraction and a support vector machine
(SVM) classifier for final classification. It employs
both leave-one-subject-out and 70–30 cross-
validation methodologies. The system attains a
recognition accuracy of 99.82%, surpassing contemporary approaches. The results show high accuracy, cost-effectiveness, and character-level recognition. However, the system accommodates only static gestures, constraining its ability to recognize dynamic sequences or continuous signing (Barbhuiya, Karsh, et al. 2021).
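The CNN-features-plus-SVM pattern used here can be sketched as follows; a stock torchvision VGG16 is used as a fixed feature extractor and a linear scikit-learn SVM as the classifier, while the authors' modified AlexNet/VGG16 details, preprocessing, and data are not reproduced (the tensors and labels below are placeholders).

```python
import torch
import numpy as np
from torchvision import models
from sklearn.svm import SVC

# Pretrained VGG16 as a fixed feature extractor (illustrative stand-in for the
# paper's modified AlexNet/VGG16 backbones).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg.classifier = vgg.classifier[:-1]   # drop the 1000-way layer -> 4096-d features
vgg.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) preprocessed gesture crops."""
    return vgg(images).cpu().numpy()

# Placeholder training data: gesture crops and alphabet/numeral labels.
X_train = torch.randn(8, 3, 224, 224)
y_train = np.arange(8) % 4
svm = SVC(kernel="linear")
svm.fit(extract_features(X_train), y_train)   # final classification stage
```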
Razieh Rastgoo et al. (2020) proposed a deep
learning pipeline that integrates SSD, 2DCNN,
3DCNN, and LSTM for the automatic recognition of
hand sign language from RGB videos. It estimates
three-dimensional hand keypoints, constructs a hand
skeleton, and extracts spatiotemporal characteristics
utilizing multi-view hand skeletons and heatmaps.
The aggregated features are analyzed using 3DCNNs
and LSTM to capture long-term gesture dynamics.
Evaluation on the NYU, First-Person, and RKS-PERSIANSIGN datasets indicates that the model surpasses leading methodologies. The computational
complexity of the multi-modal technique may impede
real-time applications in resource-constrained
contexts (Rastgoo, Kiani, et al. 2020).
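A toy sketch of the 3D-convolution-plus-LSTM idea for capturing long-term gesture dynamics is shown below; it omits the SSD detector, keypoint estimation, and multi-view skeleton/heatmap inputs of the original pipeline, and all layer sizes and clip shapes are illustrative.

```python
import torch
import torch.nn as nn

class Gesture3DCNNLSTM(nn.Module):
    """Toy spatiotemporal model: a small 3D CNN encodes short clips,
    an LSTM aggregates clip features over the full gesture."""
    def __init__(self, num_classes: int = 10, feat_dim: int = 128):
        super().__init__()
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(32, feat_dim)
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, clips):                 # clips: (B, T, 3, D, H, W)
        b, t = clips.shape[:2]
        x = self.cnn3d(clips.flatten(0, 1))   # encode each clip independently
        x = self.proj(x.flatten(1)).view(b, t, -1)
        out, _ = self.lstm(x)                 # long-term gesture dynamics
        return self.head(out[:, -1])          # classify from the last step

model = Gesture3DCNNLSTM()
logits = model(torch.randn(2, 4, 3, 8, 32, 32))   # 2 videos, 4 clips each
```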
Eman K. Elsayed et al. (2020) developed a
semantic translation system for dynamic sign
language recognition employing deep learning and
Multi Sign Language Ontology (MSLO). It utilizes
3D Convolutional Neural Networks (CNNs) followed by a Convolutional LSTM to enhance recognition accuracy, and enables user customization
of the system. Evaluated on three dynamic gesture
datasets, it attained an average recognition accuracy
of 97.4%. Utilizing Google Colab for training
decreased runtime by 87.9%. The results show improved recognition through semantic translation and customization features. A limitation is the dependence on Google Colab for performance gains, which may not be available in all settings (Elsayed and Fathy 2021).
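The Convolutional LSTM component referred to above replaces the matrix multiplications of a standard LSTM with convolutions, so spatial structure is preserved across time. A minimal cell is sketched below with illustrative channel counts and feature-map sizes; it is not the authors' MSLO-based system, and the input feature maps stand in for the output of a preceding 3D CNN.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all four gates are computed with a single
    convolution over the concatenated input and hidden state."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g           # cell state update
        h = o * c.tanh()            # hidden state (spatial feature map)
        return h, c

# Toy rollout: feature maps from a hypothetical 3D CNN arrive one step at a time.
cell = ConvLSTMCell(in_ch=32, hid_ch=64)
h = torch.zeros(1, 64, 14, 14)
c = torch.zeros(1, 64, 14, 14)
for t in range(5):                  # 5 time steps of 32-channel feature maps
    h, c = cell(torch.randn(1, 32, 14, 14), (h, c))
```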
Ahmed Kasapbasi et al. (2022) created a CNN-
based sign language interface to translate ASL
gestures into English text, utilizing a newly
established dataset with diverse lighting and distance
conditions. The model attained an accuracy of
99.38% and a minimal loss of 0.0250 on the new
dataset, surpassing performance on prior datasets
with consistent conditions. The results demonstrate high accuracy across multiple datasets, indicating robustness under varied conditions. A shortcoming is
the emphasis on the alphabet instead of complete