The new backbone and neck architecture of this model enables better feature aggregation, improving detection accuracy for small objects. YOLOv8's simplified architecture also makes the system more efficient under resource constraints. Future developments in YOLOv8 are expected to further enhance hand gesture detection systems, processing data more rapidly and precisely and enabling more trustworthy real-time communication that supports inclusiveness in daily activities.
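To make the detection pipeline concrete, the snippet below is a minimal sketch of single-frame inference with the Ultralytics YOLOv8 API. The input file name and confidence threshold are illustrative placeholders, and "yolov8n.pt" is the stock pretrained checkpoint, not a gesture-specific model trained in this work.

from ultralytics import YOLO

# Load a small pretrained YOLOv8 model; a gesture detector would instead
# load a checkpoint fine-tuned on sign/gesture data (hypothetical here).
model = YOLO("yolov8n.pt")

results = model("frame.jpg", conf=0.5)  # run inference on one image
for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])                # predicted class index
        score = float(box.conf[0])              # detection confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding-box corners
        print(model.names[cls_id], f"{score:.2f}", (x1, y1, x2, y2))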
2 RELATED WORKS
Escalera et al. (2016) present a deep learning-based system that helps hearing-impaired people in emergency situations. The authors demonstrate awareness of the communication obstacles this community faces during critical situations requiring essential speech-based warnings. A system was created to translate Indian Sign Language (ISL) into spoken language or text for real-time communication with emergency responders.
C.-Y. Wang et al. (2022) propose a method that shows promise in bridging communication gaps in emergency situations involving hearing-impaired people, yet it carries several constraints. Performance suffers under unpredictable lighting, cluttered backgrounds, and changes in signing speed and style. Detection is also limited by regional sign language variations, and new signs cannot be identified unless the system is retrained. Any emergency deployment demands rigorous evaluation of system latency together with reliability protocols that protect user privacy. Successful execution of the project depends on building a system with strong security protocols and universal accessibility. The work meaningfully improves emergency communication safety for hearing-impaired individuals by overcoming key challenges, but sustained research and development are still needed to ready the system for practical deployment.
Aditi Deshpande et al. (2023) review the advances and challenges of multimodal gesture recognition, which decodes human body movements from audio and video and also exploits depth information to understand gestures more effectively. Because single-modality systems have proven insufficient, the paper argues for multimodal approaches, although significant challenges remain. Effective data fusion is a crucial element, since methods must cope with heterogeneous data types and differing resolutions.
Exact temporal alignment between modalities is also required, yet achieving this synchronization remains difficult because of timing and sampling-rate mismatches. Analyzing multimodal data demands advanced algorithms and suitable hardware, since the process is computationally intensive. Strong robustness to noise, lighting variation, and viewpoint changes is another critical requirement. Finally, analyzing how individual modalities contribute to the final recognition result is needed to improve system quality and diagnose failures.
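As an illustration of the fusion problem discussed above, the following is a minimal late-fusion sketch in PyTorch. All feature dimensions, modality names, and the gesture count are hypothetical placeholders, not the architecture of any surveyed system, and the code assumes the streams are already temporally aligned.

import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Minimal late-fusion sketch: per-modality projection heads produce
    fixed-size embeddings that are concatenated and classified jointly.
    Dimensions and gesture count are illustrative assumptions."""

    def __init__(self, rgb_dim=512, depth_dim=256, audio_dim=128, n_gestures=10):
        super().__init__()
        # One small projection head per modality (stand-ins for real encoders).
        self.rgb_head = nn.Sequential(nn.Linear(rgb_dim, 128), nn.ReLU())
        self.depth_head = nn.Sequential(nn.Linear(depth_dim, 128), nn.ReLU())
        self.audio_head = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(3 * 128, n_gestures)

    def forward(self, rgb, depth, audio):
        # Assumes the three streams are temporally aligned, i.e. each batch
        # row describes the same time window in every modality.
        fused = torch.cat(
            [self.rgb_head(rgb), self.depth_head(depth), self.audio_head(audio)],
            dim=-1,
        )
        return self.classifier(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 256), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])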
The paper emphasizes the need for advanced evaluation techniques and wide-ranging datasets to advance the field, and it shows how deep learning methods can mitigate some recognition issues through automatic representation learning and feature extraction. Although multimodal gesture recognition faces many challenges, it brings major opportunities to human-computer interaction and to applications in robotics and healthcare.
K. Amrutha et al. (2021) describe a sign language recognition system built on a Convolutional Neural Network (CNN) architecture. Its goal is real-time interaction between hearing-impaired persons and everyone else, closing communication breakdowns. The system uses a monocular camera to record video footage and detects and classifies ten different ASL signs. The CNN effectively extracts essential features from video frames, including hand positions and movements. The model is trained and validated on ASL sign datasets, achieving 98.53% training accuracy and 98.84% validation accuracy. The work reviews the datasets and evaluation metrics used in the field and emphasizes that robustness depends on training with large, diverse datasets. The authors show how deep learning advances, especially CNNs, boost hand gesture recognition accuracy by learning features automatically from raw image data.
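To ground the CNN pipeline described above, the following is a minimal sketch of a frame-level CNN classifier for ten signs. The layer widths and the 64x64 RGB input size are illustrative assumptions and do not reproduce the authors' exact architecture.

import torch
import torch.nn as nn

class SignCNN(nn.Module):
    """Minimal CNN sketch in the spirit of the system described above;
    layer sizes and the 64x64 input are assumptions, not the authors'
    published architecture."""

    def __init__(self, n_signs=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_signs)

    def forward(self, x):
        x = self.features(x)  # extract spatial hand features
        return self.classifier(x.flatten(1))

model = SignCNN()
frames = torch.randn(8, 3, 64, 64)  # a batch of RGB video frames
print(model(frames).shape)          # torch.Size([8, 10])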
The evaluation emphasizes crucial barriers that researchers face in current studies. The main