input images, e.g., digital displays or pictures of
written text, and passes the retrieved text to the same
conversion mechanism used for text-to-signs. This
approach of combining input modalities into an
integrated yet modular design improves system
usability and accessibility while ensuring consistent
sign language output irrespective of the type of input
registered. By using a single architecture for multiple
input formats, the system adapts to a variety of
real-world communication scenarios, speeding up the
conversion process while maintaining correctness
and reliability across all supported input formats.
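To make this modular design concrete, the sketch below shows one minimal way a front end could route each input modality to a shared text-to-sign converter. This is an illustrative sketch only; the function names (speech_to_text, image_to_text, text_to_signs) are assumed placeholders, not the system's actual API.

```python
# Minimal sketch of the modular input-dispatch design: every modality
# funnels into one shared text-to-sign converter, so the sign output
# stays consistent regardless of the input type.

def speech_to_text(audio_path: str) -> str:
    """Placeholder: transcribe speech audio to text."""
    raise NotImplementedError

def image_to_text(image_path: str) -> str:
    """Placeholder: extract text from an image (e.g., via OCR)."""
    raise NotImplementedError

def text_to_signs(text: str) -> list[str]:
    """Placeholder: convert text into a sequence of sign glosses."""
    raise NotImplementedError

def translate(source: str, modality: str) -> list[str]:
    # Normalize any input modality to plain text first.
    if modality == "speech":
        text = speech_to_text(source)
    elif modality == "image":
        text = image_to_text(source)
    else:  # already plain text
        text = source
    # All paths converge on the same conversion mechanism.
    return text_to_signs(text)
```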
2 RELATED WORKS
(Muhammad Al-Qurishi et al., 2021) presented a
survey of progress in sign language recognition and
translation toward comprehensive communication
solutions. The review highlighted deep learning
methods for sign language identification while
addressing issues such as real-time processing and
contextual variability. Its findings directly influenced
design considerations for sign recognition systems.
Transformer-based architectures for handling the
sequential nature of sign language data were
investigated (Necati Cihan Camgoz et al.) and proved
effective, but the majority of that work remained
centered on Western sign languages.
Real-time sign detection has been studied using
advanced object detection frameworks (Shobhit
Tyagi et al., 2023). That study also focused on
gesture-through-parts recognition, covering 55
different signs drawn from alphabets and integers.
A detailed review of a variety of machine learning
methods (A. Adeyanju et al., 2021) helped narrow the
focus to deep learning techniques that achieve robust
recognition accuracy across diverse settings, such as
varying ambient light and camera placements.
The paper by Yogeshwar I. Rokade and Prashant M.
Jadav (2017) discusses the problem of Indian Sign
Language (ISL) recognition. It underscored the
importance of creating extensive datasets and
ISL-focused solutions, highlighting the necessity of
individual-centric approaches for ISL recognition
and translation.
Other work has investigated the structural differences
between ISL and spoken languages. Unlike English,
ISL has no equivalent of auxiliary verbs such as "is"
or "are". The English sentence "The school opens in
April" translates to ISL as "SCHOOL OPEN APR."
ISL also employs fingerspelling, in which gestures
mimic letters of the alphabet to spell names and
technical terms. These structural differences require
specialized models to be developed for ISL
recognition.
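These restructuring rules lend themselves to a simple illustration. The sketch below is a toy rule-based English-to-ISL-gloss converter; the drop list, abbreviation table, and crude stemmer are assumptions for demonstration, not a real ISL grammar.

```python
# Toy English -> ISL gloss converter illustrating the structural
# differences above: auxiliaries/articles are dropped, months are
# abbreviated, and glosses are written in uppercase.

DROPPED = {"is", "are", "am", "was", "were", "the", "a", "an", "in", "on"}
ABBREV = {"april": "APR"}  # hypothetical month-abbreviation table

def naive_lemma(word: str) -> str:
    # Crude stemming for the demo; a real system would use a lemmatizer.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def english_to_isl_gloss(sentence: str) -> str:
    gloss = []
    for w in sentence.lower().rstrip(".").split():
        if w in DROPPED:
            continue
        gloss.append(ABBREV.get(w, naive_lemma(w)).upper())
    return " ".join(gloss)

def fingerspell(word: str) -> list[str]:
    # Names and technical terms are spelled letter by letter.
    return list(word.upper())

print(english_to_isl_gloss("The school opens in April"))  # SCHOOL OPEN APR
print(fingerspell("Ravi"))  # ['R', 'A', 'V', 'I']
```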
(Sinha et al., 2020) explored the broader implications
of sign language recognition systems for
accessibility, particularly how they can bridge
communication gaps for the hearing-impaired
community. The research highlighted the importance
of user-centered, customizable solutions for
enhancing inclusion in various contexts, and
suggested that sign language translation efficiency
could be improved by combining deep learning and
image processing methods. A closely related work by
Sinha, Kataruka, and Kuppusamy (2020), "Image to
Text Converter and Translator: Realization of Image
to Text Converter and Translator using Deep
Learning and Image Processing" (International
Journal of Innovative Technology and Exploring
Engineering), explored how image-based text could
be converted, forming an essential backdrop for the
implementation of sign language translation as well.
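As a point of reference for the image-to-text stage, the sketch below uses an off-the-shelf OCR engine. This is a stand-in under stated assumptions: it assumes Tesseract is installed and wrapped via the pytesseract package, and it does not reproduce the deep-learning pipeline of the cited work.

```python
# Minimal image-to-text sketch using Tesseract OCR via pytesseract.
# Assumes: pip install pillow pytesseract, plus a Tesseract binary.

from PIL import Image
import pytesseract

def image_to_text(image_path: str) -> str:
    # Grayscale conversion is a simple preprocessing step that often
    # improves OCR on photos of displays or printed text.
    img = Image.open(image_path).convert("L")
    return pytesseract.image_to_string(img).strip()

# Usage: text = image_to_text("display_photo.png")
```

The extracted text can then be handed directly to the text-to-sign conversion stage described earlier.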
Further work has focused on the impact of various
machine learning models on the efficiency of sign
language translation, specifically investigating
strategies for optimizing the translation model to
enhance recognition accuracy while preserving a
real-time user experience. The results underlined the
importance of deep learning in speech-to-text
applications and align with works that propose
speech-driven sign language translation models. The
authors correlated their findings with "Speech to Text
using Deep Learning" (IJNRD, 2024), which treats
speech as a method of communication in which a
person speaks and the voice is converted into text. A
key highlight of that study was its attention to
real-time processing and accuracy optimization in
deep learning-enabled speech-to-text systems, a
significant step toward building an efficient sign
language translation framework.
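For the speech input path, a minimal transcription sketch is shown below. It uses the SpeechRecognition package's Google Web Speech backend as an assumed stand-in; the deep-learning architectures discussed in the cited studies are not reproduced here.

```python
# Minimal speech-to-text sketch using the SpeechRecognition package.
# Assumes: pip install SpeechRecognition, and a WAV/AIFF/FLAC input file.

import speech_recognition as sr

def speech_to_text(audio_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the entire file
    # recognize_google sends the audio to Google's free Web Speech API;
    # a production system would swap in its own trained model.
    return recognizer.recognize_google(audio)

# Usage: text = speech_to_text("utterance.wav")
```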
On the basis of the perceived features of Indian Sign
Language (ISL) translation systems, the research
suggests the hypotheses below. These perceived
features include Perceived Usefulness (PU),
Recognition Accuracy (RD), Perceived Ease of Use
(PES), and Compatibility