
to the limited availability of large-scale datasets, vari-
ability in hand gestures, and the need for real-time
processing capabilities. The contributions of this pa-
per are as follows:
• Utilizing YOLOv5 for Arabic Sign Language
(ArSL) Recognition: This study explores the
application of the YOLOv5 model for detecting
and classifying 28 Arabic sign language alphabet
gestures, addressing the challenge of recognizing
gestures within a small dataset.
• Real-Time Model Implementation: The pro-
posed approach emphasizes real-time detection
and classification capabilities, making it suitable
for practical applications requiring instant recog-
nition.
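As a concrete illustration of the post-processing that underpins YOLO-style real-time detection, the sketch below implements the two core geometric operations, Intersection over Union and greedy non-maximum suppression, in plain Python. This is a generic sketch, not this paper's implementation; the 0.45 IoU threshold is a common default rather than a setting reported here.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression: visit boxes in descending score
    order and keep each one only if it overlaps no kept box above
    iou_thresh, so each gesture yields a single surviving detection."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

In a real-time loop, this suppression step runs on every frame after the network's forward pass, which is why its cost must stay negligible relative to inference.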
The paper is organized as follows: Section 2 presents
a literature review of advances in ArSL recogni-
tion, Section 3 details methodology and implemen-
tation, Section 4 discusses results and comparisons
with state-of-the-art methods, and Section 5 con-
cludes with future research directions.
2 LITERATURE REVIEW
Early efforts in ArSL recognition primarily relied
on classical machine learning techniques, emphasiz-
ing image processing and feature extraction for ges-
ture classification. (Aly and Mohammed, 2014) de-
veloped an ArSL recognition system using Local
Binary Patterns on Three Orthogonal Planes (LBP-
TOP) and SVM, with preprocessing steps such as
segmenting the hand and face via color-space conver-
sion of the RGB input. Similarly, (Tharwat
et al., 2021) proposed a system focusing on
28 Quranic dashed letters, employing classifiers such
as K-Nearest Neighbor (KNN), Multilayer Perceptron
(MLP), C4.5, and Naïve Bayes. Their approach uti-
lized a dataset of 9240 images captured under vary-
ing conditions and achieved a recognition accuracy of
99.5% for 14 letters using KNN. While these methods
demonstrated reasonable accuracy, they were con-
strained by limited scalability and the lack of real-
time implementation capabilities.
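The classification stage of such classical pipelines can be sketched as a plain k-nearest-neighbor vote over extracted feature vectors. The feature-extraction step (e.g. LBP-TOP) is omitted here, and the sample data and value of k are illustrative, not taken from the cited works.

```python
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    samples, where `train` is a list of (feature_vector, label) pairs
    and distance is plain Euclidean distance in feature space."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)
```

With real gesture data, the feature vectors would be the histogram descriptors produced by the chosen extractor rather than raw coordinates.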
Researchers have increasingly adopted advanced
deep learning techniques for sign language recogni-
tion across various languages. For instance, (Tao
et al., 2018) utilized CNNs to address ASL recog-
nition, highlighting CNNs’ ability to effectively cap-
ture sign gestures. Similarly, (Suliman et al., 2021)
proposed a method for ArSL recognition, combin-
ing CNNs for feature extraction and Long Short-Term
Memory (LSTM) networks for classification. Their
approach employed the AlexNet architecture to ex-
tract deep features from input images and utilized
LSTMs to maintain the temporal structure of video
frames. The system achieved an overall recognition
accuracy of 95.9% in signer-dependent scenarios and
43.62% in signer-independent scenarios.
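The role of the LSTM in such a hybrid can be made concrete with a toy recurrence: the sketch below implements a single scalar LSTM cell in plain Python, showing how the cell state carries information across video frames. The weights and the scalar per-frame feature are illustrative stand-ins for the AlexNet embeddings the cited work feeds into its LSTM; this is not their implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x, h_prev, c_prev, W):
    """One LSTM step over a scalar feature. W maps each gate name to
    (w_x, w_h, b); the cell state c is what preserves temporal context
    from earlier frames of the sign gesture."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])  # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])  # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])  # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

def run_sequence(xs, W):
    """Feed per-frame features through the recurrence, starting from a
    zero state, and return the final hidden state."""
    h = c = 0.0
    for x in xs:
        h, c = lstm_cell(x, h, c, W)
    return h
```

In the full system a classifier head would map the final hidden state to one of the sign classes.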
Pretrained models are widely used in sign lan-
guage recognition to leverage knowledge learned from
large datasets. (Duwairi and Halloush, 2022) em-
ployed VGGNet, achieving 97% accuracy on the
ArSL2018 dataset, demonstrating the efficacy of pre-
trained architectures. (Zakariah et al., 2022) explored
the use of EfficientNetB4 on the ArSL2018 dataset,
achieving a training accuracy of 98% and a testing
accuracy of 95%. Their work incorporated extensive
preprocessing and data augmentation to enhance con-
sistency and balance within the dataset.
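Such augmentation steps can be sketched in a few lines; the two transforms below, a horizontal flip and a brightness shift over a pixel grid, are generic examples rather than the specific augmentations used in the cited study.

```python
def hflip(img):
    """Horizontally flip an image given as a list of pixel rows."""
    return [row[::-1] for row in img]

def adjust_brightness(img, delta):
    """Shift every pixel value by delta, clamped to the 8-bit 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]
```

Note that for sign language a horizontal flip effectively swaps handedness, so whether it is a valid augmentation depends on the gesture set.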
In addition, pre-trained YOLO-based approaches
have achieved remarkable results. (Ningsih et al.,
2024) applied YOLOv5-NAS-S to BISINDO sign
language, achieving a mAP of 97.2% and Recall
of 99.6%. (Al Ahmadi et al., 2024) introduced
attention mechanisms within YOLO for ArSL de-
tection, achieving a mAP@0.5 of 0.9909. Simi-
larly, (Alaftekin et al., 2024) utilized an optimized
YOLOv4-CSP algorithm for real-time recognition of
Turkish Sign Language, achieving over 98% preci-
sion and recall, further demonstrating YOLO’s effi-
cacy in high-speed and accurate sign language detec-
tion tasks.
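The precision, recall, and mAP figures quoted above follow the standard object-detection definitions, which the sketch below computes in plain Python. This is a generic all-point formulation, not the evaluation code of any cited work; a true positive is a detection whose IoU with an unmatched ground-truth box exceeds the threshold (0.5 for mAP@0.5).

```python
def precision_recall_f1(tp, fp, fn):
    """Detection metrics from true-positive, false-positive and
    false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(ranked_hits, n_gt):
    """AP for one class: detections are sorted by confidence and
    ranked_hits[i] is True when detection i matches an unmatched
    ground-truth box (n_gt boxes in total). mAP is the mean of AP
    over all classes."""
    tp, ap = 0, 0.0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / n_gt if n_gt else 0.0
```

mAP@50-95 extends this by averaging AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.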
A significant limitation in ArSL research remains
the lack of standardized datasets (see Table 1). Most
studies rely on custom datasets with isolated signs,
such as ArSL2018, which is insufficient for compre-
hensive, continuous sign recognition (Al-Shamayleh
et al., 2020).
3 METHODOLOGY AND
IMPLEMENTATION
This section outlines the workflow of training and
evaluating the YOLOv5 model for ArSL recogni-
tion, as illustrated in Figure 1. The dataset is di-
vided into training, validation, and test sets. The
training and validation sets are utilized to train the
YOLOv5 model over 400 epochs, during which hy-
perparameters are fine-tuned to achieve optimal per-
formance. Following the completion of the training
process, the trained model is evaluated using the test
set based on evaluation metrics such as Accuracy,
Precision, Recall, F1 Score, Mean Average Precision
(mAP), mAP@50, mAP@50-95, Intersection over
Union (IoU), Logarithmic Loss, Confusion Matrix,
KDIR 2025 - 17th International Conference on Knowledge Discovery and Information Retrieval