Enhancing Construction Site Safety: Personal Protective Equipment
Detection Using YOLOv11 and OpenCV
Ruchitraa Rajagopal, Harshitha Pulluru, Aaradhya Joshi and Shanmuganathan C.
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, Tamil
Nadu, India
Keywords: Construction Site Safety (CSS), Personal Protective Equipment (PPE), You Only Look Once (YOLO),
Real‑Time Object Detection (RTOD), Site‑Risk Assessment (SRA), Worker Safety Compliance (WSC),
Machine Learning (ML), Computer Vision (CV).
Abstract: Construction is one of the fastest-growing industries in the world. Yet even with extensive safety rules and
regulations in place, construction site accidents remain a major threat, largely because of non-compliance with
assigned safety protocols and the lack of an automated supervision system. This paper presents a model that
automates safety monitoring using Computer Vision (CV) and Machine Learning (ML) techniques. The model
analyzes surveillance footage to detect workers and their personal protective equipment (PPE), such as
safety vests, masks, and hard hats. It also calculates the distance between workers and machinery and sends
alerts when that distance becomes unsafe. The system provides a graphical user interface through which site
managers and supervisors can monitor the surveillance footage alongside the model results in real time.
Warnings are stored in a database together with a video clip and other essential information, so supervisors
can review the database to see what types of warnings occurred, and when and where, and use that insight
to guide workers on site safety. The system is built on YOLOv11, trained and tested to perform well under
various weather conditions. The model achieves a mAP of 81% at 0.5 IoU and a mAP of 60.3% at 0.5 to
0.95 IoU, with an overall accuracy, precision, and recall of 97%, 87%, and 76%, respectively. It is
computationally efficient, with 14.7 GFLOPs, 6.4M parameters, and an inference speed of 2.4 ms, making
it suitable for real-time analysis.
1 INTRODUCTION
Construction sites are inherently dangerous
environments where worker safety is a
constant challenge. The presence of heavy machinery,
hazardous materials, and complex workflows increases
the likelihood of accidents, making compliance
with safety regulations critical. While PPE such as
safety vests, masks, and hard hats plays a key role in
injury prevention, monitoring it manually is
time-consuming and often ineffective on large-scale sites
Chen et al., (2021).
Traditional safety methods, such as manual checks
and passive surveillance cameras, are laborious,
slow, and difficult to scale, making it hard to detect
and prevent accidents in time.
Advances in CV and ML have opened up
many new techniques for automating safety monitoring.
AI-powered detection models, such as YOLO (You
Only Look Once), have shown promising results in
identifying safety gear and worker behavior from video
feeds. However, challenges remain in detecting small
objects, handling occlusions, and maintaining accuracy
under varying lighting and environmental conditions.
This research introduces an AI-based safety monitoring
system that integrates YOLOv11 to
identify PPE and monitor worker proximity to heavy
machinery in real-world settings Chen et al., (2021).
The system includes a graphical user interface
(GUI) through which site managers can monitor
workers and their compliance with safety protocols.
The proposed system improves detection accuracy by
incorporating multi-scale feature extraction, attention
mechanisms, and bounding box regression Feng et al.,
(2024). Mosaic data augmentation is applied to the
training set, while the test set is
modified to simulate various weather conditions, such
as high brightness, dust, and motion blur. To further
improve site safety, the system includes a proximity
detection algorithm that helps prevent worker-
machinery collisions.
This paper discusses the methodology,
implementation, performance evaluation, and
potential future improvements of the proposed system.
The findings highlight how AI-driven monitoring can
transform workplace safety, minimize accidents, and
enhance regulatory compliance in high-risk industries.
2 RELATED WORKS
Construction is one of the most rapidly growing
industries in the world. Although it contributes
substantially to the nation's GDP, construction site
accidents remain a major threat even after extensive
safety rules and regulations have been put in place.
Many methods are therefore being implemented to
reduce the risk of accidents, including a safety
helmet detection method based on an improved YOLOv8,
an AI model integrated for construction site safety,
and the use of Fast R-CNN and CNNs to predict
bounding box coordinates. A summary of these findings
is presented in Table 1.
Table 1: Summary of Findings.
S.No. | Paper’s Name | Targeted Object | Findings
1 | “Safety Detection Method based on improved YOLOv8” | Safety Helmet | Proposes an improved algorithm, YOLOv8n-SLIM-CA, which uses mosaic data augmentation and a coordinate attention mechanism.
2 | “Deep Learning Based Workers Safety Helmet Wearing Detection on Construction Sites Using Multi-Scale Features” | Safety Helmet | Adds multi-scale features and an attention mechanism to the baseline YOLOv5 model.
3 | “A Novel Implementation of an AI-Based Smart Construction Safety Inspection Protocol in the UAE” | Safety Harness | Develops a CNN-based model to detect safety harnesses; the deep learning network uses YOLOv3.
One of the articles, “A Novel Implementation of
an AI-Based Smart Construction Safety Inspection
Protocol in the UAE”, focuses on integrating AI,
specifically a deep learning approach, to
supervise workers and detect safety violations Shanti et al.,
(2021). The paper develops a CNN-based
technique that supervises workers at
construction sites using a real-time detection and
monitoring algorithm, YOLOv3. It trains a CNN to
detect equipment such as safety vests and safety helmets.
The main challenge this method faces is the
difficulty of obtaining surveillance video and training
all the datasets required by the CNN models.
Another research paper that proposes using CV
and ML for the detection of safety helmets is "Safety
Helmet Detection Based on Improved YOLOv8" Lin,
B, (2024). Safety helmets are essential for protecting
workers from head injuries on construction sites, but
relying on manual supervision to ensure compliance
can be inefficient and prone to mistakes. Deep
learning models like YOLO have made helmet
detection possible in real-world implementation, but
they often struggle with spotting small or partially
hidden helmets in busy environments. To overcome
this, the paper uses YOLOv8n-SLIM-CA. This
improved detection model enhances accuracy using
Mosaic data augmentation, a Slim-Neck structure,
and Coordinate Attention. These upgrades help the
model focus better on safety helmets, reduce
complexity, and improve detection in challenging
conditions.
Compared to the standard YOLOv8n, the model
boosts accuracy by 2.151% (mAP@0.5), reduces
model size by 6.98%, and lowers computational load
by 9.76%, making it faster and more efficient Lin, B,
(2024). Tested on the Safety Helmet Wearing Dataset
(SHWD), it outperforms other detection models by
identifying helmets more accurately, even in
crowded, distant, or cluttered backgrounds. This
makes YOLOv8n-SLIM-CA a powerful tool for real-
world safety monitoring Lin, B, (2024). Looking
ahead, integrating this model into edge devices could
make real-time helmet detection even more practical
and accessible for industrial safety.
Another research paper that focuses on CNN-based
detection of safety helmets is "Deep Learning
Based Workers Safety Helmet Wearing Detection on
Construction Sites Using Multi-Scale Features"
Han et al., (2022). Ensuring people wear safety helmets
on-site is crucial for preventing injuries from falling
objects, but traditional monitoring methods are
time-consuming and error-prone. This
study presents an improved deep learning approach
using YOLOv5, enhanced with a fourth detection
scale to identify small objects better and an attention
mechanism to improve feature extraction.
To address the challenge of limited training data,
targeted data augmentation and transfer learning
were applied, resulting in a 92.2% mean average
precision (mAP), a 6.4% improvement in
accuracy, and a detection speed of just 3.0 ms
at 640×640 resolution Han et al., (2022). This makes
the model precise and applicable for real-time
analysis. Thus, automating safety compliance checks
minimizes the need for constant manual supervision,
allowing site managers to identify and respond to
potential risks more efficiently.
3 METHODOLOGY
3.1 Proposed System Architecture
Figure 1: Architecture Diagram.
The model analyzes live video feeds, ensuring
real-time monitoring of worker compliance. Training
is performed on diverse construction site datasets
so that PPE can be detected under varying lighting
and environmental conditions Sridhar et al.,
(2024). YOLO outputs a bounding box for each
detected object, which streamlines safety
enforcement. YOLOv11 significantly improves
workplace safety as well as adherence to
safety rules. Figure 1 shows the proposed system's
architecture diagram.
3.2 YOLOv11 Architecture
YOLOv11 is a single-shot object detection model
built for real-time applications, offering
both speed and accuracy. Whereas traditional models
process images in multiple stages, YOLOv11
analyzes the entire image in a single pass,
which makes it highly efficient. It works by
dividing the image into a grid and predicting
object locations and classifications
simultaneously. With improved feature extraction
and attention mechanisms, it excels at
identifying small or overlapping objects with
greater precision. To enhance accuracy, it uses
optimized loss functions such as Complete IoU
(CIoU) Mahmud et al., (2023), which fine-tune
object localization and minimize incorrect
detections. YOLOv11's lightweight design ensures
quick processing, making it well suited to tasks
that demand real-time object recognition without
compromising performance. Figure 2 shows the
architecture of the YOLOv11 model.
Figure 2: YOLOv11 Architecture Diagram.
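As a concrete illustration, the sketch below runs a single-pass detection with the Ultralytics API (the library version listed in Table 2); the weights file "best.pt" and the image path are placeholder names, not artifacts of this paper.

```python
# Minimal single-shot inference sketch using the Ultralytics API;
# "best.pt" and "site_frame.jpg" are placeholder names.
from ultralytics import YOLO

model = YOLO("best.pt")                        # trained YOLOv11 PPE weights
results = model("site_frame.jpg", conf=0.25)   # one forward pass per image

for box in results[0].boxes:
    label = model.names[int(box.cls[0])]       # e.g., "Hardhat", "NO-Safety Vest"
    x1, y1, x2, y2 = box.xyxy[0].tolist()      # bounding box corners (pixels)
    print(f"{label}: ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), "
          f"conf={float(box.conf[0]):.2f}")
```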
3.3 Single Shot Detector
SSD applies Non-Maximum Suppression (NMS) to
eliminate duplicate detections, so that
only the most relevant bounding boxes are
retained. On dynamic construction sites, speed and
accuracy are of utmost importance for
real-time detection of PPE. The
architecture of the SSD is also lightweight, which
is essential for easy deployment on edge
devices Jankovic et al., (2024) and enables the
system to work without relying on high-end
hardware. Such deployment on construction sites
allows workers and supervisors to receive
instant alerts (2024). Thus, SSD plays a major
role in enhancing worker safety, and the integration of
SSD and OpenCV with deep learning techniques
allows the system to perform real-time
safety monitoring with minimal latency Shetty et al.,
(2024). The model's reliability is further improved
by its ability to work under different lighting
conditions and environmental variations.
With proper training, SSD can improve
compliance enforcement and significantly reduce
workplace hazards.
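As an illustration of the NMS step described above, the following is a minimal sketch over (x1, y1, x2, y2) boxes with confidence scores; it is a generic implementation of the technique, not code from the deployed system.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Keep only the highest-scoring box among heavily overlapping detections.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
    order = np.argsort(scores)[::-1]               # descending confidence
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + areas - inter)
        order = rest[iou < iou_thresh]             # discard duplicate detections
    return keep
```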
3.4 Bounding Box Regression for PPE
Detection and Machinery
Proximity
Bounding box regression is a fundamental technique
used in object detection models like YOLO and SSD
to localize objects within an image precisely. In the
context of construction site safety, bounding box
regression is used to detect PPE compliance by
identifying helmets, vests, gloves, and other safety
gear. Merely enclosing the detected objects in
bounding boxes is not sufficient; bounding box
regression therefore predicts each box's x, y,
width, and height. Bounding box regression is
especially important in construction site
monitoring: by calculating the Euclidean distance
between worker and machinery bounding boxes,
proximity alerts are triggered whenever the
computed distance falls below a certain threshold.
Such alerts help minimize the chances of accidents
by sending proactive notifications to workers as
well as supervisors Al-Azani et al., (2024). The
method also benefits from non-maximum
suppression (NMS). Bounding box regression
accuracy depends on proper dataset labelling and
model training with diverse construction site images.
The attention mechanism is one of the advancements
of deep learning, and it further refines the accuracy of
bounding boxes. This ensures precise detection even
in complex environments. Thus, the BBR technique
can be efficiently used to optimize the real-time
monitoring of safety compliances Gautam et al.,
(2024). When combined with an alert system,
workplace safety is enhanced, as workers are
instantly made aware of potential hazards.
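A minimal sketch of this proximity check is shown below; the pixel threshold is purely illustrative and would in practice be calibrated to the camera geometry.

```python
import math

ALERT_DISTANCE_PX = 150  # illustrative threshold; calibrated per camera in practice

def box_center(box):
    """Center (x, y) of an (x1, y1, x2, y2) bounding box."""
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0

def proximity_alert(worker_box, machine_box, threshold=ALERT_DISTANCE_PX):
    """Return (alert, distance): alert is True when the Euclidean distance
    between the two box centers falls below the safety threshold."""
    (wx, wy), (mx, my) = box_center(worker_box), box_center(machine_box)
    distance = math.hypot(wx - mx, wy - my)
    return distance < threshold, distance
```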
3.5 Attention Mechanism-Based CNN
Attention mechanisms in deep learning have
revolutionized object detection by improving feature
extraction and focusing on the most relevant image
regions. In PPE detection for construction site safety,
attention mechanisms help CNN models prioritize
critical areas, such as worker faces, helmets, and
vests, while ignoring irrelevant background details.
So, the approach of ignoring irrelevant parts and
focusing on significant features improves detection
accuracy in cluttered environments as well as other
complex environments. This is done by an attention
mechanism based on CNN, which assigns higher
weights to the important regions of the image Guan et
al., (2024). This ensures that the model captures the
local as well as global dependencies efficiently
without any compromise. This means that the model
is enhanced to detect PPE even in challenging
conditions such as poor lighting, occlusions, and
low-quality video feeds. Thus, the integration of
Attention Mechanism based CNN with YOLO and
OpenCV for detection of PPE in real-time achieves
greater precision. Construction sites have different
environments, which the model can quickly adapt to
Ponika et al., (2023). This is a major advantage to
ensure safety compliance across such diverse
environments. Another advantage is the visual
highlight of the area influencing the prediction. Such
transparency helps the safety officers to detect errors
more easily.
The use of attention-based mechanisms also helps in
processing large-scale datasets efficiently, and real-time
applications require such mechanisms to handle
large volumes of data. The model is therefore trained on
a diverse dataset consisting of various combinations
of colors, textures, and placements using this
methodology Han et al., (2024). This methodology
ensures a reliable and efficient safety monitoring
system that proactively mitigates construction site
hazards.
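To make the weighting idea concrete, the following is a minimal PyTorch sketch of one common channel-attention form (squeeze-and-excitation style). It illustrates how higher weights are assigned to informative feature channels; it is not the specific attention module inside YOLOv11.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style attention: re-weight feature channels by learned importance."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                             # emphasize relevant channels
```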
3.6 Image Pre-Processing
Image preprocessing is applied before training to
enhance visual clarity and ensure that the model
focuses on relevant objects. Normalization scales
pixel values to the range 0 to 1, preventing
intensity variations from affecting detection
performance. Edge detection techniques such as the
Sobel and Canny filters sharpen image boundaries,
which helps the model detect PPE such as vests and
helmets more easily. Grayscale conversion reduces
image complexity by discarding unnecessary data
while retaining key object structures, which is
particularly useful when there are background
distractions. Segmentation isolates workers and
their PPE from cluttered environments, leading to
more precise detections Krishna et al., (2021),
Gautam et al., (2024). Thus, preprocessing the
input images before feeding them to the model for
training allows the deep learning algorithms to
work more efficiently, yielding a robust and
adaptable PPE detection system.
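A short OpenCV sketch of these steps (grayscale conversion, normalization, and Canny edge detection) is given below; the Canny thresholds are illustrative values, not the ones used in this work.

```python
import cv2
import numpy as np

def preprocess(frame):
    """Illustrative preprocessing chain for a BGR video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # drop color complexity
    norm = gray.astype(np.float32) / 255.0          # scale pixels to [0, 1]
    edges = cv2.Canny(gray, 100, 200)               # sharpen object boundaries
    return norm, edges
```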
3.7 Image Augmentation
Image augmentation artificially expands the
dataset, helping the model generalize better to
real-world scenarios. For real-time deployment,
PPE detection must work across various lighting
conditions, angles, and environments, so
augmentation exposes the model to a diverse set of
images during training. Flipping and rotating
images provide additional data points, while
scaling adjusts their size, so a model trained with
these techniques can recognize PPE from multiple
orientations and at different sizes. Randomly
cropped images are also added to the dataset Han
et al., (2022), preparing the model to identify
PPE that is only partially visible, as when vests
or helmets are partially obscured. Applying these
augmentation techniques prevents overfitting and
allows the model to perform reliably even in
unpredictable conditions.
Colour-based augmentations, including brightness,
contrast, and hue adjustments, are also applied to
further enhance detection. Construction sites
often have varying lighting conditions; adjusting
brightness levels in the training images exposes
the model to diverse lighting and ensures
effectiveness in both well-lit and dim conditions.
Contrast enhancement helps the model clearly
differentiate PPE from the background. Noise is
added to the training dataset to simulate
real-world camera imperfections, making the model
resilient to such conditions Shanti et al.,
(2021). Finally, image overlays and motion blur
are applied to the dataset, since live camera
feeds often capture workers who are partially
covered or in motion. Such real-world scenarios
must be considered when training the model
Azatbekuly et al., (2024). With augmentation, the
model becomes highly adaptable to these scenarios,
resulting in a detection system that works across
challenging and diverse environments.
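The sketch below shows a few of these augmentations (horizontal flip, brightness shift, noise, and motion blur) with plain OpenCV and NumPy; the parameter ranges are illustrative, and in a real pipeline geometric transforms such as flips must also be applied to the bounding-box labels.

```python
import cv2
import numpy as np

def augment(image, rng=np.random.default_rng()):
    """Illustrative augmentations for a uint8 BGR image."""
    if rng.random() < 0.5:
        image = cv2.flip(image, 1)                        # horizontal flip
    beta = rng.uniform(-40, 40)                           # brightness shift
    image = np.clip(image.astype(np.float32) + beta, 0, 255)
    noise = rng.normal(0, 8, image.shape)                 # camera-sensor noise
    image = np.clip(image + noise, 0, 255).astype(np.uint8)
    k = 5                                                 # motion-blur kernel size
    kernel = np.zeros((k, k), np.float32)
    kernel[k // 2, :] = 1.0 / k                           # horizontal streak
    return cv2.filter2D(image, -1, kernel)
```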
4 DATA AND EXPERIMENTAL
SETUP
4.1 Computer Configurations
The model was trained on a laptop equipped with an
NVIDIA GeForce GTX 1650 Ti GPU. The detection
model was implemented using CUDA, Ultralytics,
OpenCV, and PyTorch. Table 2 provides the
computer configurations.
Table 2: Computer Configuration.
Configuration | Type
System | Windows 11
CPU | AMD Ryzen 7 4800HS (8 cores, 16 threads)
Memory | 16 GB
GPU | NVIDIA GeForce GTX 1650 Ti
OS type | 64-bit
Python version | 3.12.7
NVIDIA driver version | 555.97
CUDA version | 12.6
Ultralytics version | 8.3.78
PyTorch version | 2.6.0+cu126
OpenCV version | 4.11.0
4.2 Dataset
The dataset consists of 717 images, divided into the
following parts: the train part (73%) contains 521
images, the validation part (16%) contains 114
images, and the test part (11%) contains 82 images.
Each set includes images and corresponding label
files (.txt). The dataset contains 10 labels, numbered
0 to 9: Hardhat, Machinery, NO-Hardhat, Mask,
Safety Cone, NO-Mask, Safety Vest, Vehicle, NO-
Safety Vest, and Person. The train set is modified
using mosaic data augmentation, and each model is
trained on that set (2025). The model is then
evaluated on a modified set under different
conditions, including RGB, grayscale, blur, dust, and
maximum brightness, ensuring evaluation across all
scenarios. Figure 3 presents the graphical results of
the train set.
Figure 3: Graphical Results of the Train Set.
4.3 Evaluation Metrics
4.3.1 Mean Average Precision (mAP)
mAP determines the overall detection efficiency by
averaging the precision scores across multiple recall
thresholds. It is widely used in detection to assess the
accuracy of bounding box predictions (2025). For
PPE detection, a greater mAP indicates correctly
identifying helmets, vests, gloves, and other safety
gear with high recall and precision. This is useful in
ensuring compliance monitoring in real-time
environments.
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i   (1)

where AP_i is the average precision of class i and N is the number of classes.
4.3.2 Intersection over Union
IoU evaluates the intersection of the ground truth with
the predicted bounding box. A high IoU is better for
object localization, which is critical for PPE detection
in construction sites. Poor IoU may result in
misclassification or failure to detect safety gear,
leading to compliance issues. It is commonly used to
filter out incorrect detections in object detection
tasks.
IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}   (2)

where the numerator is the intersection of the actual and
predicted bounding boxes, and the denominator is
their combined area.
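A direct implementation of Eq. (2) for two axis-aligned (x1, y1, x2, y2) boxes might look as follows (an illustrative helper, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes, per Eq. (2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)         # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)              # union = sum - overlap
```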
4.3.3 Precision and Recall
Precision measures how many detected PPE items are
correct, while recall measures how many of the actual
PPE items were successfully identified. High precision results in
lower false alarms, while high recall results in lower
missed detections. In PPE detection, striking a
balance is crucial for reliable compliance monitoring
in workplaces.
Precision = \frac{TP}{TP + FP}   (3)

Recall = \frac{TP}{TP + FN}   (4)

where TP, FP, and FN are true positives, false positives,
and false negatives, respectively.
4.3.4 Accuracy
Accuracy is a fundamental metric used to determine
the overall efficiency of a PPE detection system. It
measures the correctly classified instances (PPE and
non-PPE) from the total number of predictions
(2025). A greater accuracy indicates correctly
identifying workers wearing PPE and those without
it, minimizing misclassifications. However, accuracy
alone may not always reflect model reliability,
especially if there is an imbalance in the dataset (e.g.,
more PPE-wearing workers than non-compliant
ones). To get a clearer picture, accuracy is often
analyzed alongside precision and recall to ensure a
balanced performance.
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}   (5)
4.3.5 F1-Score
It is the harmonic mean of recall and precision. It
provides a metric to determine the overall efficiency
of the PPE detection model. It is particularly useful
when there is an imbalance between detected and
actual PPE items, ensuring that false negatives and
false positives are minimized.
F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}   (6)
A higher F1-score indicates a well-balanced
model that effectively detects safety gear while
minimizing incorrect detections.
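For reference, Eqs. (3)-(6) translate directly into code; the confusion-matrix counts in the usage line are hypothetical and are not taken from Table 3.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1-score per Eqs. (3)-(6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts for one class, for illustration only:
p, r, acc, f1 = classification_metrics(tp=76, fp=9, fn=24, tn=891)
```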
5 RESULTS AND DISCUSSION
5.1 Model Training
We trained the model using the same training set as
the default YOLOv11 detection models. Training was
conducted on 640 × 640 pixel images for 100 epochs.
Accuracy improved as the number of epochs
increased, while the box_loss, dfl_loss, and
cls_loss gradually decreased to 0.75, 1.04, and
0.56, respectively. Table 4 compares existing
object detection models with the proposed model.
Figure 4 depicts the decline in loss for both
training and validation. Figure 6 (b) shows the
precision-recall curve used to analyze mAP. The
mAP@0.5 stabilized at approximately 0.771 after 58
epochs and reached 0.810 after 100 epochs, while
mAP@0.5-0.95 progressively increased to 0.603.
Figure 6 (a), (c), and (d) illustrate the F1-score,
precision, and recall curves in relation to
confidence levels for each class. Table 3 shows the
accuracy, precision, recall, and F1-score of each
class detected by the proposed model.
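A training sketch consistent with this setup, using the Ultralytics API from Table 2, is shown below; the model is built from a configuration file rather than pre-trained weights (matching Section 5.3), and the dataset path is a placeholder.

```python
from ultralytics import YOLO

model = YOLO("yolo11s.yaml")                    # build from config; no transfer learning
results = model.train(
    data="construction-site-safety/data.yaml",  # placeholder dataset descriptor
    imgsz=640,                                  # 640 x 640 training images
    epochs=100,                                 # 100 epochs, as reported above
    mosaic=1.0,                                 # mosaic data augmentation
)
metrics = model.val()                           # reports mAP@0.5 and mAP@0.5:0.95
```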
Table 3: Accuracy, Precision, Recall, and F1-Score of Each Class Detected by the Proposed Model.
Class Accuracy Precision Recall F1-Score
Hardhat 0.970883 0.904762 0.760000 0.826087
Mask 0.989081 0.978261 0.900000 0.937500
NO-Hardhat 0.960874 0.911765 0.626263 0.742515
NO-Mask 0.957234 0.835443 0.660000 0.737430
NO-Safety Vest 0.955414 0.786517 0.700000 0.740741
Person 0.953594 0.720721 0.800000 0.758294
Safety Cone 0.985441 0.928571 0.910000 0.919192
Safety Vest 0.975432 0.939759 0.780000 0.852459
Machinery 0.985441 0.911765 0.930000 0.920792
Vehicle 0.947225 0.791667 0.570000 0.662791
Table 4: Comparison of Existing Object Detection Models with the Proposed Model.
Model | Size (MB) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Parameters (M) | FLOPs (G) | Speed (ms)
YOLO11n | 5.4 | 58.0 | 39.5 | 2.6 | 6.5 | 1.5
YOLO11s | 18.4 | 70.1 | 47.0 | 9.4 | 21.6 | 2.5
YOLO11m | 38.8 | 73.3 | 51.5 | 20.1 | 68.0 | 4.7
YOLO11l | 49.0 | 77.3 | 53.4 | 25.3 | 86.9 | 6.2
Our Model | 10.5 | 81.0 | 60.3 | 6.4 | 14.7 | 2.4
Figure 4: Training and Validation Loss Curves.
Figure 5: Detection Results Before and After Model
Analysis.
Figure 6: Evaluation Curves: (A) F1-Score Vs. Confidence;
(B) Precision Vs. Recall; (C) Precision Vs. Confidence; (D)
Recall Vs. Confidence.
Figure 7: Confusion Matrix.
5.2 Testing and Validation
After training, the model's efficiency is evaluated in
the testing and validation phase. Figure 7 shows the
confusion matrix generated by comparing the model's
results with the image labels from the test set. The
accuracy, precision, recall, and F1-score were then
computed for each class, as shown in Table 3.
We further validated the model by testing it on the
following scenarios: (1) RGB, (2) grayscale, (3) blur
effect, (4) dust effect, (5) high brightness, and (6)
real-time videos. Figure 5 depicts the before and after
for the first five conditions. For RGB and grayscale
testing, we first evaluated the images in their original
RGB format and then again in grayscale. For the
maximum brightness, dust, and blur effects, the
same set of images was processed under extreme
brightness and with added noise to simulate real-
world conditions such as strong sunlight or dusty
environments. For the blur effect, images were
blurred by up to 30% to mimic humid weather
conditions.
Apart from image analysis, the model is also
capable of processing videos by capturing and
analyzing frames in real-time. It supports multiple
video formats, including MP4, AVI, 3GP, WMV,
MOV, FLV, MKV, WEBM, HTML5, AVCHD,
MPEG-2, and MPEG-1. We can analyze CCTV
footage for PPE detection and worker-machinery
proximity monitoring in real time. The model can
process videos ranging from 144p to 2160p without
noticeable delays and supports frame rates of 30 FPS,
60 FPS, and beyond Alvarez et al., (2023).
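A frame-by-frame analysis loop of this kind can be sketched as follows; the weights file and video path are placeholder names.

```python
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")                         # placeholder trained weights
cap = cv2.VideoCapture("cctv_footage.mp4")      # any format OpenCV can decode
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:                                  # end of stream
        break
    results = model(frame, verbose=False)       # per-frame PPE detection
    annotated = results[0].plot()               # draw boxes and class labels
    cv2.imshow("PPE Monitor", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):       # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```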
5.3 Comparison of Detection
Algorithms
In this study, we selected various sizes of YOLOv11
detection models for comparative experiments.
We trained our model on the same training set as the
default YOLOv11 detection models rather than using
transfer learning, because the pre-trained models
were trained on different datasets. The primary
comparison parameters include model size,
processing speed, mAP values at 0.5 and 0.5-0.95
IoU, FLOPs, and the number of parameters. Table 4
presents the comparison results, which show that
our model performs better than the other YOLOv11
models: it achieves a 3.7% increase in mAP at 0.5
IoU and a 6.9% increase in mAP at 0.5 to 0.95 IoU
over YOLOv11l, while maintaining a size and speed
between those of YOLOv11n and YOLOv11s. With a
higher mAP, smaller size, and faster speed, it is
computationally efficient compared to the other
models, making it applicable for real-time
analysis Azatbekuly et al., (2024).
5.4 Research Novelty
As mentioned earlier, multiple researchers have tried
to automate construction site safety protocols using
AI. Most studies focus on ML and CV techniques for
safety helmet detection. Some research extends to
detecting masks and vehicles alongside safety
helmets using computer vision. However, none of
these studies have performed PPE detection along
with worker-machinery proximity monitoring. The
proximity detection algorithm is a crucial component
that, when integrated with PPE detection, could
enhance workplace safety and prevent accidents.
The research achieved an overall accuracy of 97%
and an overall precision of 87% in detecting PPEs.
These results improve on those of previous studies
and existing models. Furthermore, the model is trained to
detect objects under various weather conditions,
including sunny, humid, and dusty environments. In
contrast, most previous studies used only RGB
images for training and testing. Our model, however,
accurately detects all necessary PPEs, machinery,
vehicles, and worker-machinery proximity with high
precision across different weather conditions Sridhar
et al., (2024).
6 CONCLUSIONS
This paper presents a fast and accurate model for
detecting PPEs worn by workers and issuing alerts in
case of non-compliance. The model detects objects in
images with an inference speed of 2.4 ms and
outperforms YOLOv11l with a 3.7% increase in mAP
at 0.5 IoU as well as a 6.9% increase in mAP
at 0.5 to 0.95 IoU. To improve detection accuracy,
mosaic data augmentation was included during
training, allowing the model to detect small-scale
objects effectively. Furthermore, the model can also
track worker movements near machinery using
bounding box regression and issue alerts if unsafe
proximity is detected. We also trained the model to
function under different weather conditions. The
model achieves a mAP of 81% at 0.5 IoU and a mAP
of 60.3% at 0.5 to 0.95 IoU, with an overall accuracy,
precision, and recall of 97%, 87%, and 76%,
respectively. It is computationally efficient, with
14.7 GFLOPs, 6.4M parameters, and an inference
speed of 2.4 ms, making the model applicable for
real-time analysis.
REFERENCES
Al-Azani, S., Luqman, H., Alfarraj, M., Sidig, A. A. I.,
Khan, A. H., Al-Hamed, D.: Real-Time Monitoring of
Personal Protective Equipment Compliance in
Surveillance Cameras. IEEE Access, 12, 121882–
121895 (2024).
https://doi.org/10.1109/ACCESS.2024.3451117
Alvarez, M. R., Vega, C. Q., Wong, L.: Model for
Recognition of Personal Protective Equipment in
Construction Applying YOLO-v5 and YOLO-v7. In:
2023 International Conference on Electrical, Computer
and Energy Technologies - ICECET, pp. 1–6. IEEE,
Cape Town (2023). https://doi.org/10.1109/ICECET58911.2023.10389215
Azatbekuly, N., Mukhanbet, A., Bekele, S. D.:
Development of an Intelligent Video Surveillance
System Based on YOLO Algorithm. In: 2024 IEEE 4th
International Conference on Smart Information
Systems and Technologies - SIST, pp. 498–503. IEEE,
Astana (2024). https://doi.org/10.1109/ .2024.10629617
Biswas, M., Hoque, R.: Construction Site Risk Reduction
via YOLOv8: Detection of PPE, Masks, and Heavy
Vehicles. In: 2024 IEEE International Conference on
Computing, Applications and Systems - COMPAS, pp.
1–6. IEEE, Cox’s Bazar (2024).
https://doi.org/10.1109/COMPAS60761.2024.10796253
Chen, B., Wang, X., Huang, G., Li, G.: Detection of
Violations in Construction Site Based on YOLO
Algorithm. In: 2021 2nd International Conference on
Artificial Intelligence and Computer Engineering -
ICAICE, pp. 251–255. IEEE, Hangzhou (2021).
https://doi.org/10.1109/ICAICE54393.2021.00057
Dermatology 2.0: Deploying YOLOv11 for Accurate and
Accessible Skin Disease Detection: A Web-Based
Approach. Scientific Figure on ResearchGate (2024).
https://www.researchgate.net/figure/Shows-the-architecture-of-YoloV11_fig2_389021414
Feng, R., Miao, Y., Zheng, J.: A YOLO-Based Intelligent
Detection Algorithm for Risk Assessment of
Construction Sites. J. Intell. Constr., 2(4), 1–18 (2024).
https://doi.org/10.26599/Jic.2024.9180037
Gautam, V., Maheshwari, H., Tiwari, R. G., Agarwal, A.
K., Trivedi, N. K., Garg, N.: Feature Fusion-Based
Deep Learning Model to Ensure Worker Safety at
Construction Sites. In: 2024 1st International
Conference on Advanced Computing and Emerging
Technologies - ACET, pp. 16. IEEE, Ghaziabad
(2024). https://doi.org/10.1109/ACET61898.2024.10730698
Han, C.; Zhang, J.; Wu, H.: Fall Detection System Based
on YOLO Algorithm and MobileNetV2 Model. In: 10th
International Conference on Systems and Informatics
(ICSAI), pp. 1–5. IEEE, Shanghai
(2024). https://doi.org/10.1109/ICSAI65059.2024.10893853
Han, K., Zeng, X.: Deep Learning-Based Workers Safety
Helmet Wearing Detection on Construction Sites Using
Multi-Scale Features. IEEE Access, 10, 718–729
(2022).
https://doi.org/10.1109/ACCESS.2021.3138407
Jankovic, P., Protić, M., Jovanovic, L., Bacanin, N.,
Zivkovic, M., Kaljevic, J.: YOLOv8 Utilization in
Occupational Health and Safety. In: 2024 Zooming
Innovation in Consumer Technologies Conference -
ZINC, pp. 182–187. IEEE, Novi Sad
(2024). https://doi.org/10.1109/ZINC61849.2024.10579310
Krishna, N. M., Reddy, R. Y., Reddy, M. S. C., Madhav, K.
P., Sudham, G.: Object Detection and Tracking Using
YOLO. In: 2021 Third International Conference on
Inventive Research in Computing Applications -
ICIRCA, pp. 1–7. IEEE, Coimbatore
(2021). https://doi.org/10.1109/ICIRCA51532.2021.9544598
Li, Z., Guan, S.: Efficient-YOLO: A Research on
Lightweight Safety Equipment Detection Based on
Improved YOLOv8. In: 2024 2nd International
Conference on Artificial Intelligence and Automation
Control - AIAC, pp. 246–249. IEEE, Guangzhou
(2024). https://doi.org/10.1109/AIAC63745.2024.10899732
Lin, B.: Safety Helmet Detection Based on Improved
YOLOv8. IEEE Access, 12, 28260–28272 (2024).
https://doi.org/10.1109/ACCESS.2024.3368161
M, L., J, R., M, A.: Real-time Hazard Detection System for
Construction Safety Using Hybrid YOLO-ViTs. In:
2024 International Conference on Emerging Research
in Computational Science - ICERCS, pp. 1–5. IEEE,
Coimbatore (2024). https://doi.org/10.1109/ICERCS63125.2024.10895680
Mahmud, S. S., Islam, M. A., Ritu, K. J., Hasan, M.,
Kobayashi, Y., Mohibullah, M.: Safety Helmet
Detection of Workers in Construction Site Using
YOLOv8. In: 2023 26th International Conference on
Computer and Information Technology - ICCIT, pp. 1–
6. IEEE, Cox’s Bazar (2023).
https://doi.org/10.1109/ICCIT60459.2023.10441212
Menon, S. M., George, A., N, A., James, J.: Custom Face
Recognition Using YOLO.V3. In: 2021 3rd
International Conference on Signal Processing and
Communication - ICSPC, pp. 454–458. IEEE,
Coimbatore (2021).
https://doi.org/10.1109/ICSPC51351.2021.9451684
N, S., Sridhar, S., Sudhir, S., Adithan, V. M., Vignesh, G.:
Safefaceyolo: Advanced Workplace Security Through
Helmet Detection and Facial Authorization. In: 2024
2nd International Conference on Artificial Intelligence
and Machine Learning Applications - AIMLA, pp. 1–5.
IEEE, Namakkal (2024). https://doi.org/10.1109/AIMLA59606.2024.10531484
Ponika, M., Jahnavi, K., Sridhar, P. S. V. S., Veena, K.:
Developing a YOLO-Based Object Detection
Application Using OpenCV. In: 2023 7th International
Conference on Computing Methodologies and
Communication - ICCMC, pp. 662–668. IEEE, Erode
(2023). https://doi.org/10.1109/ICCMC56507.2023.10084075
Roboflow Universe Projects: Construction Site Safety
Dataset. Roboflow Universe (2024). Accessed on: Feb.
26, 2025. https://universe.roboflow.com/roboflow-universe-projects/construction-site-safety
Shanti, M. Z., Cho, C.-S., Byon, Y.-J., Yeun, C. Y., Kim,
T.-Y., Kim, S.-K.: A Novel Implementation of an AI-
Based Smart Construction Safety Inspection Protocol in
the UAE. IEEE Access, 9, 166603–166616 (2021).
https://doi.org/10.1109/ACCESS.2021.3135662
Shetty, N. P., Himakar, J., Gnanchandan, P., Prajwal, V.,
Jamadagni, S. S.: Enhancing Construction Site Safety:
A Tripartite Analysis of Safety Violations. In: 2024 3rd
International Conference for Innovation in Technology
- INOCON, pp. 1–6. IEEE, Bangalore
(2024). https://doi.org/10.1109/INOCON60754.2024.10511598