
limitations in existing research related to dataset
diversity and size. We fine-tuned the YOLOv5 Medium
(YOLOv5m) model, a single-stage object detection
model known for its balance of speed, accuracy, and
lightweight architecture, on this dataset and extended
the detection pipeline to incorporate explainability
via Grad-CAM, modifying the model’s final layers to
enable attention-based visualization of predicted
regions.
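As a rough illustration of the Grad-CAM step mentioned above, the following minimal sketch computes a heatmap from the feature maps of one convolutional layer and the gradients of a detection score with respect to them. It is a toy example in plain Python, not the modified YOLOv5m pipeline used in this work: the function name `grad_cam` and the tiny 2 x 2 inputs are assumptions made for illustration, and in practice both inputs would be captured from the network during a forward and backward pass.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations, gradients: K feature maps, each an H x W nested list.
    Both are assumed to come from the same convolutional layer.
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight = global average of the gradient over all positions.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Hypothetical toy inputs: two 2 x 2 feature maps and their gradients.
toy_activations = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
toy_gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(toy_activations, toy_gradients)
```

The ReLU keeps only the regions that push the score up, which is why the resulting map can be overlaid on the radiograph as a visualization of the evidence behind a predicted region.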
Our evaluation examines the model’s perfor-
mance across multiple fracture types within the same
anatomical region, assessing both detection accuracy
and generalizability. By offering a robust evaluation
framework, our research contributes to the develop-
ment of more reliable, effective and transparent auto-
mated diagnostic systems, paving the way for future
advancements in fracture detection and medical imag-
ing.
This paper is organized into seven sections.
Section 1 provides an overview of the research problem
and objectives. Section 2 situates the study within
existing literature. Section 3 outlines the data used,
followed by the Methodology in Section 4, which
details the proposed approach. System Requirements
and Evaluation Metrics are specified in Section 5.
Section 6 discusses the findings, and the Conclusion
section highlights key insights and future directions.
2 RELATED WORK
In recent years, the application of machine learning
(ML) and DL techniques has significantly advanced
the field of automated bone fracture detection
(Ahmed and Hawezi, 2023). Zhang et al.
(2021) proposed a traditional ML pipeline involving
grayscale conversion, Gaussian filtering, adaptive his-
togram equalization, Canny edge detection, and Gray-
Level Co-occurrence Matrix (GLCM)-based feature
extraction, with classification using models like
Support Vector Machine (SVM) achieving up to 92%
accuracy. Addressing annotation ambiguity, a
point-based annotation scheme with “Window Loss” was
introduced, achieving an Area Under the Receiver
Operating Characteristic curve (AUROC) of 0.983 and a
Free-Response Receiver Operating Characteristic (FROC)
score of 89.6%, outperforming standard detectors.
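To make the GLCM-based feature extraction step of the traditional pipeline more concrete, the sketch below builds a normalized co-occurrence matrix for horizontally adjacent pixels and derives the contrast feature from it. This is an illustrative toy in plain Python, not the implementation of Zhang et al. (2021): the function names `glcm` and `glcm_contrast`, the single pixel offset, and the 3 x 3 image with three gray levels are all assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    image: nested list of integer gray levels in range(levels).
    (dx, dy): offset from each pixel to its neighbour (default: right).
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[count / total for count in row] for row in counts]


def glcm_contrast(matrix):
    """Contrast: co-occurrence probability weighted by squared level gap."""
    size = len(matrix)
    return sum(
        matrix[i][j] * (i - j) ** 2 for i in range(size) for j in range(size)
    )


# Hypothetical 3 x 3 image quantized to three gray levels.
toy_image = [[0, 0, 1], [1, 2, 2], [0, 1, 2]]
toy_glcm = glcm(toy_image, levels=3)
toy_contrast = glcm_contrast(toy_glcm)
```

Features such as this contrast value, typically computed for several offsets, form the vector that a classifier like an SVM receives in place of the raw pixels.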
Building on traditional ML, several studies
demonstrated the superior capability of DL models
in capturing complex patterns. Karanam et al. (2021)
emphasized the effectiveness of Convolutional Neural
Networks (CNN) for hierarchical feature learning, es-
pecially in large datasets. Ghosh et al. (2024) further
improved accuracy (97%) by applying anatomical
feature enhancement before feeding the images into
CNNs. Lee et al. (2020) proposed a meta-learning-based
encoder-decoder built on GoogLeNet that uses a
shared latent representation to improve classification
across modalities.
Hybrid and transfer learning strategies have also
shown significant promise. Khatik and Kadam (2022)
and Warin et al. (2023) explored the use of pretrained
models such as ResNet and Faster R-CNN, integrat-
ing transfer learning and data augmentation to en-
hance performance. Meena and Roy (2022) demon-
strated the integration of real-time DL models like
ResNet, VGGNet, and U-Net, achieving high accu-
racy for wrist and hip fractures while highlighting
challenges such as class imbalance and rare case de-
tection.
Fracture localization has become increasingly im-
portant. Ma (2021) proposed a two-stage framework
combining Faster R-CNN with a Crack-Sensitive
CNN (CrackNet) for detecting and classifying spe-
cific bone regions. Similar detection-refinement
pipelines were explored by Abbas et al. (2020) and
Su et al. (2023), reporting mAP scores around 60%.
One-stage detectors such as the YOLO family
have gained substantial traction for their speed
and efficiency. Zou and Arshad (2024) demonstrated
YOLO’s effectiveness over two-stage detectors.
Ju and Cai (2023) showcased YOLOv8’s performance,
achieving an mAP of 0.638 using multi-scale
feature fusion. Morita et al. (2024) confirmed
YOLOv8’s superiority over SSD after extended train-
ing. The YOLOv7-ATT model by Zou and Arshad
(2024), with an attention mechanism, achieved 86.2%
mAP on the FracAtlas dataset by focusing on sub-
tle fracture-specific cues. Moon et al. (2022) used
YOLOX-S for nasal bone fractures, achieving 100%
sensitivity and 69.8% precision, thereby easing diag-
nostic burden for specialists.
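The combination of perfect sensitivity with noticeably lower precision reported above follows directly from how the two metrics are defined. The short sketch below shows the arithmetic; the counts are hypothetical values chosen for illustration and are not taken from Moon et al. (2022) or any other cited study.

```python
def sensitivity_and_precision(tp, fp, fn):
    """Return (sensitivity, precision) from detection counts.

    tp: fractures correctly detected
    fp: detections that are not fractures
    fn: fractures that were missed
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical counts: no fracture is missed, but two detections are false alarms.
sens, prec = sensitivity_and_precision(tp=8, fp=2, fn=0)
```

With these counts the detector misses nothing, so sensitivity is 1.0, while the two false alarms bring precision down to 0.8. A detector with this profile suits screening, where a specialist reviews every flagged region.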
Beyond YOLO, other architectures have been
tested. AFFNet, as proposed by Nguyen et al. (2024),
improved upon ResNet-50 while integrating activa-
tion maps to visualize important regions. While Reti-
naNet lagged behind with approximately 76% accu-
racy, Yadav et al. (2022) introduced SFNet—using
multi-scale fusion and edge detection—to achieve
99.12% accuracy, 100% precision, and 98% recall,
outperforming U-Net, YOLOv4, and R-CNN. In ad-
dition, Beyraghi et al. (2023) explored microwave
imaging as a novel, radiation-free method for fracture
detection using S-parameter data and deep neural net-
works, achieving high classification accuracy and low
regression error.
Parallel to the advancements in detection architec-
tures, the role of XAI has grown critical in ensuring
KDIR 2025 - 17th International Conference on Knowledge Discovery and Information Retrieval