dimensional performance comparison and analysis of
the different-scale models within the YOLOv8 series
itself—specifically for detecting the critical
"pedestrian" target on a challenging, large-scale
public autonomous driving dataset—remains an area
worthy of investigation. Such an analysis is crucial
for understanding the specific trade-offs among
models of different scales in terms of accuracy, speed,
resource consumption, and sensitivity to complex
driving environment factors.
Therefore, the core work of this study is to select
three representative models from the YOLOv8
series—small, medium, and large—and conduct a
unified training, comprehensive evaluation, and in-
depth comparative study focused on the "pedestrian"
class within the BDD100K dataset. This research
aims to reveal the specific differences and inherent
trade-offs in accuracy, efficiency, robustness, and
resource consumption among these models, thereby
providing insights for future targeted optimizations.
2 DATASET AND METHODS
2.1 BDD100K Dataset and Processing
This study utilizes the large-scale driving scene object
detection dataset, Berkeley DeepDrive 100K
(BDD100K) (Yu et al., 2020). This dataset is
renowned for its vast data volume and high degree of
scene diversity, making it an ideal choice for
evaluating the comprehensive performance of
autonomous driving perception algorithms. The
annotation information in BDD100K includes 10
main categories of traffic participants, such as
"pedestrian" and "car".
The experiment strictly adheres to the official data
partitioning of BDD100K, using its training set of
approximately 70,000 images for model training and
its validation set of about 10,000 images for
performance evaluation. To align with the study's core objective of pedestrian detection, the dataset was processed in two steps. First, the original annotations were converted to the standard YOLO format. Second, all images containing the "person" class, together with their corresponding pedestrian annotations, were extracted to form a single-class data subset for the pedestrian detection task.
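A minimal sketch of this conversion is given below. It assumes the BDD100K detection labels in their official per-split JSON format (e.g., det_train.json) and the dataset's native 1280×720 image resolution; the file paths are illustrative, and the category string should match the one actually used in the label files ("pedestrian" in the official release, "person" in some exports).

import json
from pathlib import Path

# BDD100K images are 1280x720; YOLO boxes are normalized to [0, 1].
IMG_W, IMG_H = 1280, 720
PEDESTRIAN = "pedestrian"  # adjust to match the category name in the label files

def convert_split(det_json: Path, out_dir: Path) -> None:
    """Write one YOLO-format .txt label file per image that contains
    at least one pedestrian box; every box maps to class id 0."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for frame in json.loads(det_json.read_text()):
        lines = []
        for label in frame.get("labels") or []:
            if label.get("category") != PEDESTRIAN or "box2d" not in label:
                continue
            b = label["box2d"]
            # Corner coordinates -> normalized center / width / height.
            xc = (b["x1"] + b["x2"]) / 2 / IMG_W
            yc = (b["y1"] + b["y2"]) / 2 / IMG_H
            w = (b["x2"] - b["x1"]) / IMG_W
            h = (b["y2"] - b["y1"]) / IMG_H
            lines.append(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
        if lines:  # keep only images that contain pedestrians
            (out_dir / f"{Path(frame['name']).stem}.txt").write_text("\n".join(lines) + "\n")

convert_split(Path("labels/det_train.json"), Path("labels/yolo/train"))
convert_split(Path("labels/det_val.json"), Path("labels/yolo/val"))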
2.2 The YOLOv8 Model Family
This study takes the YOLOv8 model series (Jocher et al., 2023) as its research subject.
YOLOv8 inherits the single-stage, end-to-end detection philosophy of the YOLO algorithm family (Redmon et al., 2016), which fundamentally changed real-time detection, and the family has since continued to evolve through versions such as YOLOv7 and YOLOv9 (Wang, Bochkovskiy, & Liao, 2023; Wang, Yeh, & Liao, 2024). By introducing more efficient structural components and adopting an Anchor-Free detection head, a trend also seen in other modern detectors (Carion et al., 2020), YOLOv8 achieves clear gains in both accuracy and efficiency over its predecessors. Its typical three-part architecture of Backbone, Neck, and Head provides strong feature extraction, fusion, and prediction capabilities, a paradigm shared by other efficient detectors (Tan, Pang, & Le, 2020).
To comprehensively assess the impact of model
scale on pedestrian detection performance, this study
selected three representative, pre-defined standard
models from the YOLOv8 series: YOLOv8s (small),
YOLOv8m (medium), and YOLOv8l (large). All three models were obtained directly from the official repository, and their corresponding COCO pre-trained weights were used for transfer learning. No modifications were made to their standard network architectures during the experiments.
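With the Ultralytics API, loading these pre-trained checkpoints is a one-line operation per model; the sketch below illustrates this (checkpoint names follow the official repository, and weights are downloaded automatically on first use).

from ultralytics import YOLO

# Load the official COCO-pre-trained checkpoints for the three scales.
models = {scale: YOLO(f"yolov8{scale}.pt") for scale in ("s", "m", "l")}
for scale, model in models.items():
    n_params = sum(p.numel() for p in model.model.parameters())
    print(f"yolov8{scale}: {n_params / 1e6:.1f} M parameters")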
2.3 Experimental Setup
All model training and evaluation were conducted in
a unified experimental environment. The platform was an Apple M3 Max processor with its integrated GPU, accelerated via Metal Performance Shaders (MPS) under macOS. The key software environment included Python 3.10, PyTorch 2.8.0.dev, and Ultralytics YOLO 8.3.139.
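As a sanity check for reproducibility, the availability of the MPS backend can be verified with the standard PyTorch API before training:

import torch

# Confirm that the Metal Performance Shaders backend is usable on this machine.
assert torch.backends.mps.is_available(), "MPS backend not available"
device = torch.device("mps")
print(torch.__version__, device)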
During the training phase, all models were
initialized with COCO pre-trained weights and
trained for 10 epochs on the BDD100K pedestrian
subset. The optimizer used was SGD. The initial
learning rate and batch size were adapted for the
different model scales: YOLOv8s used an initial
learning rate of 0.001 and a batch size of 8; YOLOv8l
used an initial learning rate of 0.0005 and a batch size
of 4. The input image size for both training and validation was uniformly set to 640×640 pixels. For data augmentation, the standard methods provided by the YOLOv8 framework were applied, primarily Mosaic, random horizontal flipping, random scaling, random translation, and HSV color-space augmentation.
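A training invocation consistent with this configuration is sketched below for YOLOv8s (for YOLOv8l, batch and lr0 change to 4 and 0.0005). The dataset YAML path is hypothetical, and the augmentation arguments shown are the Ultralytics defaults named explicitly.

from ultralytics import YOLO

model = YOLO("yolov8s.pt")  # COCO pre-trained weights
model.train(
    data="bdd100k_pedestrian.yaml",  # hypothetical single-class dataset config
    epochs=10,
    imgsz=640,
    batch=8,            # 4 for YOLOv8l
    optimizer="SGD",
    lr0=0.001,          # 0.0005 for YOLOv8l
    device="mps",       # Apple-silicon GPU via Metal Performance Shaders
    mosaic=1.0,         # Mosaic augmentation
    fliplr=0.5,         # random horizontal flip
    scale=0.5,          # random scaling
    translate=0.1,      # random translation
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # HSV color-space augmentation
)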