Trafﬁc Flow Detection and Prediction Using YOLO11: A

Threshold-Based Approach for Identifying Trafﬁc Levels

Dheeraj Maladkar, E Aditya Sai Krishna, Satwik Nayak, Swaroop N Udasi and Channabasappa Muttal

School of Computer Science and Engineering,

KLE Technological University, Hubballi, India

Keywords:

YOLO11, Trafﬁc Detection, Threshold, Trafﬁc Management, C3K2, SPPF, C2PSA.

Abstract:

Trafﬁc congestion is a major problem in cities, making it important to have better systems to detect and predict

trafﬁc for better trafﬁc management. This paper gives an insight about trafﬁc ﬂow detection and prediction

model utilizing YOLO11, an updated and advanced object detection model. The primary objective of this

research is to classify trafﬁc levels based on the threshold set, serving a systematic approach for trafﬁc moni-

toring. The system uses the number of vehicles detected on each image to classify it as a heavy trafﬁc or light

trafﬁc based on the pre-deﬁned threshold. This approach signiﬁes an efﬁcient and automated model to assess

the trafﬁc conditions, which can be applied in the developing cities and further help in trafﬁc management and

intelligent transportation system. YOLO11 incorporates innovations like the C3K2 block, SPPF module, and

C2PSA block for increased accuracy and speed. The YOLO11 was compared with other YOLO models such

as YOLOv8,YOLOv9,YOLOv10 and the results showed that latest YOLO11 model performed best with its

accurate and faster results. The results highlight the effectiveness and efﬁciency of the YOLO11 in detecting

vehicles with 8 different classes with accurate results.

1 INTRODUCTION

Trafﬁc ﬂow detection is critical for efﬁcient urban

planning(Wang et al., 2018) and trafﬁc management,

helping authorities optimize the use of roads, reduce

congestion, and improve safety. Trafﬁc ﬂow detection

plays a crucial role in ensuring effective urban plan-

ning and trafﬁc management. It enables authorities to

optimize road usage, reduce congestion, and enhance

safety. In India, trafﬁc patterns vary signiﬁcantly

across regions due to differences in urban infrastruc-

ture, population density, and road usage. Metropoli-

tan cities like Delhi, Mumbai, and Bengaluru are in-

famous for their heavy trafﬁc congestion, especially

during peak hours. These cities experience high ve-

hicle density, leading to frequent trafﬁc jams, long

commutes, and increased air pollution. The road in-

frastructure in such cities often struggles to keep up,

with narrow roads, limited public transportation op-

tions, and a high volume of private vehicles exacer-

bating the problem. Conversely, smaller towns and

rural areas tend to have lighter trafﬁc but face unique

challenges, such as the prevalence of non-motorized

vehicles like bicycles, bullock carts, and pedestrians.

These factors add complexity to monitoring and man-

aging trafﬁc ﬂow in these regions.

Traditional trafﬁc monitoring methods, such as

physical sensors embedded in the roadways or man-

ual observation, often have limitations such as scala-

bility, cost, and real-time data collection. Traditional

machine learning methods often relied on handcrafted

features, which were less adaptive to diverse trafﬁc

conditions. Modern deep learning approaches, partic-

ularly Convolutional Neural Network (CNN)-based

architectures(Zhiqiang and Jun, 2017), have demon-

strated exceptional performance in object detection

tasks, including trafﬁc monitoring. You Only Look

Once (YOLO) a CNN is a popular family of real-

time object detection models known for its speed and

accuracy. YOLO11(Jocher and Qiu, 2024), an im-

proved version of YOLOv10(Wang et al., 2024) from

the YOLO family, it’s employed in this project due to

it’s remarkable speed and accuracy in detecting ob-

jects within an image. YOLO11, unlike conventional

object detection models, analyzes the entire image in

a single pass, which drastically reduces processing

time, making it highly suitable for real-time trafﬁc

monitoring. Its architecture is designed to detect and

classify various objects in trafﬁc scenes, such as vehi-

cles, pedestrians, and other relevant entities, into pre-

Maladkar, D., Sai Krishna, E. A., Nayak, S., Udasi, S. N. and Muttal, C.

Trafﬁc Flow Detection and Prediction Using YOLO11: A Threshold-Based Approach for Identifying Trafﬁc Levels.

DOI: 10.5220/0013592000004664

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 2, pages 335-343

ISBN: 978-989-758-763-4

335

deﬁned categories like cars, buses, motorcycles, and

trucks. By evaluating the number of detected objects

within a frame, the system effectively assesses and

predicts trafﬁc levels. Based on predeﬁned thresh-

olds of object counts, trafﬁc can be classiﬁed as either

heavy or light, enabling trafﬁc management systems

to make real-time decisions.The YOLO11(Khanam

and Hussain, 2024) model needs to be trained on a

well-curated dataset which contains wide range of

trafﬁc scenarios. It is essential to capture images from

a wide range of environments to account for various

factors, including different weather conditions, light-

ing, and diverse perspectives. These factors can affect

how objects appear in images, so training the model

on varied data helps improve its robustness.

To increase dataset diversity and improve the

model’s adaptability to real-world scenarios, tech-

niques such as data augmentation—like ﬂipping, ro-

tating, and altering brightness—are crucial. These

techniques enable models to manage variations, such

as differences between day and night trafﬁc condi-

tions. Additionally, transfer learning, which involves

ﬁne-tuning a pre-trained model on a smaller, domain-

speciﬁc dataset, helps speed up training and enhances

accuracy. This method utilizes the generalized knowl-

edge of pre-trained models, tailoring it to address spe-

ciﬁc challenges, such as India’s heterogeneous trafﬁc

conditions. Customizing models for regional differ-

ences ensures precise vehicle detection and classiﬁ-

cation in intricate environments. By integrating these

strategies, systems like YOLO11 can deliver real-time

insights, facilitating smarter trafﬁc management, re-

ducing congestion, and improving road safety in ur-

ban settings(Chen, 2021)(Howard, 2018).

The paper is organized into ﬁve distinct sections,

each holding signiﬁcance in explaining our work. The

ﬁrst section focuses on the motivation behind select-

ing the problem statement and provides a detailed in-

troduction. The second section reviews the work pre-

viously carried out in this ﬁeld. The third section

covers the methodology, beginning with the data col-

lection phase, followed by an in-depth discussion on

the model architecture, model development, and con-

cluding with the evaluation and performance analysis.

This section also provides a detailed explanation of

the YOLO11 model architecture. The fourth section

showcases the results obtained from the model, while

the ﬁfth section concludes the paper by summarizing

the key features that contributed to the improved ac-

curacy of vehicle detection.

2 LITERATURE SURVEY

Intelligent Transportation Systems (ITS) have

adopted various methods to detect and predict trafﬁc

ﬂow, aiming to enhance trafﬁc management, reduce

congestion, and improve mobility (Qureshi and

Abdullah, 2013). One widely used method involves

edge computing combined with YOLOv4 for vehicle

detection and DeepSORT for tracking multiple

objects (Bin Zuraimi and Kamaru Zaman, 2021).

This approach processes video feeds locally at edge

nodes, minimizing delays and reducing dependence

on cloud-based systems. While it improves real-time

accuracy in vehicle detection and tracking, it faces

challenges such as occlusions, poor lighting con-

ditions, and environmental variability, particularly

in high-density trafﬁc scenarios. Additionally, this

method primarily focuses on detecting vehicles and

lacks the capability to effectively predict trafﬁc ﬂow,

limiting its utility for congestion forecasting.

A recent study combines deep learning models

like CNN-LSTM to predict trafﬁc ﬂow, utilizing cel-

lular automata-based simulations to generate train-

ing datasets. This approach addresses the issue of

insufﬁcient real-world data for model training. The

CNN captures spatial patterns, while the LSTM de-

tects temporal trends, making it effective for short-

term trafﬁc forecasting. However, this model is heav-

ily dependent on simulated data, which may not fully

reﬂect real-world conditions. Additionally, it does not

incorporate real-time vehicle detection, which is es-

sential for adaptive and dynamic trafﬁc management

(Yang and Jerath, 2024).

Studies have also investigated the integration of

YOLOv4 with DeepSORT for vehicle detection and

tracking in trafﬁc footage. This combination im-

proves the robustness of tracking, ensuring accurate

detection even when vehicles are partially obscured

or overlapping. Although it performs well for count-

ing and tracking vehicles, it lacks the functionality to

predict trafﬁc density or trends, limiting its effective-

ness for proactive trafﬁc management and congestion

forecasting (Ranjitha et al., 2023).

A study on one more machine learning algorithm

known as Single Shot MultiBox Detector (SSD) algo-

rithm(Su and Shu, 2024) provides an insights about

the trafﬁc ﬂow detection. The SSD algorithm has

gained popularity for its ability to perform object de-

tection in real-time. However, its traditional imple-

mentation struggles with feature extraction for small

or occluded vehicles, primarily due to the shallow

convolutional structure and lack of adaptability to di-

verse trafﬁc scenarios. Combining the improved SSD

with the DeepSort tracking algorithm allows for ro-

INCOFT 2025 - International Conference on Futuristic Technology

336

bust vehicle tracking and direction determination.

Our project uses YOLO11 with a custom thresh-

old mechanism to predict trafﬁc density as either

heavy or low. YOLO11’s advanced detection capa-

bilities allow it to classify vehicles more accurately,

even in challenging conditions. By setting a threshold

for trafﬁc density, the system can immediately catego-

rize trafﬁc ﬂow, offering real-time insights for trafﬁc

management. Additionally, deploying this system on

edge devices ensures it processes data quickly with

minimal delays, making it suitable for high-demand

scenarios.

This project not only improves on existing meth-

ods but also bridges the gap between vehicle detec-

tion and trafﬁc density prediction. By combining

advanced features of YOLO11 with threshold-based

decision-making, the system is practical and scalable

for real-world use. It provides a cost-effective and ef-

ﬁcient solution for monitoring trafﬁc, predicting con-

gestion, and enabling smarter trafﬁc management sys-

tems.

3 METHODOLOGY

This section provides a detailed overview of the mod-

els,methodologies, and their implementation strate-

gies employed for detecting the trafﬁc ﬂow with the

use of images.

3.1 Data Description

Data collection is a very basic and important step of

any project which is aimed at gathering accurate, rel-

evant, and comprehensive data to meet speciﬁc ob-

jectives of the project. Effective data collection in-

volves collection of data which is diverse, hence it

will help in training the model accurately. Our dataset

was taken from a website which consisted of a trafﬁc

images dataset. The dataset used in our study consists

of a diverse set of trafﬁc images, each containing vari-

ous vehicles such as cars, buses, and motorcycles with

corresponding labels. It consists of 5080 images and

each image consisting of a corresponding label. The

dataset was split into 3 parts namely train, valid and

test. The train data consists of 4476 images, the valid

data consists of 301 images and the test data consists

of 303 images. The data consists of various images

in various different conditions such as different light

conditions and also different angles. The images vary

from simple images consisting of just a single object

to images consisting of multiple different objects.

3.2 Model Architecture

This architecture lays the groundwork for YOLO11’s

design toward speed and accuracy in image process-

ing; it is based on preceding YOLO versions, like

YOLOv8(Sohan et al., 2024), YOLOv9(Wang et al.,

2025), and YOLOv10. The ideal YOLO11 innova-

tions are primarily from the C3K2 block, SPPF mod-

ule, and the C2PSA block. All these assist in pro-

cessing information with speed and accuracy by the

model. The Fig.1 shows the the YOLO11 architec-

ture.

3.2.1 Backbone

Convolutional Block: The convolutional block(Woo

et al., 2018) processes the input c, h, w through a 2D

convolutional layer, followed by a 2D Batch Normal-

ization layer, and ﬁnally with a SiLU activation func-

tion.

C3K2 block: YOLO11 features such blocks for

feature extraction at different backbone stages. Even

with the smallest of 3x3 kernels, this model is capa-

ble of capturing all the important elements from the

image, and yet this allows for more efﬁcient comput-

ing. The C3K2 block(Jegham et al., 2024), an ad-

vancement of the CSP (Cross Stage Partial) bottle-

neck ﬁrst presented in previous iterations, is the heart

of YOLO11. By dividing the feature map and apply-

ing a sequence of smaller kernel convolutions(3x3)

which are quicker and less expensive to compute than

larger kernel convolutions the C3K2 block optimizes

the information ﬂow across the network.The C3K2

block enhances feature representation using fewer pa-

rameters by processing smaller, distinct feature maps

and combining them following several convolutions.

3.2.2 Neck

SPPF: The Space Pyramid Pooling Fast module

(Zhang and Gu, 2023) is a new development in pool-

ing within various regions of an image at different po-

sitional projections within the SPPF. It aids the capac-

ity of that network to hold objects, usually in small

size, which has always scaled concern in previous

YOLOS when it comes to object as small as apple bee

or lesser. The SPPF will collect multi-scale, aggre-

gated context to pool features with various max pool-

ing at various kernel sizes, thereby augmenting ele-

ment capture across different resolution sources in or-

der to train a model capable the identiﬁcation of very

small subjects. The inclusion of SPPF also guaran-

tees YOLO11 to keep running at the same real-time

speed but with good multi-scale object detection ca-

pabilities.

Trafﬁc Flow Detection and Prediction Using YOLO11: A Threshold-Based Approach for Identifying Trafﬁc Levels

337

Figure 1: YOLO11 Architecture(Rao, 2023)

Attention Mechanism:

Position-Sensitive Attention: In order to improve

features extraction and processing using feed-forward

networks and position-sensitive attention (Zhu et al.,

2019), this class is capable of applying those with

the input tensors within themselves to process the at-

tention layer inputs at the input layer alone with the

outputs at the attention, concatenating both and send-

ing it through feed forward neural networks. This

then proceeds in completing the path through Conv

Block, Conv Block without activation, and ﬁnally

Conv Block output concatenated with that of the ﬁrst

contact layer.

C2PSA: Inspired by the C2F (Watanabe, 1980)

block structure, the C2PSA block uses two PSA mod-

ules that operate on different branches of the feature

map concatenated together later on. This ensures that

the required balance is met between detection accu-

racy and cost while ensuring that the model has a con-

centration on spatial information. By manipulating

the features with spatial attention, the C2PSA block

puts the model into a more advantageous position to-

ward effectively focusing on those regions of interest.

This allows YOLO11 to perform better than its older

iterations like YOLOv8 when an object is required to

be detected with greater precision and detail.

3.2.3 Head

Like previous versions of YOLO, YOLO11 has a

multi-scale prediction head which is used to detect all

objects with varying sizes. The head will then take

features produced by the neck and backbone so that

detection boxes will be output at three different scales

that are low, medium, and high.

Typically, the detection head using three feature

maps, P3, P4, and P5, which are made by the dif-

ferent image granularities. This approach guarantees

that very large objects are captured using higher-level

features (P5), while smaller objects can be detected in

greater detail (P3).

3.3 Model Building

Model building includes training the model on the

dataset that we collected and split into training (4476

images) and testing (303 images) sets. The train-

ing set is used to train the YOLO11 model on vari-

ous kinds of images. The model will learn to detect

the vehicles accurately after the training process. A

threshold of 5 vehicles is set, if the number of vehi-

cles is more than the threshold, then it is classiﬁed

as heavy trafﬁc, and if it is less than the threshold,

then it is considered light trafﬁc, this is done using

the TrafﬁcPred algorithm as shown in Algorithm 1.

INCOFT 2025 - International Conference on Futuristic Technology

338

Figure 2: Proposed Pipeline Architecture

After the training process, the model can be tested on

unseen images to get the results based on the train-

ing achieved. The Fig.2 shows the Proposed Pipeline

Architecture for classifying a trafﬁc level using the

TrafﬁcPred Algorithm. The process begins with an

input image that is passed to the pre-trained object

recognition model, i.e., YOLO11 model. The image

is processed so that the model can produce an im-

age with annotations showing the objects that have

been detected surrounded by bounding boxes. This

image that has been annotated can then be fed to the

TrafﬁcPred Algorithm, which basically works out the

number of bounding boxes detected in all the images

to identify the state of trafﬁc.

Data: (image path) and (bbox threshold

= 5).

Result: "heavy" or "light"

Step 1:

evaluate traffic level(image path);

results ←

model.predict(source=image path,

show=True, save=True, conf=0.4) ;

bbox count ← len(results[0].boxes);

traffic level

← label traffic level (bbox count);

Step 2:

label traffic level(bbox count);

if bbox count > 5 then

return "heavy";

end

else

return "light";

end

return traffic level;

Algorithm 1: TrafﬁcPred algorithm: Classiﬁes trafﬁc as

”heavy” or ”light” based on bounding box count.

Regarding the TrafﬁcPred Algorithm as shown in

Algorithm 1, the crucial point of the code is to de-

termine whether the trafﬁc is light or heavy based

upon the number of objects found in the image by

means of a pre-trained object detection model. Just

as a helper function label trafﬁc level(bbox count)

is deﬁned, the trafﬁc level can be obtained by

comparing the number of detected bounding boxes

(bbox count) with a threshold over 5. If it ex-

ceeds 5, the trafﬁc is classiﬁed as heavy other-

wise, it is classiﬁed as light. The major function,

evaluate trafﬁc level(image path), evaluates the traf-

ﬁc conditions by loading the input image and process-

ing the same on the object detection model to achieve

predictions. Hence, the results that include bounding

boxes among the factors will be analyzed to count the

detected objects, and the trafﬁc level is determined af-

ter employing the helper function.

The training process uses several hyperparame-

ters, which have a direct inﬂuence on the speed, accu-

racy and convergence of the model. We set the num-

ber of epochs to 100 to help the model converge prop-

erly and get a high accuracy. The batch size was set

to 64 images, momentum was 0.9 and the weight de-

cay factor was 0.0005. All the hyperparameters are

summarized below in the Table 1.

Table 1: Hyperparameters for Model Training

Hyperparameters Values

Epochs 100

Batch Size 64

Image Size 640

Learning Rate 0.000833

Momentum 0.9

Weight Decay 0.0005

Trafﬁc Flow Detection and Prediction Using YOLO11: A Threshold-Based Approach for Identifying Trafﬁc Levels

339

3.4 Model Evaluation

Evaluating models must be done to validate the effec-

tiveness of the models. The following metrics were

considered in model evaluation-as indicated-below to

analyze speed, accuracy, and robustness: vehicle de-

tection by classes.

3.4.1 Mean Average Precision(mAP)

Mean Average Precision(mAP): The value of mean

average precision is derived from a mean of average

precisions over the different classes at multiple IoUs

from 0 to 1. Combined, this is a metric to evaluate all

object detection models.

mAP =

∑

t=1

Average Precision(AP) (1)

Where T is the total number of IoU thresholds and

AP is the value calculated at a speciﬁc IoU threshold

3.4.2 Precision

Precision (Goutte and Gaussier, 2005) is numerically

deﬁned as the quotient between the numbers of cor-

rectly detected objects and the total numbers of ob-

jects predicted. The higher the precision rate is, the

fewer objects it produces that are not itself correctly

detected-i.e. fewer false positives.

Precision =

TP + FP

(2)

Where TP is True Positives which corresponds to

objects that are detected properly and FP is False Pos-

itives which corresponds to objects that are not de-

tected properly.

3.4.3 Recall

In this case, Recall (Yntema and Trask, 1963) is the

ratio of those detected objects over the total number

of objects contained by the dataset. The greater the

recall, the fewer objects are left overload.

Recall =

TP + FN

(3)

Where TP is True Positives which corresponds

to objects that are detected properly and FN is

False Negatives which corresponds to objects that are

missed out.

3.4.4 F1 Score

The F1 score is a measure of precision and the recall

combined together in the harmonic mean. It is appro-

priate for situations where both precision and recall

must be factored into evaluation of the data, because

it produces one value reﬂective of those two factors.

F1 = 2 ·

Precision · Recall

Precision + Recall

(4)

3.5 Performance

3.5.1 Precision-Conﬁdence Curve

The precision-conﬁdence curve is a graph which in-

dicates the performance for precision with respect to

conﬁdence thresholds. The precision curve shows

how many of the positive predictions predicted to be

true actually are true. For most classes, as conﬁdence

increases, precision increases; that is, a higher thresh-

old would yield fewer false positives at the threshold.

Figure 3: Precision-Conﬁdence Curve which shows the

graphical relation between precision and conﬁdence.

The graph highlights that the model has high pre-

cision on certain classes such as car and bus achieving

high precision as compared to others classes such as

bicycle.

3.5.2 Recall-Conﬁdence Curve

A recall-conﬁdence curve is deﬁned as a graphical

representation of recall and conﬁdence. It is useful

in maintaining the trend made in increasing the con-

ﬁdence level in terms of effect on the recall of the

model. It indicates low recall at high conﬁdence lev-

els, whereas it indicates contrary conditions at low

conﬁdence levels.

The graph highlights that the classes like car and

bus have higher recall values at higher conﬁdence lev-

els compared to bicycle, which dips more quickly.

INCOFT 2025 - International Conference on Futuristic Technology

340

Figure 4: Recall-Conﬁdence Curve which shows the graph-

ical relation between recall and conﬁdence.

3.5.3 Precision-Recall Curve

Precision-recall curve graphically represents the re-

lation between precision and recall as a function of

different thresholds. A higher precision standing in

front of a lower recall means that less number of

false positives but still more number of false nega-

tives are counted against the model. On the other

hand, increased recall with lower precision will be

stated as there is a lesser number of false negatives

counted with a huge number of false positives (Davis

and Goadrich, 2006).

Figure 5: A precision recall curve is that it forms a graph

with the entire relationship between precision and recall

The graph highlights that the model performs well

on certain classes such as car and cng achieving high

AP scores while some others classes such as bicycle

have low AP.

3.5.4 Confusion Matrix

Confusion matrix is a tool for the evaluation of the

performance of a model, which gives detailed predic-

tion data of the model such as how well the model

performs in differentiating different classes (Salmon

et al., 2015). Rows are the predicted classes as

the columns are the original classes (true labels).ted

classes, and the columns correspond to the actual

classes (true labels).

Figure 6: Confusion matrix shows how well the classes are

predicted.

The diagonal cells represent correctly predicted

cases, where the predicted class and the ac-

tual class match. Off-diagonal cells represent

misclassiﬁcations-or cases where the predicted class

does not match the actual class. This confusion matrix

holds a precious insight of performance of the model

for each class, marking important strong points where

cars and persons are predicted with higher accuracy

and relatively weak areas measured with lower accu-

racy as for bicycles and rickshaws.

4 RESULTS

The study conducted an detailed comparison of

YOLOv8, YOLOv9, YOLOv10, and YOLO11 to

evaluate their performance in detecting and predicting

trafﬁc ﬂow across eight vehicle classes: Car, CNG,

Motorcycle, Bus, Rickshaw, Bicycle, Other-Vehicles,

and Person. YOLO11 demonstrated signiﬁcant ad-

vancements over its predecessors in key metrics, in-

cluding precision, recall, (mAP) at an IoU threshold

of 0.5 (mAP@0.5), speed, and robustness. It achieved

the highest precision of 0.88, indicating fewer false

positives, and the highest recall of 0.85, reﬂecting

its ability to detect a greater proportion of objects

in the dataset. These improvements are crucial for

real-world trafﬁc management systems, which require

high accuracy and reliability under diverse condi-

tions. YOLO11 also achieved a superior mAP@0.5 of

0.773, signiﬁcantly outperforming YOLOv8 (0.632),

YOLOv9 (0.650), and YOLOv10 (0.714). This met-

ric highlights YOLO11’s ability to balance precision

and recall, ensuring consistent and dependable per-

formance across all vehicle classes. Its detection ca-

pabilities for smaller or less frequent objects like Bi-

cycles and Rickshaws were particularly notable, ad-

Trafﬁc Flow Detection and Prediction Using YOLO11: A Threshold-Based Approach for Identifying Trafﬁc Levels

341

dressing limitations seen in earlier versions. Visu-

alization tools like precision-recall curves and con-

fusion matrices further validated YOLO11’s perfor-

mance, conﬁrming its reliability for practical deploy-

ments.

Table 2: Performance Comparison of YOLO Models for

Trafﬁc Detection and Prediction

Model Precision Recall mAP@0.5

YOLOv8 0.78 0.75 0.632

YOLOv9 0.80 0.77 0.650

YOLOv10 0.84 0.81 0.714

YOLO11 0.88 0.85 0.773

The performance of YOLO11 model on the trafﬁc

dataset was evaluated to detect and classify objects

such as cars, motorcycles, and CNG vehicles. Fig.7

shows a sequence of two images in which the ﬁrst im-

age is an example input image from the dataset, and

the second image shows the model’s output with de-

tected objects bounded by boxes. Each bounding box

is labeled with the predicted class and the correspond-

ing conﬁdence score. The model successfully iden-

tiﬁed and differentiated between various trafﬁc enti-

ties with high conﬁdence, demonstrating its ability to

process urban trafﬁc scenes effectively better then the

previous models.

Figure 7: The ﬁrst image shows a image from our dataset

and the image following it shows the different vehicles de-

tected and also bounded.

5 CONCLUSION

This research developed a trafﬁc ﬂow detection and

prediction system using YOLO11, achieving high ac-

curacy and efﬁciency for real-time trafﬁc monitoring.

The system uses a threshold-based method to classify

trafﬁc as heavy or light. YOLO11’s advanced de-

sign, including features like the C3K2 block, SPPF

module, and attention mechanisms, played a key role

in improving YOLO11’s accuracy and adaptability in

complex scenarios, such as overlapping vehicles and

varying lighting conditions. These features helped

the system handle challenges such as crowded trafﬁc,

poor lighting, and different environmental conditions

effectively. The model was thoroughly evaluated on

a dataset containing eight vehicle classes: Car, CNG,

Motorcycle, Bus, Rickshaw, Bicycle, Other-Vehicles,

and Person. YOLO11 achieved a precision of 0.88,

recall of 0.85, and an mAP@0.5 of 0.773, outper-

forming earlier YOLO versions such as YOLOv8,

YOLOv9, and YOLOv10. The system demonstrated

high precision and recall for frequently encountered

classes such as cars and buses, while effectively de-

tecting smaller or less frequent objects like bicycles.

Visualization tools, such as precision-recall curves

and confusion matrices, highlighted the model’s ro-

bust performance and ability to generalize across dif-

ferent scenarios.

This system also reduces the need for expensive

infrastructure, as it can be deployed on existing edge

devices such as trafﬁc cameras. The ability to pro-

cess data locally allows for quicker decision-making

and reduces reliability on cloud services. By offering

accurate and immediate trafﬁc predictions, this model

can improve trafﬁc ﬂow, reduce congestion, and en-

hance safety in urban areas. By combining vehicle

detection with trafﬁc density prediction, this research

shows how powerful advanced models like YOLO11

can be for improving transportation systems. The re-

sults highlight its potential as a key technology for

creating smarter and more efﬁcient smart cities.

6 FUTURE WORK

The use of YOLO11 in trafﬁc ﬂow detection and fore-

casting, together with potential future applications to-

wards more generalized ways of trafﬁc management

for intelligent transportation systems, comprise the

majority of this work. Future advancements could in-

clude real-time analysis of continuous video streams,

the integration of other information sources like GPS

and weather, and more precise classiﬁcation of trafﬁc

types like moderate or severe congestion. A global

trafﬁc database and accident detection will also be

useful additions to the model, which will optimize

performance on low-power edge devices. Lastly, the

development of an intuitive interface for real-time

data display would allow for more widespread practi-

cal use.

REFERENCES

Bin Zuraimi, M. A. and Kamaru Zaman, F. H. (2021). Vehi-

cle detection and tracking using yolo and deepsort. In

INCOFT 2025 - International Conference on Futuristic Technology

342

2021 IEEE 11th IEEE Symposium on Computer Ap-

plications & Industrial Electronics (ISCAIE), pages

23–29.

Chen, A. (2021). Data augmentation techniques. Journal of

Machine Learning, 12:123–130.

Davis, J. and Goadrich, M. (2006). The relationship be-

tween precision-recall and roc curves. In Proceed-

ings of the 23rd international conference on Machine

learning, pages 233–240.

Goutte, C. and Gaussier, E. (2005). A probabilistic interpre-

tation of precision, recall and f-score, with implication

for evaluation. In European conference on informa-

tion retrieval, pages 345–359. Springer.

Howard, A. (2018). Transfer learning for deep networks.

Deep Learning Journal, 9:89–95.

Jegham, N., Koh, C. Y., Abdelatti, M., and Hendawi,

A. (2024). Evaluating the evolution of yolo (you

only look once) models: A comprehensive benchmark

study of yolo11 and its predecessors.

Jocher, G. and Qiu, J. (2024). Ultralytics yolo11.

Khanam, R. and Hussain, M. (2024). Yolov11: An

overview of the key architectural enhancements.

Qureshi, K. N. and Abdullah, A. H. (2013). A survey on in-

telligent transportation systems. Middle-East Journal

of Scientiﬁc Research, 15(5):629–642.

Ranjitha, R., Ahuja, P., Shreeshayana, R., and Anil, D.

(2023). Edge intelligence for trafﬁc ﬂow detection: A

deep learning approach. In 2023 International Con-

ference on Quantum Technologies, Communications,

Computing, Hardware and Embedded Systems Secu-

rity (iQ-CCHESS), pages 1–6. IEEE.

Rao, N. (2023). Yolov11 explained: Next-level object de-

tection with enhanced speed and accuracy.

Salmon, B. P., Kleynhans, W., Schwegmann, C. P., and

Olivier, J. C. (2015). Proper comparison among meth-

ods using a confusion matrix. In 2015 IEEE Inter-

national geoscience and remote sensing symposium

(IGARSS), pages 3057–3060. IEEE.

Sohan, M., Sai Ram, T., Reddy, R., and Venkata, C. (2024).

A review on yolov8 and its advancements. In Interna-

tional Conference on Data Intelligence and Cognitive

Informatics, pages 529–545. Springer.

Su, G. and Shu, H. (2024). Trafﬁc ﬂow detection method

based on improved ssd algorithm for intelligent trans-

portation system. PLOS ONE, 19(3):e0300214.

Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J.,

and Ding, G. (2024). Yolov10: Real-time end-to-end

object detection. arXiv preprint arXiv:2405.14458.

Wang, C.-Y., Yeh, I.-H., and Mark Liao, H.-Y. (2025).

Yolov9: Learning what you want to learn using pro-

grammable gradient information. In European Con-

ference on Computer Vision, pages 1–21. Springer.

Wang, S., Yu, D., Ma, X., and Xing, X. (2018). Analyzing

urban trafﬁc demand distribution and the correlation

between trafﬁc ﬂow and the built environment based

on detector data and pois. European Transport Re-

search Review, 10:1–17.

Watanabe, N. (1980). Two types of graphite ﬂuorides,(cf)

n and (c2f) n, and discharge characteristics and mech-

anisms of electrodes of (cf) n and (c2f) n in lithium

batteries. Solid State Ionics, 1(1-2):87–110.

Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. (2018). Cbam:

Convolutional block attention module. In Proceed-

ings of the European conference on computer vision

(ECCV), pages 3–19.

Yang, Z. and Jerath, K. (2024). Energy-guided data sam-

pling for trafﬁc prediction with mini training datasets.

Yntema, D. B. and Trask, F. P. (1963). Recall as a search

process. Journal of Verbal Learning and Verbal Be-

havior, 2(1):65–74.

Zhang, C. and Gu, S. (2023). Fish object detection

based on spatial bias pyramid pooling: Improved bias-

yolo. In 2023 8th International Conference on Intelli-

gent Computing and Signal Processing (ICSP), pages

1976–1979. IEEE.

Zhiqiang, W. and Jun, L. (2017). A review of object de-

tection based on convolutional neural network. In

2017 36th Chinese control conference (CCC), pages

11104–11109. IEEE.

Zhu, X., Cheng, D., Zhang, Z., Lin, S., and Dai, J. (2019).

An empirical study of spatial attention mechanisms in

deep networks. In Proceedings of the IEEE/CVF inter-

national conference on computer vision, pages 6688–

6697.

Trafﬁc Flow Detection and Prediction Using YOLO11: A Threshold-Based Approach for Identifying Trafﬁc Levels

343