Machine Learning for Predicting Trafﬁc and Determining Road Capacity

Alex Lewis, Rina Azoulay and Esther David

Department of Data Mining, Jerusalem College of Technology, Jerusalem, Israel

Department of Computer Science, Jerusalem College of Technology, Jerusalem, Israel

Department of Computers, Achva Academic College, Beer Tuvia, Israel

Keywords:

Trafﬁc Forecasting, Capacity Estimation, Machine Learning.

Abstract:

This study proposes the use of machine learning techniques to predict trafﬁc speed based on trafﬁc ﬂow and

other road-related features, utilizing the California Freeway PeMS trafﬁc dataset. Extensive research has been

dedicated to the prediction of road speed; however, the primary challenge lies in accurately forecasting speed

as a function of trafﬁc ﬂow. The learning methods compared include linear regression, K-nearest neighbors

(KNN), decision trees, neural networks, and ensemble methods. The primary objective of this research is to

develop a model capable of estimating road capacity, a crucial factor in designing an auction system for road

usage. The ﬁndings reveal that the performance of each algorithm varies with the selection of features and the

volume of data available. The results demonstrate that ensemble methods and KNN surpass other models in

accuracy and consistency for predicting trafﬁc speed. These models are then employed to create a ﬂow-speed

graph, which aids in determining road capacity.

1 INTRODUCTION

Transportation plays a vital role in modern society,

facilitating travel, the movement of goods, and the

connection of people and places. Intelligent Trans-

port Systems (ITS) aim to enhance transportation by

bringing together various elements such as vehicles,

drivers, passengers, road operators, and managers in a

way that interacts with the environment. The primary

goals of ITS are to decrease the number of road fatal-

ities and injuries, improve the efﬁciency of vehicles

and traffic networks, and reduce pollution (Williams, 2008).

An important aspect of trafﬁc assignment is esti-

mating road capacity, deﬁned as the maximum num-

ber of vehicles passing through a speciﬁc point within

a time period, considering existing road, trafﬁc, and

control conditions (Reilly, 1997).

Traditionally, transportation capacity assessment

has relied on static statistical models from a civil engi-

neering perspective, which depend on factors such as

the number of lanes, road width, the free ﬂow speed,

etc. Conversely, machine learning and deep learn-

ing models can assess road capacity based on real-

time data, providing trafﬁc management systems with

a more precise understanding of network conditions,

enhancing trafﬁc control, and reducing overall travel

time.

This study aims to construct machine learning

and deep learning models that estimate road capacity

based on current speed predictions. Our regression

models account for factors that impact speed, such as

trafﬁc ﬂow and other relevant features. Rather than

projecting speciﬁc capacity thresholds, we seek to ex-

plore the interplay between ﬂow, speed, and other

variables, empowering road planners to determine optimal capacity for their desired speed. In the future, these models will be combined with optimization tools such as reinforcement learning and search methods in order to control traffic optimally.

The motivation for our research is to develop a

comprehensive road network model that can capture

the dynamics and capacity of individual roads. This

model will serve as a foundation for future research,

speciﬁcally focusing on the creation of an auction sys-

tem for road entry. By leveraging the speed prediction

models developed in this study and integrating them

with the aforementioned optimization tools, the auc-

tion system will effectively assess the capacity of each

road and facilitate the allocation of slots through var-

ious auction mechanisms. This integrated approach

will contribute to improved road management and re-

source allocation in transportation systems.


Lewis, A., Azoulay, R. and David, E.

Machine Learning for Predicting Trafﬁc and Determining Road Capacity.

DOI: 10.5220/0012451900003636

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2024) - Volume 3, pages 1162-1169

ISBN: 978-989-758-680-4; ISSN: 2184-433X

The paper begins with a literature review dis-

cussing the use of machine learning in trafﬁc pre-

diction and traditional capacity estimation methods.

The methods chapter explains various approaches for

speed prediction, followed by a results chapter dis-

cussing the obtained outcomes. Utilizing the mod-

els described, a capacity estimation model is devel-

oped and explained in the subsequent chapter. The

paper concludes by highlighting potential future re-

search directions in the ﬁeld.

2 LITERATURE REVIEW

Road capacity refers to the number of vehicles that can pass a given point during a specific period under prevailing roadway, traffic and control conditions (Reilly, 1997). Many factors affect road capacity, including external conditions such as the weather, visibility and road lighting (Chung et al., 2006). Furthermore, road density (the number of cars on the road) affects the capacity: if a road is congested, the number of cars that can enter the road without creating further congestion is much lower than the general capacity (Chen, 2002).

The HCM (Highway Capacity Manual) (Reilly,

1997), a TRB (Transportation Research Board) pub-

lication in the US, provides guidelines and compu-

tational procedures for assessing highway facilities’

capacity and quality of service. The free ﬂow speed

(FFS) is the theoretical speed of a vehicle when no other vehicles are present, and can be calculated using HCM

formulas that account for features such as lane width

and ramp density (Reilly, 1997). The FFS is inﬂu-

enced by both intrinsic and environmental factors,

such as road width and weather conditions (Kyte

et al., 2000). Since the HCM proposes a fixed equation for calculating road capacity based on free flow speed, it fails to incorporate external and spatio-temporal factors.

In our research we used the California Freeway Performance Measurement System (PeMS), a system that collects real-time data from all the highways in California. The data is collected by over 40,000 sensors spanning over 20,000 road segments. The database includes the average speed, flow rate and occupancy for every 30 seconds on all the highways, as well as calculated values such as the average delay on each road (Chen, 2002). Many recent works on traffic prediction have used data obtained from PeMS (Tedjopurnomo et al., 2020).

Traditional time series techniques have been used for traffic speed and flow prediction. These include ARIMA (Billings and Yang, 2006; Lee and Fambro, 1999) and its variants (SARIMA, ARIMAX, etc.) (Ghosh et al., 2007), as well as Kalman filtering (Kumar, 2017). Moreover, machine learning and deep learning have been applied to many different transportation problems (Haghighat et al., 2020). Regression techniques such as linear regression and logistic regression were used for traffic flow prediction (Lonare and Ravi, 2020). Multivariate regression has also been used for congestion prediction (Liu et al., 2020). Other models such as SVM have been used for traffic volume prediction (Mingheng et al., 2013), and decision trees have been used for congestion prediction (Tamir et al., 2020).

With the recent developments in deep learning, a large body of work on traffic flow and traffic speed prediction now uses deep learning (Tamir et al., 2020). Beyond the regular data used as input to a neural network, there are specific models that exploit the spatial and temporal dimensions of the data. Spatial models use deep learning frameworks such as CNN (Song et al., 2017), which build on geographical data. Temporal models employ deep learning frameworks such as RNN and LSTM (Tian et al., 2018), which make use of historical data to improve the prediction. There are also hybrid models that combine spatial and temporal data, such as the CNN-LSTM hybrid model (Cao et al., 2020).

Graph Neural Networks (GNNs) are deep learn-

ing methods designed to perform inference on data

described by graphs. GNNs are neural networks that

can be directly applied to graphs, and provide an easy

way to implement node-level, edge-level, and graph-level prediction tasks. GNNs not only learn from each segment's features, but can also capture the connections between the different road segments using spatial data.

Extensive research has focused on predicting road speed. Our research differs from most of it in that we attempt to accurately forecast speed as a function of traffic flow, and then apply this model to simulated data that we create for the capacity model.

The deﬁnition of road capacity is the maximum

number of vehicles or amount of trafﬁc ﬂow that a

roadway can accommodate effectively and efﬁciently

within a given time period. In the review of the re-

search conducted in this ﬁeld, we found only one

study (Huo et al., 2022) on capacity estimation using

deep learning. The error function for the network was

calculated by using the travel time as a function of

the road capacity and the error value is the difference

between the predicted travel time and the real travel


time. The present study differs from the work of Huo et al. (2022) in that it does not provide a precise quantification of the road capacity. Instead, it seeks to understand traffic flow in relation to velocity and leverages the advantages of spatio-temporal data, which makes the approach more dynamic and will allow more flexibility for the auction system.

3 MACHINE LEARNING METHODS FOR SPEED PREDICTION

The goal of the model is to accurately predict the

speed of the leftmost lane for each road segment as a

function of the different input features chosen. To ac-

complish this, we employed various machine learning

algorithms, namely decision trees, random forest, k-Nearest Neighbors (k-NN) regressor, neural networks, and linear regression. By applying these algorithms, we aimed to capture the underlying patterns and relationships within the data and leverage them for accurate speed prediction.

Our research employs two distinct methodologies

for the analysis. The ﬁrst methodology involves con-

structing machine learning models that utilize data

from all segments collectively. In contrast, the second

methodology focuses on developing individual ma-

chine learning models for each segment. By exploring

these two approaches, we aim to investigate the com-

parative effectiveness and performance of both cen-

tralized and decentralized modeling strategies in cap-

turing the complexities of the data.

In the ﬁrst methodology, we gathered trafﬁc data

from multiple segments and used data from all the

segments to help with the prediction (the segment id

was not part of the input for the model). This holistic

approach allowed us to capture the collective trafﬁc

dynamics and identify broader trends across the road

network. Utilizing the aforementioned algorithms, we

trained models on this dataset to predict the speed on

the roads.

In the second methodology, we focused on the in-

dividual segment level, where we isolated the data

from each segment to predict the speed speciﬁc to

that segment. This approach provided insights into

the localized factors inﬂuencing road speed, allowing

for more targeted predictions. Similar to the previous

methodology, we employed the decision trees, ran-

dom forest, k-NN regressor, SVR, and linear regres-

sion algorithms to build segment-speciﬁc models.

To capture the temporal aspect, we introduced

lag features that integrate time-series information.

Speciﬁcally, these lag features were constructed by

considering the ﬂow and speed values from the pre-

vious three time slots, encompassing both the entire

road segment and speciﬁcally the ﬁrst lane. Further-

more, we incorporated the hour of travel as an ad-

ditional feature, which was represented as an integer

between 0 and 23.
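The lag-feature construction described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the record layout (`segment_id`, `timestamp`, `flow`, `speed`), the function name `add_lag_features`, and the 5-minute toy slots are hypothetical, and the sketch assumes that consecutive records of a segment are consecutive time slots. It builds one set of flow and speed lags; the separate first-lane lags mentioned in the text would follow the same pattern.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def add_lag_features(records, n_lags=3):
    """Return one feature row per record that has n_lags earlier records
    in the same segment (rows without a full history are dropped)."""
    by_segment = defaultdict(list)
    for rec in records:
        by_segment[rec["segment_id"]].append(rec)

    rows = []
    for seg_records in by_segment.values():
        # Order each segment's records chronologically before looking back.
        seg_records.sort(key=lambda r: r["timestamp"])
        for i in range(n_lags, len(seg_records)):
            current = seg_records[i]
            row = {
                "segment_id": current["segment_id"],
                "hour": current["timestamp"].hour,  # integer in 0..23
                "flow": current["flow"],
                "speed": current["speed"],  # prediction target
            }
            for lag in range(1, n_lags + 1):
                past = seg_records[i - lag]
                row[f"flow_lag{lag}"] = past["flow"]
                row[f"speed_lag{lag}"] = past["speed"]
            rows.append(row)
    return rows


# Hypothetical toy data: one segment, five consecutive 5-minute slots.
start = datetime(2022, 9, 1, 8, 0)
demo_records = [
    {"segment_id": 1, "timestamp": start + timedelta(minutes=5 * i),
     "flow": 100 + 10 * i, "speed": 65.0 - i}
    for i in range(5)
]
demo_rows = add_lag_features(demo_records)
```

Dropping the `speed_lag` columns from each row yields an input set that relies on flow history only.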

In order to investigate the inﬂuence of different in-

put features on our predictive model, we conducted a

series of experiments utilizing various combinations

of inputs. Our objective was to systematically vary

the input variables and evaluate whether specific attributes resulted in improved outcomes, meaning a lower MSE, a higher R2, and models that generalize well to unseen data. We employed three distinct sets of input variables for this purpose.
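For reference, the two evaluation metrics can be written out directly. The sketch below uses the standard definitions of MSE and of the coefficient of determination R2; the function names and the toy speed values are hypothetical and are not taken from the study's data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted speeds (mph).
demo_true = [60.0, 62.0, 64.0, 66.0]
demo_pred = [61.0, 62.0, 63.0, 66.0]
demo_mse = mean_squared_error(demo_true, demo_pred)  # 0.5
demo_r2 = r2_score(demo_true, demo_pred)             # 0.9
```

A lower MSE and an R2 closer to 1 both indicate a better fit; a model that always predicts the mean speed has an R2 of 0.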

The ﬁrst set focused on road-speciﬁc information,

encompassing factors such as the number of lanes,

road type obtained from the PeMS database, and flow

data. The ﬂow data consisted of the current ﬂow, as

well as lag features representing ﬂow from the previ-

ous three time slots.

For the second set, we expanded the feature space

by including the lag features for road speed from the

previous three time slots. This addition allowed us to

capture temporal dependencies in the speed data.

In the third set, we introduced the free ﬂow speed

to the ﬁrst input set. This augmentation enhanced the

speed-related information without the need to incor-

porate the most recent history of speed. There was no

reason to use this input set for the second methodol-

ogy since the free ﬂow speed would be equal for all

of the data because the learning is on each segment

individually.

We had a speciﬁc motivation for conducting an ex-

periment using an input set solely comprised of ﬂow

lag features, excluding the speed features. The ra-

tionale behind this decision was to evaluate whether

our model could achieve satisfactory performance by

solely relying on ﬂow values. By doing so, we aimed

to develop a capacity model that could accurately pre-

dict optimal capacity without relying on knowledge

of recent speed variations on the road. This approach

would allow us to create a model based on the ﬂow

data and other constant features, providing us with the

capability to estimate optimal capacity independently

of recent speed information. Using speed features can still be useful for a live system with real data, and the MSE and R2 values obtained with them set a benchmark for the accuracy that can be achieved.

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence


3.1 Learning Methods

Linear Regression. A statistical method for modeling

the relationship between a dependent variable and one

or more independent variables. It assumes a linear

relationship between the variables and aims to ﬁnd the

best-ﬁt line that represents the data.

K-nearest Neighbors (KNN). A non-parametric

machine learning algorithm used for classiﬁcation or

regression. It works by ﬁnding the K nearest data

points to a given query point and using their la-

bels/values to make a prediction.

Neural Network. A type of machine learning al-

gorithm modeled after the structure and function of

the human brain. It consists of layers of intercon-

nected nodes (neurons) that process information and

learn from data through a process called backpropa-

gation.

Decision Trees. A machine learning algorithm

that uses a tree-like structure to model decisions and

their possible consequences. Each node in the tree

represents a decision based on a feature, and each

branch represents the possible outcomes based on the

decision.

Random Forest. A type of ensemble machine

learning algorithm that combines multiple decision

trees to improve prediction accuracy and reduce over-

ﬁtting. It works by building multiple decision trees

on random subsets of the data and aggregating their

predictions.
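To make the KNN description concrete, the following is a minimal sketch of K-nearest-neighbors regression in plain Python. It is illustrative only and is not the implementation used in the study: the function name `knn_predict`, the single flow feature, and the toy values are hypothetical, and the Euclidean distance with an unweighted average is one common choice among several.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k training points
    closest to the query (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: a single feature (flow) and the observed speed.
demo_X = [(100,), (200,), (300,), (400,), (1500,)]
demo_y = [65.0, 64.0, 63.0, 62.0, 30.0]
demo_prediction = knn_predict(demo_X, demo_y, query=(250,), k=2)  # 63.5
```

With several features on different scales (for example flow and number of lanes), the features would normally be standardized first so that no single feature dominates the distance.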

3.2 The Data

The research was carried out using data from the California Freeway PeMS (Performance Measurement System), a system that collects real-time data from all the highways in California. Over 40,000 sensors spanning over 20,000 road segments collect the data, which include the average speed, the flow rate and the occupancy for every 30 seconds on each of the road segments, as well as calculated values such as average delays (Chen, 2002). The data used in our basic model was obtained from District 3 in California and spanned the whole month of September 2022; the segments in the district can be seen in Figure 1. Our dataset comprises 12 million observations, and we used a 75:25 train-test split after eliminating some of the data as part of the pre-processing stage.
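A 75:25 split can be sketched as follows. The text does not say whether the split was random or chronological, so the shuffled split below is an assumption, and `split_train_test` and its parameters are hypothetical names used only for illustration.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for the pre-processed observations.
demo_data = list(range(100))
demo_train, demo_test = split_train_test(demo_data)
```

For time-ordered traffic data, a chronological split (training on earlier days and testing on later ones) is an alternative that avoids evaluating on slots adjacent to training slots.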

4 RESULTS

Figure 1: PeMS District 3 segments.

In this section, we discuss the results of the different models and how they can be used for capacity models. We measured the results of the models using the mean squared error (MSE) and the R2 score.

The first three tables report the first methodology, in which the models learn from all segments' data together. Table 1 shows the results for the input set that includes the speed lag features (labelled the first input set in the tables), Table 2 those for the input set with flow features and no speed-related features (the second input set), and Table 3 those for the set that includes the FFS as the only speed-related feature (the third input set).

Table 1: First methodology, first input set (speed lag features).

Regression Scores - first input set

Model               MSE     R2
Decision Tree       3.59    0.937
Random Forest       2.98    0.948
Nearest Neighbors   3.43    0.940
Neural Network      3.20    0.944

Table 2: First methodology, second input set (no speed related features).

Regression Scores - second input set

Model               MSE     R2
Decision Trees      39.38   0.315
Random Forest       36.71   0.362
Nearest Neighbors   35.19   0.352
Neural Network      42.64   0.275

Table 3: First methodology, third input set.

Regression Scores - third input set (FFS feature)

Model               MSE     R2
Decision Tree       24.98   0.56
Random Forest       19.94   0.65
Nearest Neighbors   20.47   0.64
Neural Network      27.54   0.52


The results for the second methodology, which predicts the speed using only data from each individual segment, are presented in the following tables. Table 4 shows the results for the input set with speed-related features, and Table 5 shows the results for the input set without any speed-related features.

Table 4: Second methodology, first input set.

Regression Scores - speed lag features

Model               Mean MSE   Median MSE   Mean R2   Median R2
Linear Regression   3.71       2.31         0.71      0.80
Decision Trees      4.17       2.50         0.61      0.79
Random Forest       3.35       2.00         0.70      0.83
Nearest Neighbors   3.66       2.23         0.75      0.83
Neural Network      4.92       3.04         0.6       0.78

Table 5: Second methodology, second input set.

Regression Scores - no speed related features

Model               Mean MSE   Median MSE   Mean R2   Median R2
Linear Regression   21.81      10.4         0.28      0.26
Decision Trees      16.23      7.98         0.34      0.38
Random Forest       13.19      6.52         0.46      0.48
Nearest Neighbors   13.29      6.76         0.45      0.47
Neural Network      4.92       3.04         0.6       0.78

The ﬁndings obtained indicate that in the context

of individual segment-based predictions, basic ma-

chine learning techniques like linear regression ex-

hibit comparable performance to more sophisticated

algorithms such as random forest and neural network.

This observation suggests that the data pertaining to

each individual segment is not characterized by sub-

stantial complexity.

We experimented with different numbers of neighbors in the KNN algorithm; here too, the results differed little between parameter values. We also experimented with different values for the depth of the decision tree, where the optimal value changed for each of the feature input sets. Additionally, we varied both the maximum depth and the number of estimators for the random forest algorithm. The MSE values for the different depths were similar to one another and varied between input sets.

Upon analyzing the outcomes of the second in-

put set, the input set with no speed related features,

it becomes evident that relying solely on the ﬂow lag

features does not sufﬁce to generate a robust model.

This limitation could be attributed to the insufﬁcient

availability of comprehensive data pertaining to each

road’s characteristics, necessitating the inclusion of

speed lag features. The models may lack adequate

temporal context when considering only the three

most recent ﬂow values. Incorporating additional

time series features and utilizing specialized tempo-

ral models such as CNN and RNN could potentially

enhance the model’s performance by capturing more

intricate temporal patterns.

Notably, the results obtained from the third input

set demonstrate a substantial improvement in perfor-

mance upon incorporating speed data, even though it

does not encompass the most recent historical speed

information. This ﬁnding underscores the signiﬁ-

cance of incorporating speed-related information, as

it yields signiﬁcant enhancements in model accuracy,

despite not relying on the immediate speed history.

The first methodology, which learns from all the segments collectively, performs as well as the second, which learns from each segment separately. This shows that, given enough data and features, the models generalize well over the whole road network.

5 THE CHALLENGE OF CAPACITY PREDICTION

According to the Highway Capacity Manual (HCM),

the deﬁnition of road capacity is the maximum num-

ber of vehicles or amount of trafﬁc ﬂow that a

roadway can accommodate effectively and efﬁciently

within a given time period, taking into account fac-

tors such as lane width, number of lanes, geometric

design, signal timings, and other operational charac-

teristics (Reilly, 1997).

The capacity is calculated using the following formula (Reilly, 1997):

$$\text{Capacity} = MSF_E \times PHF \times N \times f_{HV} \times f_p$$

where:

PHF = peak-hour factor, which represents the variation in traffic flow within an hour
f_HV = an adjustment factor for the presence of "heavy" vehicles
f_p = an adjustment factor to account for the fact that not all drivers of the facility may be commuters or regular users
N = number of lanes in the given direction of flow
MSF_E = maximum service flow rate at which the road still operates at level of service E
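As a small worked example, the HCM capacity formula above can be evaluated directly. The function name `hcm_capacity` is hypothetical, and the maximum service flow rate of 2400 and the lane count of 3 are placeholder inputs chosen for illustration rather than values taken from the HCM table; the PHF of 0.92 and the neutral adjustment factors of 1.0 follow the assumptions stated later in this section.

```python
def hcm_capacity(msf_e, phf, n_lanes, f_hv=1.0, f_p=1.0):
    """Capacity = MSF_E * PHF * N * f_HV * f_p."""
    return msf_e * phf * n_lanes * f_hv * f_p


# Placeholder inputs: msf_e and n_lanes are illustrative, not HCM table values.
demo_capacity = hcm_capacity(msf_e=2400, phf=0.92, n_lanes=3)
# roughly 6624 vehicles per hour for the whole carriageway
```

Because every term is a fixed property of the road or of its typical traffic mix, the result is a single static number per road, independent of the conditions currently observed on it.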

The first four variables (PHF, f_HV, f_p and N) cannot be changed by the road planner; the last, MSF_E, is the maximum service flow rate that can be sustained while maintaining a certain speed. Its values can be found in the HCM table in Figure 2.

Figure 2: HCM capacity table (Reilly, 1997).

With these guidelines, the capacity does not depend on the number of cars on the road or on the speeds and flows previously observed on it. Machine learning algorithms can provide much more accurate values and allow a system to be dynamic, changing the capacity according to the exact current road conditions. In addition, a machine learning algorithm allows us to optimize the capacity for each road by finding the values of speed and flow that maximize the distance traveled on the road, which we define as our utility function; the approach could be extended to other utility functions as well.

For the purposes of this research, we define a road's capacity as the flow that produces the highest utility, where the utility function is the number of kilometres covered on the road, that is, the product of speed and flow. To estimate road capacity, we developed a procedure based on the top-performing machine learning models built in the previous stage and used them to predict the speed under different conditions. We built these models on both input sets, the first including road metadata and flow features and the second also including speed features, and with both methodologies from the previous section, i.e., prediction using all segments' data and prediction using only the individual segment.

Figure 3: Flow to speed graph for road 313114, prediction using random forest with 5 different input sets and HCM.

To determine the optimal flow for each road, we adopted a two-stage approach. In the first stage, we created a list of candidate flows and speeds for each road (depending on the input set), taking the maximum flow of the road to be the maximum flow observed in the dataset for that road. In the second stage, we used the trained random forest model to predict the corresponding speed for each of these flows. A minimum acceptable speed threshold was defined, and the flow that maximized the utility function, i.e., the product of flow and speed (the total distance traveled on the road), was selected as the optimal flow.
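The search for the optimal flow can be sketched as a simple scan over candidate flows. This is a hedged illustration, not the authors' code: `find_optimal_flow`, the step size, and `toy_speed_model` are hypothetical, and the toy speed function merely stands in for the trained random forest so that the example is self-contained.

```python
def find_optimal_flow(predict_speed, max_flow, min_speed, step=50):
    """Scan candidate flows up to max_flow and return the (flow, speed, utility)
    triple with the highest utility = flow * speed, ignoring flows whose
    predicted speed falls below min_speed. Returns None if no flow qualifies."""
    best = None
    for flow in range(step, max_flow + 1, step):
        speed = predict_speed(flow)
        if speed < min_speed:
            continue
        utility = flow * speed
        if best is None or utility > best[2]:
            best = (flow, speed, utility)
    return best


def toy_speed_model(flow):
    """Hypothetical stand-in for a trained model: free flow at 70 up to a
    flow of 1000, then a linear drop in speed as flow increases."""
    if flow <= 1000:
        return 70.0
    return 70.0 - (flow - 1000) / 20


demo_best = find_optimal_flow(toy_speed_model, max_flow=2000, min_speed=50)
# (1200, 60.0, 72000.0) for this toy model
```

Raising the minimum speed threshold moves the selected flow downward; with `min_speed=65` the same toy model yields a flow of 1100.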

We found the speeds corresponding to the different values of the maximum service flow rate shown in the HCM table in Figure 2, using the machine learning algorithms from the previous sections. The first input set, which includes the previous speeds, proved to overfit to the speed features, and the resulting models did not generalize well to the hypothetical situations. Because of this, we tried to improve the second input set with other features that do not include the previous speed values. We added the free flow speed of each segment, calculated as the mean speed between 00:00 and 01:00 every day. This improved the MSE to 25 and the R2 to 0.56. In future work we want to improve this further, so that we can produce a model that learns only from the flow and the constant variables that define the road, without using the speed features.
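The empirical free flow speed described above, the mean speed between 00:00 and 01:00 over all days, can be sketched as follows. The function name, the `(timestamp, speed)` tuple layout, and the toy observations are hypothetical; treating the window as 00:00 inclusive to 01:00 exclusive is also an assumption.

```python
from datetime import datetime


def free_flow_speed(observations):
    """Mean of the speeds observed during the first hour after midnight
    (00:00 <= time < 01:00) across all days; None if there are none."""
    night_speeds = [speed for ts, speed in observations if ts.hour == 0]
    if not night_speeds:
        return None
    return sum(night_speeds) / len(night_speeds)


# Hypothetical observations for one segment: (timestamp, speed in mph).
demo_observations = [
    (datetime(2022, 9, 1, 0, 5), 70.0),
    (datetime(2022, 9, 1, 0, 35), 72.0),
    (datetime(2022, 9, 1, 8, 0), 45.0),   # rush hour, excluded
    (datetime(2022, 9, 2, 0, 10), 74.0),
]
demo_ffs = free_flow_speed(demo_observations)  # 72.0
```

Because the value is computed once per segment, it acts as a constant road descriptor rather than as recent speed history.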

In the following, we present an example comparison between the capacity obtained by applying the HCM capacity equation and the capacity obtained from the random forest and KNN models, using the 5 different models explained in the previous section: 2 input sets for the individual-segment models and 3 for the generalized models. We assumed a PHF of 0.92 (Tarko and Perez-Cartagena, 2005), that no adjustment is needed for heavy vehicles, and that all drivers are regular commuters. The comparison was made on a road with an FFS of 74 mph, for which we used the HCM guidelines for 120 km/h, which is almost exactly equivalent. The road we picked has the ID 313114. The results for the random forest models are presented in Figure 3 and the results for the KNN models in Figure 4.

Figure 4: Flow to speed graph for road 313114, prediction using KNN with 5 different input sets and HCM.

Based on the observed outcomes, it is evident that using speed features in the models leads to overfitting to the particular road data, which hinders their ability to generalize. Moreover, the graph shows a consistent trend in which all models fail to adequately capture traffic dynamics, particularly where the flow is high. This inadequacy may arise from the limited availability of relevant data, since high-traffic data is less common than normal-traffic data. To explain this phenomenon, we can invoke the well-known principle that correlation does not necessarily imply causation: the models that incorporate speed features excel at capturing correlation patterns but fall short of capturing the underlying causal relationships.

We can further observe that the K-nearest neighbors (KNN) model outperforms the random forest model in the right region of the graph. This can be attributed to the KNN model's ability to learn from very similar examples, which allows it to learn effectively from rarer data instances, in contrast to the random forest model, which learns from the entirety of the data. As a result, KNN learns well even when the flow is high, and the model does not underfit. However, its compute time on the test set is very high, and for the first methodology the model does not learn the base speed well enough.

6 CONCLUSIONS

In this study, we aimed to estimate road capacity using machine learning and deep learning models based on real-time data and speed predictions. Our regression models accounted for factors such as traffic flow and other relevant features that impact speed. Rather than projecting specific capacity thresholds, our models explored the interplay between flow, speed, and other variables, empowering road planners to determine optimal capacity for their desired speed. By leveraging these models in combination with optimization tools, we aimed to improve road management and resource allocation in transportation systems. Specifically, our research introduces a speed prediction model based on flow data.

To construct our models, we employed vari-

ous machine learning algorithms, including decision

trees, random forest, k-Nearest Neighbors (k-NN) re-

gressor, and linear regression. Through two distinct

methodologies, we investigated the effectiveness of

centralized and decentralized modeling strategies in

capturing the complexities of trafﬁc dynamics. Tem-

poral dependencies were considered by incorporating

lag features that integrated time-series information.

We systematically varied the input variables to evalu-

ate their inﬂuence on the predictive models, focusing

on road-speciﬁc information, speed lag features, and

the addition of free ﬂow speed. The evaluation of our

models was based on metrics such as Mean Squared

Error (MSE) and R2 score.

Based on our ﬁndings, we concluded that machine

learning and deep learning models, in conjunction

with optimization tools, hold promise in accurately

estimating road capacity. By considering the dynam-

ics of individual roads and incorporating real-time

data, our models provide insights into localized fac-

tors inﬂuencing road speed. This research contributes

to the ﬁeld of transportation systems by providing a

comprehensive road network model and showcasing

the potential of using machine learning for determin-

ing road capacity.

Future research endeavors are expected to signif-

icantly augment the existing model’s predictive ca-

pabilities by integrating a more comprehensive range

of external conditions. The inclusion of supplementary data sets, such as weather and spatio-temporal data, holds considerable promise for further enhancing the model's performance. Combining ensemble methods based on decision trees with neural networks has shown improvements even with tabular data (Shwartz-Ziv and Armon, 2021), so building ensemble models could improve results even without adding any more data. Moreover, incorporating the structural layout of the road network and the interdependence among roads, by using spatial data and a graph neural network (GNN) architecture, can potentially elucidate the intricate connections between nodes in the graph.

Furthermore, to comprehend the underlying tem-

poral structure, time series features will be integrated

into the model’s architecture. To capture the intri-

cacies inherent in the data and enable the model to

make precise predictions, the proposed research en-

deavors will utilize deep learning architectures such

as convolutional neural networks (CNNs), recurrent

neural networks (RNNs), and long short-term mem-

ory (LSTM). These architectures have demonstrated

impressive capabilities in analyzing time series data,


thereby enabling the model to extract relevant features

and make accurate forecasts. By integrating these

cutting-edge techniques into our model, we aim to en-

hance its predictive capabilities and offer valuable in-

sights into transportation planning and management.

Alternatively, the temporal structure need not be

modeled with neural networks. Our current findings

show that the random forest model performs well on

time series, and a tree-based model can be combined

with a GNN for the spatial structure, as demonstrated

for gradient boosting (Ivanov and Prokhorenkova, 2021).

Future research will incorporate spatio-temporal

models, aiming to encompass the entire network.

Additionally, advanced optimization algorithms, such as

genetic algorithms and simulated annealing, will be

employed to determine the optimal capacity for the

entire network dynamically at any given time. After

incorporating these optimization algorithms, we plan

to develop a simulator that accurately replicates

real traffic data. This simulator will serve as a

platform to evaluate and compare the performance of

different optimization algorithms.
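To make the planned use of simulated annealing concrete, the following is a minimal sketch under stated assumptions rather than the method that will be implemented. The function predicted_speed is a hypothetical stand-in for a trained speed model, and its parameters (free-flow speed, jam flow), the minimum acceptable speed, the penalty weight, and the cooling schedule are all invented for illustration. The search looks for the largest flow per road that keeps the predicted speed above the threshold.

```python
import math
import random


def predicted_speed(flow, free_flow_speed=65.0, jam_flow=2000.0):
    """Hypothetical stand-in for a trained speed model: speed falls as flow rises."""
    return free_flow_speed * max(0.0, 1.0 - (flow / jam_flow) ** 2)


def score(flows, min_speed=50.0):
    """Total admitted flow, penalized when a road's predicted speed drops below min_speed."""
    penalty = sum(max(0.0, min_speed - predicted_speed(f)) for f in flows)
    return sum(flows) - 1000.0 * penalty


def simulated_annealing(n_roads=3, steps=5000, start_temp=200.0, seed=0):
    """Search for per-road flows that maximize score() using simulated annealing."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    current = [0.0] * n_roads
    best = list(current)
    for step in range(steps):
        temp = start_temp * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        # Propose a neighbor: perturb the flow of one randomly chosen road.
        candidate = list(current)
        i = rng.randrange(n_roads)
        candidate[i] = min(2000.0, max(0.0, candidate[i] + rng.uniform(-50.0, 50.0)))
        delta = score(candidate) - score(current)
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if score(current) > score(best):
                best = list(current)
    return best


best_flows = simulated_annealing()
print([round(f) for f in best_flows], round(score(best_flows)))
```

In a network-wide setting, the stand-in function would be replaced by the learned model for each road, and the score would need to reflect the interdependence among roads, which this sketch ignores.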

ACKNOWLEDGEMENTS

This research was partially supported by the Ministry

of Innovation, Science & Technology, Israel.

REFERENCES

Billings, D. and Yang, J.-S. (2006). Application of the ARIMA

models to urban roadway travel time prediction - a

case study. In 2006 IEEE International Conference

on Systems, Man and Cybernetics, volume 3, pages

2529–2534.

Cao, M., Li, V. O., and Chan, V. W. (2020). A CNN-LSTM

model for traffic speed prediction. In 2020 IEEE 91st

Vehicular Technology Conference (VTC2020-Spring),

pages 1–5. IEEE.

Chen, C. (2002). Freeway performance measurement sys-

tem (PeMS). University of California, Berkeley.

Chung, E., Ohtani, O., Warita, H., Kuwahara, M., and

Morita, H. (2006). Does weather affect highway ca-

pacity? 5th International Symposium on Highway Ca-

pacity and Quality of Service, pages 139–146.

Ghosh, B., Basu, B., and O’Mahony, M. (2007). Bayesian

time-series model for short-term traffic flow forecasting.

Journal of Transportation Engineering,

133(3):180–189.

Haghighat, A., Ravichandra-Mouli, V., Chakraborty, P., Es-

fandiari, Y., Arabi, S., and Sharma, A. (2020). Appli-

cations of deep learning in intelligent transportation

systems.

Huo, J., Wu, X., Lyu, C., Zhang, W., and Liu, Z. (2022).

Quantify the road link performance and capacity us-

ing deep learning models. IEEE Transactions on In-

telligent Transportation Systems.

Ivanov, S. and Prokhorenkova, L. (2021). Boost then

convolve: Gradient boosting meets graph neural net-

works. In International Conference on Learning Rep-

resentations.

Kumar, S. V. (2017). Traffic flow prediction using Kalman

filtering technique. Procedia Engineering, 187:582–587.

Kyte, M., Khatib, Z., Shannon, P., and Kitchener, F. (2000).

Effect of environmental factors on free-flow speed. In

Fourth International Symposium on Highway Capac-

ity, pages 108–119.

Lee, S. and Fambro, D. B. (1999). Application of subset

autoregressive integrated moving average model for

short-term freeway traffic volume forecasting.

Transportation Research Record, 1678(1):179–188.

Liu, Y., Liu, C., and Zheng, Z. (2020). Traffic congestion

and duration prediction model based on regression

analysis and survival analysis. Open Journal of

Business and Management, 8(2):943–959.

Lonare, S. and Ravi, B. (2020). Traffic Flow Prediction

Using Regression and Deep Learning Approach, pages

641–648.

Mingheng, Z., Yaobao, Z., Ganglong, H., and Gang, C.

(2013). Accurate multisteps traffic flow prediction

based on SVM. Mathematical Problems in

Engineering, 2013.

Reilly, W. (1997). Highway capacity manual 2000. TR

News, (193).

Shwartz-Ziv, R. and Armon, A. (2021). Tabular data: Deep

learning is not all you need. Information Fusion, 81:84–90.

Song, C., Lee, H., Kang, C., Lee, W., Kim, Y. B., and Cha,

S. W. (2017). Traffic speed prediction under weekday

using convolutional neural networks concepts. In

2017 IEEE Intelligent Vehicles Symposium (IV), pages

1293–1298. IEEE.

Tamir, T. S., Xiong, G., Li, Z., Tao, H., Shen, Z., Hu, B.,

and Menkir, H. M. (2020). Traffic congestion prediction

using decision tree, logistic regression and neural

networks. IFAC-PapersOnLine, 53(5):512–517.

Tarko, A. and Perez-Cartagena, R. (2005). Variability of

peak hour factor at intersections. Transportation Re-

search Record, 1920:125–130.

Tedjopurnomo, D. A., Bao, Z., Zheng, B., Choudhury, F.,

and Qin, A. K. (2020). A survey on modern deep

neural network for traffic prediction: Trends, methods

and challenges. IEEE Transactions on Knowledge and

Data Engineering.

Tian, Y., Zhang, K., Li, J., Lin, X., and Yang, B. (2018).

LSTM-based traffic flow prediction with missing data.

Neurocomputing, 318:297–305.

Williams, B. (2008). Intelligent transport systems stan-

dards. Artech House.
