The first and third plots in Figure 4 show that the
predictions for the corresponding junctions generally
follow the trend of the actual values, although with a
visible deviation. The model had difficulty capturing
the patterns at junctions 2 and 4, where the plots show
large deviations. The model was able to capture some
temporal dependencies, but it is clearly less
competitive than the other two models, possibly
because of the limited data and features available.
3.5 Model Comparison
Table 7: Average results of the experimental models.
Model      RMSE       MAE        R²
RF         0.133482   0.102010   0.794802
XGBoost    0.125758   0.096456   0.798921
LSTM       0.218660   0.171947   0.622237
Table 7 shows that XGBoost outperformed both LSTM
and RF on all three metrics. It had the lowest average
errors across all junctions, with an RMSE of 0.125758
and an MAE of 0.096456. It also yielded the highest R²
value, 0.798921, slightly higher than that of the
Random Forest model. An R² of about 0.80 means that
the model explains roughly 80% of the variance in the
observed traffic counts. These results therefore
suggest that XGBoost can predict traffic accurately
and explain most of the variance in the traffic data
across the junctions.
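The three metrics in Table 7 can be reproduced with a few lines of NumPy. The sketch below assumes that each model's "average result" is the unweighted mean of the per-junction scores; the section does not state the averaging scheme, so this is an assumption, and the helper name `average_over_junctions` is hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def average_over_junctions(results):
    """results maps junction id -> (y_true, y_pred).

    Assumption: the table reports the unweighted mean of per-junction scores.
    """
    scores = np.array([[rmse(t, p), mae(t, p), r2(t, p)] for t, p in results.values()])
    return dict(zip(["RMSE", "MAE", "R2"], scores.mean(axis=0).tolist()))

# Toy usage with made-up values for two junctions (not the study's data).
results = {
    1: ([1, 2, 3], [1, 2, 3]),
    2: ([0, 0, 2], [1, -1, 2]),
}
print(average_over_junctions(results))
```

If the test sets differ in size across junctions, pooling all predictions before computing the metrics would give somewhat different numbers. This is one reason the averaging scheme matters when comparing against Table 7.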
The RF model also performed well, with only slightly
weaker results on all three metrics: an average RMSE
of 0.133482, an MAE of 0.102010 and an R² of
0.794802. In addition, the model provides feature
importance scores, which are a useful indicator
during model development. RF is therefore also a
strong model for traffic prediction.
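The feature-importance property of RF can be illustrated with scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute holds impurity-based scores that sum to 1. This is a minimal sketch on synthetic data. The calendar features `hour`, `day_of_week` and `junction` are hypothetical features derived from the datetime, and the generated counts are not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the traffic data (illustrative only).
rng = np.random.default_rng(0)
n = 2000
hour = rng.integers(0, 24, n)
day_of_week = rng.integers(0, 7, n)
junction = rng.integers(1, 5, n)
# Counts follow a daily cycle plus a per-junction offset and noise;
# day_of_week has no effect by construction.
count = 20 + 10 * np.sin(2 * np.pi * hour / 24) + 3 * junction + rng.normal(0, 1, n)

X = np.column_stack([hour, day_of_week, junction])
feature_names = ["hour", "day_of_week", "junction"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, count)

# Impurity-based (mean decrease in impurity) importances, normalised to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:.3f}")
```

On this toy data, `hour` should dominate and `day_of_week` should score near zero. Impurity-based scores can be biased toward high-cardinality features, so permutation importance is a common cross-check.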
Although LSTM is a robust neural network architecture
that is particularly well suited to time series
prediction, it did not perform as well as the other two
models. LSTM had the lowest R² value (0.622237) and
the highest RMSE (0.218660) and MAE (0.171947). In
this traffic prediction task, LSTM was less accurate in
predicting traffic counts, especially at junctions 2 and
4, where its R² values were low. The model therefore
probably requires further tuning, or additional
features and information, to compete with XGBoost
and RF.
In summary, XGBoost is the best-performing
model overall. It produces the most accurate
predictions and captures the variance in the traffic
data well. These properties make it the most reliable
and suitable choice for traffic prediction in this study.
4 LIMITATIONS
This study demonstrated the use of machine learning
models such as XGBoost for traffic prediction.
However, the project has several limitations that
should be noted. First, the dataset contained little
information. It included only a few features: the
datetime and the target variable, the vehicle count at
each junction. Other factors, such as weather
conditions, road closures, public events and holidays,
should also be considered. Their absence may be one
reason why the models could not fully explain the
variance in the traffic data. Expanding the dataset
and merging in such features could therefore improve
the models' performance.
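One way to add such a feature is a left join of the traffic records onto an external calendar. The sketch below uses pandas. The column names (`DateTime`, `Junction`, `Vehicles`), the holiday table and all values are hypothetical placeholders, not the study's dataset.

```python
import pandas as pd

# Hypothetical traffic records: a timestamp, a junction id and a vehicle count.
traffic = pd.DataFrame({
    "DateTime": pd.to_datetime(["2017-01-01 08:00", "2017-01-02 08:00", "2017-01-03 08:00"]),
    "Junction": [1, 1, 1],
    "Vehicles": [15, 30, 28],
})

# Hypothetical holiday calendar; a real study would source this externally.
holidays = pd.DataFrame({"date": pd.to_datetime(["2017-01-01"]), "is_holiday": [1]})

# Join on the calendar date (timestamp truncated to midnight).
traffic["date"] = traffic["DateTime"].dt.normalize()
enriched = traffic.merge(holidays, on="date", how="left")
# Days missing from the calendar are treated as ordinary days.
enriched["is_holiday"] = enriched["is_holiday"].fillna(0).astype(int)
enriched = enriched.drop(columns="date")
print(enriched)
```

For features recorded at their own timestamps, such as hourly weather, `pandas.merge_asof` can attach the most recent observation to each traffic record instead of requiring an exact key match.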
Another limitation is the weak performance of the
LSTM model, even though LSTM is often a preferred
choice for time series data. Its poor results in this
context may be attributable to insufficient tuning, or
to data analysis and preprocessing that were not
tailored to the needs of LSTM.
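LSTM-specific preprocessing typically means scaling the series and reshaping it into fixed-length input windows. The sketch below shows one common variant with NumPy. The window length (`lookback`), the forecast `horizon` and the min-max scaling are illustrative assumptions, not the study's settings.

```python
import numpy as np

def fit_minmax(train_values):
    """Learn min-max bounds from the training split only, to avoid test leakage."""
    lo, hi = float(np.min(train_values)), float(np.max(train_values))
    span = hi - lo if hi > lo else 1.0  # guard against a constant series
    return lo, span

def apply_minmax(values, lo, span):
    return (np.asarray(values, dtype=float) - lo) / span

def make_windows(series, lookback, horizon=1):
    """Turn a 1-D series into LSTM inputs of shape (samples, lookback, 1).

    Each window holds `lookback` consecutive values; its target is the value
    `horizon` steps after the window ends.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for this lookback/horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    start = lookback + horizon - 1
    y = series[start:start + n]
    return X[..., np.newaxis], y

# Toy usage on a made-up hourly series.
counts = np.arange(10, dtype=float)
split = 8
lo, span = fit_minmax(counts[:split])
scaled = apply_minmax(counts, lo, span)
X_train, y_train = make_windows(scaled[:split], lookback=3)
print(X_train.shape, y_train.shape)  # (5, 3, 1) (5,)
```

Both `lookback` and the scaling choice are hyperparameters in their own right, and tuning them is part of the LSTM tuning this limitation refers to.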
Additionally, the models were trained on data
collected from a single city. The generalizability of
the findings to other cities, or to locations with
different traffic patterns and population sizes, was not
tested. Including data from other locations could
therefore help the models learn traffic patterns more
comprehensively.
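If data from several locations were available, generalizability could be tested by holding out one location at a time: train on all other locations, then evaluate on the unseen one. The sketch below shows this leave-one-location-out split with NumPy. The location labels are hypothetical, and scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
import numpy as np

def leave_one_location_out(locations):
    """Yield (held_out_location, train_idx, test_idx), one fold per location."""
    locations = np.asarray(locations)
    for loc in np.unique(locations):
        test_mask = locations == loc
        yield loc, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical location label for each sample.
labels = ["city_A", "city_A", "city_B", "city_C", "city_B"]
for loc, train_idx, test_idx in leave_one_location_out(labels):
    print(loc, train_idx.tolist(), test_idx.tolist())
```

A model that scores well within one city but poorly on a held-out city would indicate that it has learned location-specific patterns rather than general traffic behavior.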
Lastly, many machine learning models, such as
XGBoost, are black-box models: they are inherently
complex, which makes it difficult to interpret their
results and understand how they arrive at their
predictions. This black-box nature can be a significant
drawback, especially in real-time traffic management
systems, where interpretability is crucial. Methods
such as SHAP (SHapley Additive exPlanations) values
could help address this issue. They quantify how much
each feature contributes to an individual prediction,
which reveals the factors that contribute most to the
model's decision-making process.
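The idea behind SHAP values can be shown with an exact, brute-force Shapley computation for a single prediction. This sketch is not the SHAP library. It enumerates all feature coalitions, so its cost grows as 2^n, and it "removes" a feature by replacing it with a single baseline value. The baseline replacement is a simplifying assumption; the SHAP library averages over a background dataset and uses fast algorithms such as TreeExplainer for tree ensembles like XGBoost.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at instance x (brute force over coalitions).

    Features outside a coalition take their baseline value (simplifying assumption).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        z[idx] = x[idx]  # features in the coalition take their actual values
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi

# Toy model: a linear term plus an interaction between features 1 and 2.
def toy_model(z):
    return 2 * z[0] + 3 * z[1] * z[2]

phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print(phi)  # [2. 9. 9.]
```

The interaction term is split equally between the two features involved. The values also sum to f(x) minus f(baseline), here 20, which is the additivity property that SHAP explanations rely on.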
5 CONCLUSIONS
This paper explored and analyzed the use of machine
learning models, namely LSTM, RF and XGBoost, for
predicting traffic patterns at different junctions in a
city. The results showed that XGBoost is the most
effective and suitable model in this context, since it
had the lowest prediction errors and the highest R²
values among all the
models studied. RF also excelled in this task, with