A Method for Road Accident Prevention in Smart Cities based on Deep Reinforcement Learning

Giuseppe Crincoli 1, Fabiana Fierro 2, Giacomo Iadarola 1 a, Piera Elena La Rocca 1, Fabio Martinelli 1, Francesco Mercaldo 3,1 b and Antonella Santone 3

1 Institute of Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy
2 Spike Reply, Milan, Italy
3 University of Molise, Campobasso, Italy
fa.fierro@reply.it, {francesco.mercaldo, antonella.santone}@unimol.it

a https://orcid.org/0000-0001-7060-6233
b https://orcid.org/0000-0002-9425-1657
Keywords:
Machine Learning, Deep Reinforcement Learning, Smart Cities, Automotive, Artificial Intelligence.
Abstract:
Autonomous vehicles play a key role in the smart cities vision: they bring benefits and innovation, but also
safety threats, especially if they suffer from vulnerabilities that can be easily exploited. In this paper, we
propose a method that exploits Deep Reinforcement Learning to train autonomous vehicles with the purpose
of preventing road accidents. The experimental results demonstrate that a single self-driving vehicle can help
optimise traffic flows and mitigate the number of collisions that would occur if there were no self-driving
vehicles in the road network. Our results show that, as training progresses, the collision frequency is reduced
from 1 collision every 32.40 hours to 1 collision every 53.55 hours, demonstrating the effectiveness
of deep reinforcement learning for road accident prevention in smart cities.
1 INTRODUCTION
Smart city is an expression that has recently become
fashionable, and it refers to the collective ideal of the
city of the future: an urban area where the use of digital
technologies and technological innovation optimizes and
improves the infrastructures and services provided to citizens.
In the smart cities context, the so-called autonomous
vehicles (i.e., AVs) play a crucial role. As a matter of
fact, AVs are expected to be among the first robotic
systems to enter and widely affect existing societal
systems (Wu et al., 2017a).
On the other hand, cybersecurity threats are
emerging hand in hand with technological advance-
ment, and they may affect the rapid diffusion of AVs.
Therefore, improving safety and reliability in AVs
is attracting growing interest from both the industrial and the
academic research communities, considering that the
global market for AVs is expected to rapidly grow in
the next decade (Statista.com, 2021; Martinelli et al.,
2021b).
As pointed out by Khatoun and Zeadally (Khatoun and
Zeadally, 2017), the smart city architecture and tech-
nological innovation necessarily leads to open cyber-
security scenarios that involve new threats and chal-
lenges for security experts (Martinelli et al., 2021a;
Martinelli et al., 2018a). The wide adoption of In-
formation and Communications Technologies (ICT)
systems into vehicles has opened new possibilities
for attackers to access and attack cars. Nowadays,
we use smartphones to control the radio and the GPS tracker,
and these features can be exploited to take control of cars
through ICT systems such as Android applications.
Since it is possible to remotely control the behaviour
of a vehicle, we should design AV systems that can
automatically react to these attacks that cause devia-
tions from the expected driving behaviour.
In the light of the above considerations, this pa-
per proposes a method aimed at reducing the likeli-
hood of road accidents by using vehicular network data
to smooth traffic flows and control the behaviour of
drivers. In detail, we are interested in understand-
ing whether a vehicle trained with Deep Reinforcement
Learning (DRL) and inserted in the road network can pre-
vent road accidents. We assume that vehicles under
cyberattack exhibit unwanted aggressive and unsafe
driving. The study poses the following research ques-
tion:
RQ: Is it possible to increase driving safety by in-
troducing into a road network a vehicle trained by
DRL?
The paper proceeds as follows: in Section 2, we
present the proposed method; in Section 3, we dis-
cuss the results of the experimental evaluation; cur-
rent state of the art is discussed in Section 4; finally,
conclusions and future research plans are provided in
the last section.
2 A METHOD FOR PREVENTING
ROAD ACCIDENTS
In this paper, we propose a method that uses DRL
to prevent traffic accidents in smart cities when some
running vehicles are under cyberattack.
Figure 1 illustrates the workflow of the proposed
method. More in detail, the security analyst uses Flow
to implement the DRL algorithm and interfaces with
the TraCI APIs, which are used to interact, both inbound
and outbound, with the traffic managed by the SUMO
simulator.
Figure 1: The proposed method for avoiding traffic inci-
dents in smart cities.
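For concreteness, the following is a minimal sketch, not taken from our released scripts, of how a TraCI client can connect to SUMO, advance the simulation and intervene on a vehicle at run-time; the configuration file name and the vehicle ID are placeholders.

import traci  # SUMO's Python API for run-time control

# Start SUMO as a server and connect to it (the .sumocfg path is a placeholder).
traci.start(["sumo", "-c", "ring_scenario.sumocfg"])

for step in range(1000):
    traci.simulationStep()  # advance the simulation by one timestep

    # Example run-time interventions on a vehicle (the ID is hypothetical):
    if step == 500 and "veh_0" in traci.vehicle.getIDList():
        traci.vehicle.setSpeed("veh_0", 30.0)      # force a sudden speed change
        traci.vehicle.changeLane("veh_0", 1, 5.0)  # move to lane 1 for 5 s

traci.close()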
We applied the proposed method in a simulated
scenario where vehicles are driven on several lanes,
taking into account, in addition to sudden accelerations,
the cases in which sudden lane-changes
may cause inconvenience or collisions, since lane-
changing is considered one of the main causes of road
accidents (Wu et al., 2017b). The rationale behind
the proposed method is related to accident prevention
in the AVs context, by considering the application of
DRL to train vehicles on how to behave to avoid sud-
den collisions. To demonstrate the effectiveness of the
proposed method, we considered a simulated environ-
ment involving several vehicles, in which it is possi-
ble to intervene on the simulation causing vehicles to
make real-time decisions. Attacks take the form of
aggressive driving purposely caused by human decisions:
it is possible to externally set the trajectory and the speed
of individual vehicles and, above all, to change a
vehicle's speed during the simulation. This applies both
to the self-driving vehicle trained with DRL and to
vehicles under attack. Furthermore, it is worth not-
ing that our methodology does not take into account
a specific cyberattack on the vehicles, but instead it is
a countermeasure to any possible one that interferes
with the usual driving style. Indeed, we assume that
an attacked vehicle will start acting aggressively (such
as invading lanes), and then we study how all the other
vehicles in the simulation can react to this behaviour
to mitigate the collisions and damages.
Our method uses SUMO as a server that is run-
ning on a remote port waiting for clients to connect. A
client interacts with the server to cause actions on the
entities in the simulation. The TraCI APIs are used to
intervene at run-time on the running simulation (e.g.
to make a car accelerate, brake, change lane, etc.).
The analyst or security expert interfaces with Flow to
implement the DRL methodology, which allows them to in-
teract with the traffic (Figure 1). At run-time, depending
on the effects of the actions performed by the vehi-
cles, each action of a vehicle is given a reward when
it is considered positive and a penalty when the
action is considered negative. It is worth stressing
that SUMO creates a model that is collision-free by
nature, which makes it impossible to simulate road
accidents. Thus, in order to let vehicles learn that ac-
tions that lead to collisions are negative, we had to
reduce the minimum safety distance between ve-
hicles to 0 to let them crash. This change within the simu-
lation increased the probability of car collisions.
Interacting with the simulation is the way to cause
actions that can lead to rewards and penalties. For
the sake of clarity, consider that suddenly increasing
vehicles' speeds and/or making them change lane is
likely to make vehicles collide. These actions can be un-
wanted if they are the result of vehicular cyberattacks
on the cars within the simulation. In that scenario, the
vehicles whose bad actions caused col-
lisions receive a penalty aimed at discouraging them
from repeating those actions. After performing
several simulations, it was clear that, at each simula-
tion cycle, the execution of the algorithm improved
the performance of the vehicles even when cyberat-
tacks forced some vehicles to
perform dangerous actions, since all
the vehicles are generally able to learn from the good and wrong
actions performed in the past.
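As an illustration of how such unwanted actions can be injected into the simulation, the following hedged sketch perturbs a set of target vehicles through TraCI; the vehicle IDs and the perturbation magnitudes are assumptions, not values from our experiments.

import random
import traci

ATTACKED = ["veh_1", "veh_4", "veh_7", "veh_12", "veh_15"]  # hypothetical IDs (5 of 20)

def inject_aggressive_behaviour():
    """Simulate a cyberattack as sudden, unsafe driving actions."""
    for veh_id in ATTACKED:
        if veh_id not in traci.vehicle.getIDList():
            continue
        # Sudden speed change: jump well above the current speed.
        current = traci.vehicle.getSpeed(veh_id)
        traci.vehicle.setSpeed(veh_id, current + random.uniform(5.0, 15.0))
        # Sudden lane change to a random lane of the 4-lane ring.
        traci.vehicle.changeLane(veh_id, random.randint(0, 3), 2.0)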
In order to design a method to prevent road ac-
cidents, we developed a network configuration cus-
tomizing the original single-lane ring network de-
scribed in (Wu et al., 2017b), which is one of the
benchmarks designed in Flow. The original ring network
provided by Flow (Wu et al., 2017b) simply consists of a
circular lane with a fixed length. Our personalized ring
network extended the original single-lane environment by
adding three lanes: we created a four-lane ring road network,
a natural extension of problems involving a single-lane ring,
in order to consider a more realistic multi-lane scenario.
Basically, it is a loop road generated by specifying the
number of lanes and a ring circumference, in this case
personalized starting from the original. In our four-lane
ring network we inserted 20 vehicles (19 vehicles driven
by humans and 1 guided by DRL) and performed cyberattacks
on 5 different vehicles (25% of the total number of vehicles).

Figure 2: Our four-lane network (19 "human" vehicles and 1 DRL vehicle).
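A possible Flow configuration for this network is sketched below. It follows Flow's public API for the ring benchmark, but the ring length, the resolution and the other literal values are assumptions, since we do not report them in the text.

from flow.core.params import (NetParams, VehicleParams,
                              SumoCarFollowingParams, InitialConfig)
from flow.controllers import IDMController, RLController, ContinuousRouter
from flow.networks.ring import RingNetwork

vehicles = VehicleParams()
# 19 human-driven vehicles following the IDM dynamics;
# min_gap=0 removes SUMO's built-in safety distance so crashes can occur.
vehicles.add(
    veh_id="human",
    acceleration_controller=(IDMController, {}),
    routing_controller=(ContinuousRouter, {}),
    car_following_params=SumoCarFollowingParams(min_gap=0.0),
    initial_speed=16.67,  # 60 km/h
    num_vehicles=19)
# 1 vehicle controlled by the DRL policy.
vehicles.add(
    veh_id="rl",
    acceleration_controller=(RLController, {}),
    routing_controller=(ContinuousRouter, {}),
    initial_speed=16.67,
    num_vehicles=1)

# Four-lane ring; length and resolution are assumed values.
net_params = NetParams(additional_params={
    "length": 400, "lanes": 4, "speed_limit": 36.1, "resolution": 40})

# Random initial positions across the lanes.
initial_config = InitialConfig(spacing="random")

network = RingNetwork(name="four_lane_ring", vehicles=vehicles,
                      net_params=net_params, initial_config=initial_config)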
Once the network is created, the vehicles are randomly
positioned across the lanes. Each vehicle has a starting
speed of 60 km/h, is 5 meters long and follows the
Intelligent Driver Model (IDM) dynamics with the
parameters specified by Treiber and Kesting (Treiber and
Kesting, 2013). In traffic flow modeling, the IDM is a
time-continuous car-following model for the simulation of
freeway and urban traffic.
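For completeness, the standard IDM car-following law (Treiber and Kesting, 2013) computes each vehicle's acceleration as

\dot{v}_\alpha = a \left[ 1 - \left( \frac{v_\alpha}{v_0} \right)^{\delta} - \left( \frac{s^{*}(v_\alpha, \Delta v_\alpha)}{s_\alpha} \right)^{2} \right],
\qquad
s^{*}(v, \Delta v) = s_0 + \max\left( 0,\; vT + \frac{v\,\Delta v}{2\sqrt{ab}} \right),

where v_0 is the desired velocity, s_0 the minimum gap, T the desired time headway, a the maximum acceleration, b the comfortable deceleration, s_\alpha the gap to the leader and \Delta v_\alpha the speed difference with respect to the leader.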
The proposed network also relies on additional parameters.
Although the default timestep (i.e., the sampling interval)
lasts 1 second in SUMO, in our case each step lasts 0.1
seconds, which is the default in Flow. Although the
speed limit is 36.1 m/s (the maximum limit of Italian high-
ways, taken just as a baseline), the vehicles can reach very
high velocities because their maximum possible ac-
celerations and decelerations are set to 30 m/s^2,
the highest acceleration possible as it is reached
by a supercar (a Ferrari Daytona SP3 goes from 0 to 100 km/h
in 2.85 seconds). Vehicles should be allowed to reach
high accelerations in order to increase the possibility
of them being victims of cyberattacks consisting
of suddenly increasing or decreasing a vehicle's accelera-
tion. Also, in order to give the model the possibility
of causing collisions when hazardous, dangerous and un-
safe maneuvers are performed, we had to set minGap
(the longitudinal distance between two vehicles) and
minGapLat (the lateral distance between two vehicles)
to 0. At this point, vehicles were free to crash, both
laterally and longitudinally, because they were no longer
blocked by SUMO at certain distances. In fact, since
SUMO is a collision-free model, if it is not forced to
reduce the minimum gap between vehicles to 0, ve-
hicles cannot crash, despite the presence of reckless
vehicles and maneuvers. Not only SUMO, but also
Flow supplements its car-following models with safe
driving rules that prevent the inherently unstable car-
following models from crashing (Wu et al., 2017b).
As SUMO terminates its experiments when a col-
lision occurs, Flow provides a more conservative
course of action: a fail-safe mechanism called
the final position rule. The final position rule forces
each vehicle to keep a velocity such that if the preced-
ing vehicle suddenly starts braking with max deceler-
ation a, then even if the following vehicle has a delay
τ, it can still slow down such that it comes to rest at
the final position of the rear bumper of the preceding
vehicle (Wu et al., 2017b). Flow also leverages
various safety features from SUMO in order
to prevent longitudinal and lateral collisions. These
fail-safes serve as bounds on the accelerations and
lane-changes that human and autonomous vehicles may
perform, and they may be relaxed on any set of vehicles
in the network to allow for the prospect of more ag-
gressive actions to take place, as we did in our case
study.
The action space, containing the set of actions a
vehicle is allowed to perform, consists of accelera-
tions and lane-changes. In fact, for tasks with k au-
tonomous vehicles, the action space is a set of acceler-
ations c ∈ ℝ^k and lane-changing decisions d ∈ [−1, 1]^k.
This is because, when following a predefined route,
a vehicle performs longitudinal and lateral actions,
where longitudinal actions are accelerations and lateral
actions are lane-changes. The lane-changing values are
rounded to the nearest integer in {−1, 0, 1}, denoting
lane-change left, no lane-change and lane-change
right, respectively; this keeps the action space repre-
sentation continuous. In our work, we inserted both
accelerations and lane-changes in our action space be-
cause we built a multi-lane ring network where, in ad-
dition to accelerations, lane-changes are allowed.
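As an illustration, a continuous action vector for a single DRL vehicle (k = 1) could be decoded as follows; the bounds mirror the acceleration limits reported above, and the helper name is ours.

import numpy as np

def decode_action(action, max_abs_accel=30.0):
    """Map a continuous policy output to (acceleration, lane-change direction)."""
    accel = float(np.clip(action[0], -max_abs_accel, max_abs_accel))  # m/s^2
    # Round the continuous lane-change value to -1 (left), 0 (keep), 1 (right).
    lane_change = int(np.round(np.clip(action[1], -1.0, 1.0)))
    return accel, lane_change

# Example: a raw policy output of [4.2, 0.8] becomes (4.2, 1), i.e. a
# 4.2 m/s^2 acceleration together with a lane change to the right.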
The observation space may be any set of state in-
formation the user wishes to provide to the agent, letting
the agent fully or partially know the state of the envi-
ronment. In our experimental work, the DRL vehicle
follows the partially observed setting, observ-
ing only the velocity of the autonomous vehicle, the
velocity of its preceding vehicle and its relative posi-
tion with respect to the preceding vehicle. In the state
of the art, the reward function can be any function
of vehicle speed, position, fuel consumption, acceler-
ation, distance elapsed, etc. (Wu et al., 2017b; Mar-
tinelli et al., 2018b). Since we work on a new
scenario, we desired a reward function different from
the predefined ones. After a long phase of trial and
error, we came up with a final reward function that
mixes the different parameters we want to monitor.
Our system-level objective was mitigating the oc-
currence of collisions when cyberattacks happen in
the network, and we followed the road of creating cy-
berattacks while letting the autonomous vehicle (AV)
learn how to mitigate collisions. More specifically,
our reward function gives a positive reward if the ve-
locity of the vehicle is near the average velocity of
the whole set of vehicles and below the target
velocity, and a penalty if the vehicle is accelerating
and lane-changing too much.
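The exact formula is not reported here, so the following Python sketch only illustrates the shape of such a reward function; the weights and the scoring of closeness to the mean velocity are assumptions.

def reward(v_av, mean_v, target_v, accel, lane_changed,
           w_speed=1.0, w_accel=0.1, w_lc=0.5):
    """Hedged sketch: reward closeness to the fleet's mean velocity when
    below the target velocity; penalize accelerations and lane changes."""
    r = 0.0
    if v_av <= target_v:
        # Higher reward the closer the AV's speed is to the mean speed.
        r += w_speed * max(0.0, 1.0 - abs(v_av - mean_v) / max(target_v, 1e-6))
    r -= w_accel * abs(accel)                    # discourage strong accelerations
    r -= w_lc * (1.0 if lane_changed else 0.0)   # discourage frequent lane changes
    return r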
In our experimental analysis, we tried to improve
the traffic scenario when some cyberattacks are per-
formed on some vehicles in the ring network. After
different simulations involving different cyberattacks,
we used Flow to implement a DRL algorithm able to
stabilize the traffic flows and reduce the number of
collisions in time, despite several cyberattacks hap-
pening at each step. We freely release for research
purposes the Python scripts we developed and the
SUMO scenario (https://mega.nz/file/wNkmXIiY#oiPCV2ZUCQSmu5tcgez2JnWU8TA89GPzNanM0owx7fM).
3 EXPERIMENTAL ANALYSIS
Below we provide the answer to the RQ.
In Figure 3 we can see that after 3,500,000
timesteps (0.1 s each), corresponding to 350,000 s,
i.e., 97.22 hours, only 3 collisions happened. For this
reason, the collision frequency is 1 collision every 32.40 hours.
Figure 3: RQ. 3,500,000 timesteps of 0.1 s resulted in 3
collisions.
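The reported collision frequencies follow from simple arithmetic over the timestep count, as the short sketch below shows:

def collision_frequency(steps, collisions, step_length=0.1):
    """Return (hours of simulated traffic, hours per collision)."""
    hours = steps * step_length / 3600.0
    return hours, hours / collisions

# Trial 1: 3,500,000 steps of 0.1 s -> 97.22 h; 97.22 / 3 = 32.40 h per collision.
print(collision_frequency(3_500_000, 3))  # (97.22..., 32.40...)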
In Figure 4 we can see that after 7,000,000
timesteps (0.1 s) corresponding to 194.44 hours, only
5 collisions happened. For this reason, the collision
frequency is 1 collision every 38.89 hours.
Figure 4: RQ. 7,000,000 timesteps of 0.1 s resulted in 5
collisions.
In Figure 5 we can see that after 10,594,600
timesteps (0.1 s) corresponding to 294.29 hours, only
6 collisions happened. For this reason, the collision
frequency is 1 collision every 49.04 hours.
Figure 5: RQ: 10,594,600 timesteps of 0.1 s resulted in 6
collisions.
Similarly to the previous trial, in this case in Fig-
ure 6 we can see that after 10,615,500 timesteps (0.1
s) corresponding to 294.88 hours, only 6 collisions
happened. For this reason, the collision frequency is
1 collision every 49.15 hours.
Figure 6: RQ: 10,615,500 timesteps of 0.1 s resulted in 6
collisions.
In Figure 7 we can see that after 21,659,500
timesteps (0.1 s) corresponding to 601.65 hours, only
10 collisions happened. For this reason, the collision
frequency is 1 collision every 60.16 hours.
Figure 7: RQ: 21,659,500 timesteps of 0.1 s resulted in 10
collisions.
In Figure 8 we can see that after 32,770,035
timesteps (0.1 s) corresponding to 910.28 hours, 17
collisions happened. For this reason, the collision fre-
quency is 1 collision every 53.55 hours. The over-
all collision frequency is good, but in this case we
noticed a phase with collisions following a collision-free
phase from timestep 15,000, caused by a worsening of the
mean reward reached by the model.
Figure 8: RQ: 32,770,035 timesteps of 0.1 s resulted in 17
collisions.
Table 1: Results related to RQ (19 human vehicles and 1
DRL vehicle).
Trial  Steps (0.1 s)  Hours     Collisions  Collision frequency
1      3,500,000      97.22 h   3           1 col/32.40 h
2      7,000,000      194.44 h  5           1 col/38.89 h
3      10,594,600     294.29 h  6           1 col/49.04 h
4      10,615,500     294.88 h  6           1 col/49.15 h
5      21,659,500     601.65 h  10          1 col/60.16 h
6      32,770,035     910.28 h  17          1 col/53.55 h
In Table 1, the results of each simulation are summa-
rized, showing that, with the training progress, the
collision frequency was reduced from 1 collision ev-
ery 32.40 hours to 1 collision every 53.55 hours.
3.1 Discussion
Our results showed that just 1 DRL vehicle can help to
optimize traffic flows and mitigate the number of col-
lisions that would happen if no autonomous vehicles
were inserted in the network.
Answering the RQ, we showed that the collision fre-
quency decreases as the number of simulation
steps increases: this means that the DRL agent
learns from the environment while bad behaviours
happen on some vehicles. In fact, observing the re-
sults related to the training iterations, we can see
that collision frequencies range from 1 collision ev-
ery 32.40 hours (3,500,000 steps) to 1 collision ev-
ery 53.55 hours (32,770,035 steps). The fifth trial
(21,659,500 steps), which resulted in a collision
frequency of 1 collision every 60.16 hours, is a good
result but can be seen as an outlier, because it repre-
sents the only case in which the next trial (trained on
more steps) performs worse. Generally, the model ex-
hibits a consistent and expected behaviour over time.
4 RELATED WORK
In this section, related works are discussed.
Statistics by the National Safety Council
(NSC) of the United States reported that more
than 40,000 roadway fatalities happened in 2017
(BOMEY, 2018). The primary reason for these road
tragedies is the human factor, with reckless driving
being the most significant one (Saiprasert and Pattara-
Atikom, 2013; Yu et al., 2016).
In 2015, two cybersecurity researchers published
a report (Miller and Valasek, 2015) showing how
someone can wirelessly control a Jeep Cherokee af-
ter compromising the vehicle's Uconnect system.
In 2011, researchers at the University of Califor-
nia at San Diego and at the University of Washing-
ton found several ways into a Chevy Impala's internals,
ranging from its OnStar connection to a hacked
smartphone connected to its infotainment system via
Bluetooth (Drozhzhin, 2015).
The authors in (Sikander and Anwar, 2018)
showed that the driving state, such as velocity and ac-
celeration, can be used to determine driving per-
formance. Based on the collected results, the sys-
tem may provide either passive or active correc-
tive feedback to the driver. A sort of feedback mecha-
nism is also involved in DRL, through the reward func-
tion and the progressive learning of the model. In our
case, relying on DRL, feedback is implicit and auto-
matic: the vehicles in the network are able to adapt
to what is happening, so reckless behaviours are miti-
gated by the DRL vehicles themselves.
5 CONCLUSION AND FUTURE
WORKS
In this paper, we propose the application of DRL in
the context of autonomous driving. At each step of
our experiment, we cyberattacked some vehicles in
the network to let them behave in an unexpected and
erroneous way, while monitoring if an autonomous
vehicle trained with a customized DRL algorithm was
able to minimize the number of road accidents (colli-
sions) over time.
The experimental analysis results showed that
only 1 DRL vehicle can help to optimize traffic flows
in order to mitigate the number of collisions that re-
alistically would happen if no autonomous vehicles
were inserted in the network. We showed that the col-
lision frequency decreases while the number of steps
increases: this means that the DRL agent is able to
learn from the environment despite the presence of
bad behaviours.
As future work, we plan to add an increasing num-
ber of DRL vehicles to the network with the aim of
creating more complex scenarios to evaluate the perfor-
mance of the proposed DRL model, based on a reward
function that penalizes strong accelerations and lane-
changes while rewarding velocities below a target ve-
locity and near the mean velocity of all the vehicles
in the network. We would also like to execute a no-
DRL scenario with the same network configuration in
order to compare DRL performance with no-DRL per-
formance.
ACKNOWLEDGEMENTS
This work has been funded by Spike Reply and par-
tially supported by MIUR - SecureOpenNets, EU
SPARTA, CyberSANE and E-CORRIDOR projects.
REFERENCES
BOMEY, N. (2018). U.S. vehicle deaths topped 40,000
in 2017, National Safety Council estimates.
https://eu.usatoday.com/story/money/cars/2018/02/15/national-safety-council-traffic-deaths/340012002/.
Accessed: Nov-2021.
Drozhzhin, A. (2015). Black Hat USA 2015: The
full story of how that Jeep was hacked.
https://www.kaspersky.com/blog/blackhat-jeep-cherokee-hack-explained/9493/.
Accessed: Nov-2021.
Khatoun, R. and Zeadally, S. (2017). Cybersecurity and pri-
vacy solutions in smart cities. IEEE Communications
Magazine, 55(3):51–59.
Martinelli, F., Marulli, F., Mercaldo, F., and Santone, A.
(2021a). Neural networks for driver behavior analysis.
Electronics, 10(3):342.
Martinelli, F., Mercaldo, F., Nardone, V., Orlando, A., and
Santone, A. (2018a). Cluster analysis for driver ag-
gressiveness identification. In ICISSP, pages 562–
569.
Martinelli, F., Mercaldo, F., Nardone, V., Orlando, A., and
Santone, A. (2018b). Who’s driving my car? a ma-
chine learning based approach to driver identification.
In ICISSP, pages 367–372.
Martinelli, F., Mercaldo, F., Nardone, V., and Santone, A.
(2021b). Driver identification through formal meth-
ods. IEEE Transactions on Intelligent Transportation
Systems.
Miller, C. and Valasek, C. (2015). Remote exploitation of an
unaltered passenger vehicle. Black Hat USA, 2015(S
91).
Saiprasert, C. and Pattara-Atikom, W. (2013). Smartphone
enabled dangerous driving report system. In 2013 46th
Hawaii International Conference on System Sciences,
pages 1231–1237. IEEE.
Sikander, G. and Anwar, S. (2018). Driver fatigue detection
systems: A review. IEEE Transactions on Intelligent
Transportation Systems, 20(6):2339–2352.
Statista.com (2021). Projected size of the global
autonomous car market from 2019 to 2023.
https://www.statista.com/statistics/428692/projected-
size-of-global-autonomous-vehicle-market-by-
vehicle-type/. Accessed: Nov-2021.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learn-
ing: An introduction. MIT press.
Treiber, M. and Kesting, A. (2013). Traffic flow dynamics.
Traffic Flow Dynamics: Data, Models and Simulation,
Springer-Verlag Berlin Heidelberg.
Wu, C., Kreidieh, A., Parvate, K., Vinitsky, E., and
Bayen, A. M. (2017a). Flow: A modular learning
framework for autonomy in traffic. arXiv preprint
arXiv:1710.05465.
Wu, C., Kreidieh, A., Parvate, K., Vinitsky, E., and Bayen,
A. M. (2017b). Flow: Architecture and benchmark-
ing for reinforcement learning in traffic control. arXiv
preprint arXiv:1710.05465, page 10.
Yu, J., Chen, Z., Zhu, Y., Chen, Y., Kong, L., and Li,
M. (2016). Fine-grained abnormal driving behaviors
detection and identification with smartphones. IEEE
transactions on mobile computing, 16(8):2198–2212.