A Practical Evaluation Method for Misbehavior Detection in the Presence of Selfish Attackers
Marek Wehmer (https://orcid.org/0000-0001-5155-8934) and Ingmar Baumgart
FZI Research Center for Information Technology, Karlsruhe, Germany
Keywords:
Evaluation Metric, Misbehavior Detection, VANET, Vehicular Network, Cyber Physical Systems Simulation.
Abstract:
With recent deployment activities of Vehicle-to-X systems, the need for practical misbehavior detection is growing. The academic discussion on related topics has progressed in recent years and delivered new evaluation approaches. Most research, however, concentrates on evaluations based on the confusion matrix of packet classifications, such as the precision-recall graph. We show that this approach has fundamental limitations and does not allow deriving valid statements about the real-world impact of a scheme. After reviewing the state of the art, we show that the physical manifestation of attacks must be considered when evaluating misbehavior detection systems. We propose a shift of perspective towards an attacker-oriented evaluation and contribute a new metric for selfish attackers based on the physical impact of the attack. We further present a simulation framework to practically evaluate misbehavior detection systems.
1 INTRODUCTION
The roll-out of Vehicle-to-X communications is gathering momentum. Expectations are high, as it is seen as a means of greatly reducing the number of traffic accidents and improving traffic efficiency. Inter-vehicular communication will impact safety-relevant control systems and must therefore be considered a critical resource in vehicles. Security considerations play an important role in the design and implementation of such systems: attacks and mitigation strategies have been a research topic for the last decade. The decentralized nature and latency requirements pose particular challenges for the design of security mechanisms. Messages conform to a small set of fully specified formats and convey a claim about the physical world. Moreover, all messages are signed by authorized participants, but not encrypted. The signature implies a certain degree of trust in the received message, but cannot be fully relied upon: as in any public key infrastructure, secret keys can leak and be abused by malicious actors to send misleading information. Even bugs and faulty sensors can lead to wrong claims being signed and sent out by legitimate participants of the network. In this setting, misbehavior detection fulfills a critical task: incorrect information must be filtered out, while correct information must be trusted to prevent accidents and increase traffic efficiency. One building block of
maintaining safety properties is to filter out packets
with untruthful information. Misbehavior detection
systems are designed to recognize such attacks and
mitigate their impact.
Evaluating misbehavior detection schemes has been a tedious task, and proposed schemes have usually been assessed with specific simulation scenarios or analytically, because real-world evaluation is difficult and in many cases infeasible. Many contributions use the false-positive rate (FPR) and the false-negative rate (FNR) of classifications of received packets to evaluate the quality of schemes. Recent advances in evaluation methodology have improved the comparability between schemes by using a common reference dataset and have also paved the way towards better metrics. Those metrics are, however, still based on the confusion matrix and therefore refer to the binary classification performance over all packets received during a simulation run. This approach has fundamental limitations in some cases. For example, misclassifications are not necessarily equally relevant for all participants. For some vehicles, a specific packet sent by the attacker might not have any effect, while it can be critical for the safety of another. Some attackers may also send a large number of packets to convince a specific vehicle of a single piece of misinformation. For their success, it does not necessarily matter whether their target accepts all the packets or just one.
Wehmer, M. and Baumgart, I.
A Practical Evaluation Method for Misbehavior Detection in the Presence of Selfish Attackers.
DOI: 10.5220/0010451005290537
In Proceedings of the 7th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2021), pages 529-537
ISBN: 978-989-758-513-5
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
In order to understand why it is impossible to comprehensively assess misbehavior detection with the confusion matrix, a new perspective on misbehavior is needed. Attackers sending misleading or incorrect information about the state of the world are not trying to get as many packets accepted as possible: their goal is to alter the state of the physical world in their favor. The success of their attack, and consequently the ability of any misbehavior detection system (MDS) to mitigate this attack, can thus only be measured by considering the effect of the communication on the physical world. This aspect has rarely been considered and requires new tools to interactively simulate cooperative intelligent transport systems (C-ITS) in the presence of attackers.
2 RELATED WORK
Attacks and misbehavior detection in vehicular networks have been studied extensively in recent years (van der Heijden et al., 2019; van der Heijden, 2018). Many approaches for detecting attacks have been published, but most focus on very specific attacks. As such, results are difficult to compare, and authors usually do not share a common evaluation approach or even metrics that allow for comparison. Moreover, the implementation is often not available, which makes reproducing results difficult. Recently, the VeReMi dataset, containing communication traces of different attacks conducted in the LuST (Codeca et al., 2015) traffic scenario, has been published and extended to improve evaluation quality in the field and to provide a basis for comparison (van der Heijden et al., 2018; Kamel et al., 2020). The simulation code is based on the Veins vehicular network simulator (Sommer et al., 2011) and is publicly available.
The choice of metrics for evaluating the quality of mechanisms is a related topic under discussion in the community. The authors of VeReMi propose to use the precision/recall graph of all reception events (van der Heijden et al., 2018) instead of the FPR/FNR metrics used by many earlier works. While such metrics are easy to obtain in simulations, the numbers are difficult to interpret: they seem applicable to most data-centric approaches and allow comparison to some extent, but not all mechanisms can be measured accurately. The evaluation depends on further subtleties, such as the aggregation method and the definition of when a message is to be considered malicious (van der Heijden and Kargl, 2017), which further weakens comparability between mechanisms. To take the dispersion of detection errors between participants into account, the Gini index of the FPRs/FNRs of different vehicles has been proposed (van der Heijden et al., 2018). While the Gini index can describe differences in classification performance between vehicles, the underlying question of the individual importance of packets and vehicles remains open. The similarity between C-ITS and cyber physical systems (CPS) has been noticed and discussed before in the context of misbehavior detection (van der Heijden et al., 2016). However, the discussion did not consider the physical part and mostly separated both domains. Application behavior metrics have been used to analyze the physical impact of attacks on specific applications like cooperative adaptive cruise control (CACC) (van der Heijden, 2018), but have not been explored further due to concerns about dependencies on specific implementations (van der Heijden and Kargl, 2017). This concept is related to our approach: we generalize its application and systematically show why these metrics are crucial for the assessment of MDS.
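To illustrate the dispersion measure mentioned above, the following Python sketch computes a Gini index over per-vehicle error rates using the common mean-absolute-difference form, $G = \sum_i \sum_j |x_i - x_j| / (2 n^2 \bar{x})$. The function name and the example rates are hypothetical and not taken from the cited work. A value of 0 means the errors are spread evenly; larger values mean they are concentrated on a few vehicles.

```python
from itertools import product


def gini_index(values):
    """Gini index of non-negative values via the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all rates are zero: errors are (trivially) equal
    total = sum(abs(a - b) for a, b in product(values, repeat=2))
    return total / (2 * n * n * mean)


# Hypothetical per-vehicle false-positive rates.
even_fprs = [0.05, 0.05, 0.05, 0.05]
skewed_fprs = [0.0, 0.0, 0.0, 0.2]
print(gini_index(even_fprs))    # 0.0
print(gini_index(skewed_fprs))  # close to 0.75, the maximum for n = 4
```

Note that the index only describes *how unevenly* errors are distributed; it says nothing about whether the vehicles bearing the errors are the ones that matter for the attack.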
3 A NEW SECURITY MODEL FOR C-ITS
In the literature, several attacker models are used, and attackers are usually described by their intention and their capabilities. The intentions and capabilities of possible attackers differ greatly, and a single attacker can perform multiple attacks according to his capabilities. While individual misbehavior detection mechanisms mostly focus on detecting specific attacks, the discussion about evaluation metrics for misbehavior detection tries to find a generic set of metrics that is independent of both the mechanism and the attacker model. One property common to most (but not all) attackers is that they send packets, and this is what recent evaluation metrics are based on: the classification of incoming packets as legitimate or malicious. Metrics based on the confusion matrix of this classification have to assume that the quality of a detection mechanism is related in some form to the number of correct classifications, and that mechanisms with a better classification are better at detecting attacks.

We argue that this approach is too broad: while such metrics are applicable to most relevant attacks and detection mechanisms, the underlying assumption does not hold in many cases, and therefore the effectiveness of mechanisms is not sufficiently reflected. This becomes clear when considering the success of attacks: an attack is successful if the attacker's goal is achieved. The hypothesis that the degree to which the attacker's goal is achieved is strongly correlated with a function of the classification of received packets by other participants is at least not trivially justified. We call this hypothesis $H_{cm}$. In fact, we already provided some indications in Section 1 of why $H_{cm}$ does not hold in practice. In the remainder of this section, we present a new model to distinguish attackers for which $H_{cm}$ holds from attackers for which it does not.
3.1 Limitations of the Confusion Matrix
Confusion-matrix-based metrics such as the FPR and FNR have been used to evaluate network intrusion detection systems (IDS). Traditional computer networks and C-ITS share a number of attack vectors, e.g. attackers might try to gain code execution privileges or crash systems by sending packets that exploit bugs in the parsing code. They might also try to exfiltrate information or deny operation of the network itself by attacking flaws in application logic or network infrastructure. We refer to these attackers as network attackers. Because traditional computer networks involve many different applications running on general-purpose computers and communicating via many different protocols, assuming specific intentions or goals of attackers is unhelpful, and these are therefore not included in the attacker model. In classic networks based on the internet protocol stack, one-to-one communication is possible, meaning the attacker communicates directly only with his targets. In this setting, the relation between a received network packet (or a series of received network packets) and an attack is sufficiently close, and IDS performing better in terms of the confusion matrix are therefore likely to be better at detecting attacks. For network attackers, we can assume $H_{cm}$.
In C-ITS, network communication has a well-defined function. Network packets are broadcast over geographic areas and use a very small set of publicly specified protocols to distribute information about the physical world. In contrast to classical networks, the meaning of each packet is known. The implementation of the involved applications may be proprietary, but their purpose is well-defined, so assumptions about the effect of received packets can be made. In this setting, we face a different type of attacker that we call a physical attacker. The physical attacker is characterized by his intention of altering some aspect of the state of the world. To reach his goal, he tries to convince other vehicles to behave in a certain way by sending packets that conform to the specification but may contain false information. We argue that $H_{cm}$ does not hold for the physical attacker for two reasons.
1. The broadcast of packets in C-ITS implies that irrelevant participants will receive the packet. These participants do not contribute to the realization of the attacker's intention and may not even act on the new information they receive, because it only affects the intended target of the attacker. Even though the classification performance for those vehicles is irrelevant for both the recipient and the attacker, it contributes to the confusion matrix.
2. When the same packet with false information is received by targets of the attacker, the resulting impact can differ. Suppose an attacker tries to provoke an accident by faking an end-of-queue warning. Two vehicles erroneously classify the warning as legitimate, where one of them is near the alleged end-of-queue situation and the other is not. The vehicle near the fake end of queue is more likely to trigger a dangerous emergency braking maneuver than the vehicle further away. Both packet reception events have the same influence on the confusion matrix, and both belong to the same attack.
There is no obvious relation between the success of an attack and the number of packets received by other participants. When considering physical attackers, we require a new set of metrics that do not depend on $H_{cm}$ and instead reflect the ability of an MDS to mitigate an attack.
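The second argument can be made concrete with a toy model. The following Python sketch is hypothetical: the `Reception` fields, in particular the `relevant` flag, are assumptions made for illustration. Two detector outcomes for the same fake end-of-queue warning produce identical confusion matrices, yet only in one of them does the targeted vehicle accept the false claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reception:
    """One packet reception event (hypothetical toy model)."""
    receiver: str
    malicious: bool  # ground truth: packet carries the attacker's false claim
    accepted: bool   # detector outcome: True = classified as legitimate
    relevant: bool   # assumption: receiver would act on the claim


def confusion_matrix(events):
    """Return (TP, FP, FN, TN), treating 'rejected' as the positive class."""
    tp = sum(e.malicious and not e.accepted for e in events)
    fp = sum(not e.malicious and not e.accepted for e in events)
    fn = sum(e.malicious and e.accepted for e in events)
    tn = sum(not e.malicious and e.accepted for e in events)
    return tp, fp, fn, tn


def attack_succeeds(events):
    """Toy criterion: a relevant receiver accepted a false packet."""
    return any(e.malicious and e.accepted and e.relevant for e in events)


# Same fake end-of-queue warning, two possible detector outcomes.
outcome_a = [
    Reception("far_vehicle_1", True, True, False),  # missed, but irrelevant
    Reception("far_vehicle_2", True, False, False),
    Reception("target", True, False, True),
    Reception("honest_peer", False, True, False),
]
outcome_b = [
    Reception("far_vehicle_1", True, False, False),
    Reception("far_vehicle_2", True, False, False),
    Reception("target", True, True, True),  # missed at the target
    Reception("honest_peer", False, True, False),
]

print(confusion_matrix(outcome_a), attack_succeeds(outcome_a))  # (2, 0, 1, 1) False
print(confusion_matrix(outcome_b), attack_succeeds(outcome_b))  # (2, 0, 1, 1) True
```

Any metric computed solely from these four counts, such as precision, recall, FPR or FNR, cannot distinguish the two outcomes.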
3.2 Embracing the Physical World
Metrics based on the confusion matrix still have a valid application in C-ITS, since classical network attackers can still be a threat. The main application domain of MDS, however, is the detection of attacks from physical attackers, and the discussion of MDS often considers network attackers out of scope.

The attacker's intent and the undesirable consequences of the attack are both situated in the physical world. We therefore return to the description of C-ITS as a cyber physical system. C-ITS satisfy the definition of a CPS in that vehicles are networked software systems that control a physical process. We detail our model in the next section; in the remainder of this section, we refer to the set of applications as the software system and to the physical world as the physical process. The idea of performing misbehavior detection rather than intrusion detection is rooted in the realization that C-ITS are CPS. An important property of CPS is that the physical process and the networked software are interdependent and cannot be modeled separately. Many detection schemes indeed exploit the properties of the physical world to identify misbehavior on the network layer. To build new metrics without assuming $H_{cm}$, we propose to equally consider the properties of the physical world when assessing the effectiveness of MDS. We identify two problems that need to be solved.
1. The physical process is complex. Finding the right abstraction is difficult and problem-specific. One contribution evaluated MDS by analyzing a specific application and comparing the effects of one selected attack (van der Heijden et al., 2016). While this leads to very useful conclusions, the comparability between mechanisms is limited when the attack varies.
2. No evaluation framework is available. VeReMi has provided a common dataset for non-interactive evaluations, but it cannot be used to analyze the effects that MDS have on the physical world.
In the following section, we address the first issue by suggesting a metric for the selfish attacker proposed in (Samara et al., 2010; van der Heijden et al., 2016). The selfish attacker is a physical attacker with the goal of gaining an advantage on the road. By focusing on the attacker's goal, we can provide a metric that is applicable to all attacks and MDS. We address the second issue with our simulation framework, which we present in Section 5.
4 A METRIC FOR SELFISH ATTACKERS
The selfish attacker adheres to the specified message formats but does not necessarily follow all protocol specifications, e.g. he can send messages more often than allowed or suppress packets he should transmit or forward. We deem this attacker type highly relevant, because most traffic participants have a motivation for this kind of attack, and it seems plausible that such an attack could be performed by individuals at little cost, e.g. by modifying the firmware of certain vehicle models. In the scope of this work, we do not consider cooperative attackers, i.e. we assume that the attacker controls only a single vehicle and does not cooperate with others when performing his attacks. We assume that the attacker's goal is to reach his destination as fast as possible.
We previously discussed the limitations of current metrics based on the confusion matrix and further note another fundamental shortcoming of such metrics in the context of selfish attackers: in real-world traffic scenarios, maliciousness is not a property of a message and cannot be captured by a metric based on the confusion matrix of individual packet classifications. Measuring FPRs and FNRs is thus a difficult task in itself, because there is no inherent definition of packet maliciousness. In some cases, the decision is obvious: if a vehicle sends a warning for an accident that did not happen, the information is clearly incorrect and the packet should be treated as malicious. But this is not a function of the packet alone: its correctness depends on the state of the world at that point in time, i.e. whether an accident happened or not. Apart from this, the classification is gradual, especially when considering sensor noise and inaccuracies. A message containing a vehicle's location will always have some offset due to GNSS and integration errors. There is no natural distance at which the message becomes clearly incorrect and should be classified as malicious. A workaround used in some evaluations is to define attacker vehicles and assume that all packets sent by them are malicious and should be classified as such. We note that in the case of selfish attackers, the maliciousness of a packet depends on the intention behind its transmission rather than only on its content, which gives further weight to our argument that metrics should be closely coupled with the attacker's intention.
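The effect of the missing natural threshold can be illustrated with made-up numbers. The Python sketch below assumes, hypothetically, that a position message is labeled malicious whenever its offset exceeds a chosen threshold; all offsets are invented for illustration. Shifting the threshold moves errors between FPR and FNR although nothing changes on the road.

```python
def error_rates(honest, forged, threshold):
    """Return (FPR, FNR) when every offset above `threshold` meters is
    labeled malicious. Positive class = malicious."""
    fpr = sum(o > threshold for o in honest) / len(honest)   # honest flagged
    fnr = sum(o <= threshold for o in forged) / len(forged)  # forged missed
    return fpr, fnr


# Hypothetical position offsets in meters.
honest_offsets = [0.5, 1.2, 2.8, 4.1]  # GNSS and integration errors
forged_offsets = [3.0, 6.0, 15.0]      # positions shifted by an attacker

for threshold in (2.0, 5.0, 10.0):
    print(threshold, error_rates(honest_offsets, forged_offsets, threshold))
```

Each threshold yields a different point in the FPR/FNR plane for the very same traffic, so the resulting numbers partly reflect the evaluator's labeling choice rather than the detector.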
4.1 Modeling the Cyber Physical System
In light of these observations, we propose a CPS-based MDS metric for selfish attackers. The set of applications running inside all vehicles forms the software system interacting with the physical process. We view the traffic flow itself as the physical process and argue that this is what needs to be protected in the presence of selfish attackers: an attacker who is unable to change the traffic flow will not gain an advantage over the same scenario without the attack.

The MDS is part of the software system, i.e. the set of components that form driving decisions inside the individual vehicles, and therefore also influences the traffic. This model can take into account most influences that are part of the real traffic system, including other components running inside the vehicles' software systems.
4.2 Attacker’s Advantage
In order to measure the influence of the MDS on the traffic system, we first formulate a metric for the degree to which an attacker achieves his goal. We assume that the attacker controls a single vehicle with a start position $p_s$ and a destination $p_d$. The vehicle covers the driving distance $s_a$ between $p_s$ and $p_d$ in time $t_a$. To measure the advantage the attacker achieves with his attack, we measure the time $t_a$ and the time $t_{a,\mathrm{fair}}$, which is the driving time of the same vehicle when behaving like a legitimate network participant, i.e. not performing any attacks. We call

$$q_a = 1 - \frac{t_a}{t_{a,\mathrm{fair}}} \tag{1}$$

the attacker's advantage. In our model, this is purely a property of the physical process, i.e. an effect that the attacker's communication has on the physical world. $q_a$ can take negative values if the attack leads to a longer travel time for the attacker.
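Equation (1) is straightforward to compute once both travel times are known. A minimal Python sketch, assuming travel times in seconds; the function name is ours and not part of the framework:

```python
def attacker_advantage(t_a: float, t_a_fair: float) -> float:
    """Attacker's advantage q_a = 1 - t_a / t_a_fair, Eq. (1).

    t_a:      travel time of the attacker's vehicle while attacking
    t_a_fair: travel time of the same vehicle behaving legitimately
    """
    if t_a_fair <= 0:
        raise ValueError("t_a_fair must be positive")
    return 1.0 - t_a / t_a_fair


print(attacker_advantage(1500.0, 2000.0))  # 0.25: a quarter of the time saved
print(attacker_advantage(2200.0, 2000.0))  # negative: the attack backfired
```

Normalizing by $t_{a,\mathrm{fair}}$ makes the value independent of route length, so short and long trips can be compared on the same scale.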
We now propose to base the quality assessment of MDS on $q_a$: an optimal detection mechanism achieves $q_a \le 0$, i.e. the attacker is unable to gain an advantage with his attack. In realistic scenarios, this is not always possible, for example if the detection scheme is not running on all vehicles. It is thus desirable to obtain a lower bound to which a specific scheme can be compared in order to assess its quality. We later show how a lower bound for $q_a$ can be estimated.
Because $q_a$ is a property of the physical process, it can be measured independently of the performed attacks or detection mechanisms. To measure $q_a$, only two conditions are necessary:
1. the physical process must be observable, i.e. $t_a$ can be determined, and
2. the scenario must be repeatable.
Condition 2 is necessary to measure $t_{a,\mathrm{fair}}$ under exactly the same conditions but without the attack. Since we see the MDS as a distributed system running inside the software system, evaluation decisions such as aggregation methods are not needed for determining $q_a$. The specific deployment parameters of the mechanism, such as the penetration rate, must however be specified and are seen as parameters of the mechanism to be analyzed.
While $t_a$ can be measured universally, it depends on all components that the physical process itself depends on, especially the traffic scenario and the implementation of the vehicles' applications that influence the driving decisions. When comparing different MDS, these influences should be minimized, e.g. the measurements should be taken in the same traffic scenario and with the same implementations. We expect $q_a$ to be a useful metric for assessing detection performance even when comparing mechanisms across different traffic scenarios and implementations, as the value of $q_a$ specifies what an attacker can achieve in the presence of an MDS.
5 EVALUATION FRAMEWORK
Our metric $q_a$ cannot be determined using precomputed datasets such as VeReMi, since measuring $q_a$ depends on observing the attack's impact on the physical process. Simulating the interactions between the vehicles' software and the traffic system requires a simulator that couples a road traffic simulation with the networking and software simulation. This is usually called an interactive simulation. This requirement is necessary to allow the misbehavior detection mechanism to impact the physical process, i.e. the traffic simulation. Our framework builds on the Artery simulation module (Riebl et al., 2019), which we extend to facilitate the simulative analysis of misbehavior detection mechanisms and attack scenarios. The result is an extensible framework that can be used to analyze attacks and detection mechanisms for several attacker types; in this work, we limit our description to the analysis of the detection of selfish attacks and the measurement of $q_a$.
Artery is an interactive simulation framework, originally developed for testing applications in the European ITS-G5 vehicular communications protocol stack. Artery is implemented as a package for the OMNeT++ network simulator (Varga, 2001) and couples the networking simulation modules with the SUMO road traffic simulator (Lopez et al., 2018). The communication stack is implemented by the vanetza project (Riebl et al., 2017), a standalone open-source implementation of the ITS-G5 standards.
5.1 Detection Model
To measure the impact of MDS on the physical process, we assume a data-centric misbehavior detection model, i.e. the MDS acts on individual packets. Each received packet is passed to the misbehavior detection algorithm and classified. By default, our framework discards all packets that are classified as malicious. Other packets are passed up the stack and handled by the application. Figure 1 shows the processing of received packets inside a vehicle's simulation module. We use this simple model to decouple misbehavior detection from the application implementation, so that detection mechanisms can be implemented independently of application behavior. This approach removes flexibility from the application but keeps the implementation minimal and, in our view, does not impose unrealistic limitations.
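The default processing path can be sketched as follows. This is a Python illustration of the data-centric model, not the framework's simulation code; the names `Detector`, `MbdService`, `is_malicious` and `BlocklistDetector` are hypothetical. A vehicle without an MDS simply forwards every packet.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


@dataclass
class Packet:
    sender: str
    payload: dict


class Detector(Protocol):
    """Hypothetical interface of a data-centric detection mechanism."""

    def is_malicious(self, packet: Packet) -> bool: ...


@dataclass
class MbdService:
    """Sits between middleware and application: drops packets classified
    as malicious, forwards all others and counts both."""

    detector: Optional[Detector]       # None: vehicle without an MDS
    deliver: Callable[[Packet], None]  # hand-off to the behavioral application
    stats: dict = field(default_factory=lambda: {"received": 0, "dropped": 0})

    def on_receive(self, packet: Packet) -> None:
        self.stats["received"] += 1
        if self.detector is not None and self.detector.is_malicious(packet):
            self.stats["dropped"] += 1  # default policy: discard
            return
        self.deliver(packet)


class BlocklistDetector:
    """Toy detector for the usage example: flags fixed sender IDs."""

    def __init__(self, blocked):
        self.blocked = set(blocked)

    def is_malicious(self, packet: Packet) -> bool:
        return packet.sender in self.blocked


delivered = []
service = MbdService(BlocklistDetector({"attacker"}), delivered.append)
service.on_receive(Packet("attacker", {"type": "warning"}))
service.on_receive(Packet("honest", {"type": "status"}))
print([p.sender for p in delivered], service.stats)
# ['honest'] {'received': 2, 'dropped': 1}
```

Because the application only sees what `deliver` hands over, any detector that implements the single classification method can be swapped in without touching the driving logic.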
Figure 1: Overview of the processing of messages in a simulated vehicle (components: 802.11p NIC simulation, middleware, MBD-Service with misbehavior detection mechanism and statistics collection, behavioral application wrapper).
5.2 Simulation Scenarios
A simulation scenario $s = (s_t, B, A_{mbds}, p_{mbds}, A, R)$ in our framework is defined by
1. a traffic scenario $s_t$,
2. the set of behavioral applications $B$ defining the vehicles' driving decisions,
3. a misbehavior detection mechanism $A_{mbds}$,
4. the penetration rate $0 \le p_{mbds} \le 1$ defining the share of vehicles equipped with an MDS,
5. the attacker definition $A$, and
6. the set of random seeds $R = \{r_1, \ldots, r_n\}$.
In order to analyze a detection mechanism, $A_{mbds}$ and $A$ must be implemented in our framework. We designed the interfaces with a focus on extensibility to make the analysis feasible for most data-centric mechanisms and expect that most implementations can be integrated without much effort. The parameter $p_{mbds}$ is likely to be varied as part of the analysis.
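A scenario can be captured as a plain record. The Python sketch below is hypothetical: the field names mirror the tuple $s$, the components are represented by strings, and it assumes that the equipped vehicles are drawn as a fixed share of all vehicles using a seeded generator (the framework may instead decide per vehicle). The selection depends only on the seed, which keeps runs reproducible.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record of s = (s_t, B, A_mbds, p_mbds, A, R)."""

    traffic_scenario: str     # s_t, e.g. "highway" or "urban"
    behavioral_apps: tuple    # B
    detection_mechanism: str  # A_mbds
    penetration_rate: float   # p_mbds
    attacker: str             # A
    seeds: tuple              # R

    def __post_init__(self):
        if not 0.0 <= self.penetration_rate <= 1.0:
            raise ValueError("p_mbds must lie in [0, 1]")

    def equipped_vehicles(self, vehicle_ids, seed):
        """Vehicles carrying an MDS in the run with this seed.
        Assumption: a fixed share is drawn without replacement."""
        rng = random.Random(seed)  # local generator: no global state
        k = round(self.penetration_rate * len(vehicle_ids))
        return set(rng.sample(sorted(vehicle_ids), k))


scenario = Scenario("highway", ("fastest_route",), "consensus", 0.5,
                    "fake_warning", tuple(range(10)))
vehicles = [f"v{i}" for i in range(10)]
first = scenario.equipped_vehicles(vehicles, seed=3)
print(first == scenario.equipped_vehicles(vehicles, seed=3))  # True
print(len(first))  # 5
```

Sorting the IDs before sampling makes the draw independent of the order in which vehicles happen to be listed.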
The VeReMi dataset provides a substantial benefit for the academic discussion by providing a common reference against which new detection mechanisms can be evaluated. We see the need to compare similar detection methods directly and likewise aim to provide a shared evaluation platform that is useful for future research. To maximize the comparability of $q_a$ measurements, we hope to find a common fixed set of parameters $s_t$ and $R$ that can be used across publications for many mechanisms.
5.3 Traffic Scenarios
We selected two traffic scenarios, highway and urban, to facilitate an effective evaluation. The design of traffic scenarios is not trivial, as it represents a trade-off between execution time, universality and attacker specificity. A large-scale scenario such as the LuST simulation (Codeca et al., 2015) used by VeReMi contains numerous traffic situations and is suited for many attacks, but is very expensive in terms of run time. For this reason, we built two separate traffic scenarios: a simple and artificial highway segment, and a medium-scale urban road network in the city of Karlsruhe, Germany. Figures 2 and 3 show the road networks of both scenarios.

Figure 2: Highway segment scenario.
Figure 3: Urban scenario.
The highway scenario is simple and contains few junctions. Traffic flows from left to right, and all vehicles try to reach the last segment of the main road. The main road has no speed limit; other road segments are limited to 13.89 m/s. Without external interference, no vehicle chooses to leave the main road. The traffic is composed mostly of passenger cars and buses with a maximum speed of 20 m/s, but also includes 2.6% sports cars with a maximum speed of 20 m/s. The attacker's maximum speed is set to 100 m/s to allow him to create an advantage out of a beneficial traffic situation. The total number of vehicles driving simultaneously varies between 112 and 117.
The urban scenario is more complex and models a realistic urban traffic system with traffic lights and many junctions. Vehicles start at a random position and follow a randomly selected route. In this traffic scenario, speed is limited and the traffic is dense. Without an attack, the route takes 38 minutes on average. The maximum speeds of vehicles do not influence the driving time.
5.4 Behavioral Applications
The physical process measured by our approach is highly dependent on the driving decisions made by each individual traffic participant. Many attacks aim to influence the driving decisions of other participants by sending misleading messages.
For our analysis, we keep the assumptions about the implementation minimal. We assume that all non-attacker participants act rationally and always take the route that appears to be the fastest towards their destination. Once the application receives new information about the world, it recalculates the path with the shortest travel time to the destination and changes the route if the new path is faster than the current route.
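The rational rerouting behavior can be illustrated with Dijkstra's shortest-path algorithm on expected travel times. In this Python sketch, the road graph, the delay model of a warning and the class name `RationalDriver` are assumptions; an accepted warning simply adds a delay to the affected road segment in the driver's local view.

```python
import heapq


def fastest_route(graph, start, goal):
    """Dijkstra on travel times; graph[u][v] = seconds on segment u->v.
    Returns (total_time, path), or (inf, []) if the goal is unreachable."""
    queue = [(0.0, start, [start])]
    done = set()
    while queue:
        time, node, path = heapq.heappop(queue)
        if node == goal:
            return time, path
        if node in done:
            continue
        done.add(node)
        for nxt, cost in graph.get(node, {}).items():
            if nxt not in done:
                heapq.heappush(queue, (time + cost, nxt, path + [nxt]))
    return float("inf"), []


class RationalDriver:
    """Hypothetical behavioral application: follows the route that currently
    appears fastest and re-plans whenever new information is accepted."""

    def __init__(self, graph, position, destination):
        self.view = {u: dict(vs) for u, vs in graph.items()}  # local world view
        self.position, self.destination = position, destination
        self.time, self.route = fastest_route(self.view, position, destination)

    def on_warning(self, u, v, delay):
        """Apply an accepted warning for segment u->v, then maybe reroute."""
        self.view[u][v] += delay
        current = sum(self.view[a][b] for a, b in zip(self.route, self.route[1:]))
        new_time, new_route = fastest_route(self.view, self.position, self.destination)
        if new_time < current:  # switch only if the new path is faster
            self.time, self.route = new_time, new_route
        else:
            self.time = current


roads = {"A": {"B": 60.0, "C": 80.0}, "B": {"D": 60.0}, "C": {"D": 80.0}}
driver = RationalDriver(roads, "A", "D")
print(driver.route)               # ['A', 'B', 'D']
driver.on_warning("B", "D", 300.0)
print(driver.route, driver.time)  # ['A', 'C', 'D'] 160.0
```

In the example, a warning about segment B to D makes the detour via C appear faster, so the driver switches routes. This is exactly the reaction a selfish attacker tries to provoke in order to free his own path.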
5.5 Random Seeds
Individual simulation runs in our framework are deterministic and reproducible if the implementations of $A_{mbds}$ and $A$ support it. All randomized elements in the scenario, such as the selection of vehicles equipped with a misbehavior detection mechanism according to $p_{mbds}$, depend only on a single seed value. In order to obtain a stable value for $q_a$, multiple simulation runs are executed with different seed values. In this case, $t_a^i$ and $t_{a,\mathrm{fair}}^i$ are measured for each simulation run $i$, and $q_a$ can be calculated as

$$q_a = 1 - \frac{1}{N} \sum_{i=0}^{N-1} \frac{t_a^i}{t_{a,\mathrm{fair}}^i}, \tag{2}$$

where $N$ is the total number of simulation runs. For large $N$, the influence of the choice of $R = \{r_i\}_{0 \le i < N}$ diminishes. Individual traffic situations can, however, change substantially with $r_i$, and therefore $q_a$ can differ for different seed values.
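Equation (2) averages per-run ratios rather than dividing averaged travel times, and each attack run must be paired with the fair run that uses the same seed. A minimal Python sketch with hypothetical measurements; the function name is ours:

```python
from statistics import fmean


def mean_attacker_advantage(t_attack, t_fair):
    """q_a over N seeded runs as in Eq. (2): 1 - mean_i(t_a^i / t_a,fair^i).

    t_attack[i] and t_fair[i] must stem from runs with the same seed r_i.
    """
    if not t_attack or len(t_attack) != len(t_fair):
        raise ValueError("need exactly one fair run per attack run")
    return 1.0 - fmean(ta / tf for ta, tf in zip(t_attack, t_fair))


# Hypothetical travel times in seconds for three seeds.
t_attack = [1500.0, 1800.0, 2100.0]
t_fair = [2000.0, 2000.0, 2000.0]
print(round(mean_attacker_advantage(t_attack, t_fair), 3))  # 0.1
```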
6 EVALUATION OF EXISTING SCHEMES
Using our evaluation metric, we analyze a basic variant of the detection mechanism proposed by Petit, Feiri and Kargl (Petit et al., 2011) for different penetration rates. The implemented scheme tries to create a consensus about dangerous events reported by other vehicles. As long as the vehicle is not forced to make a decision, the mechanism waits for further messages about this event sent by other neighbors. Events are only classified as legitimate if the number of warnings sent by different senders exceeds a dynamic threshold.
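A consensus check of this kind could look like the following Python sketch. It does not reproduce the exact threshold rule of the original scheme; we assume, hypothetically, a threshold equal to a fixed fraction of the currently known neighbors with a lower bound, and all names are ours. Repeated warnings from the same sender count only once, which is what limits a single, non-cooperating attacker.

```python
import math
from collections import defaultdict


class ConsensusDetector:
    """Consensus-style check for reported events (illustrative sketch).

    Assumption: the dynamic threshold is a fixed fraction of the currently
    known neighbors, bounded below by `min_senders`.
    """

    def __init__(self, fraction=0.25, min_senders=2):
        self.fraction = fraction
        self.min_senders = min_senders
        self.reporters = defaultdict(set)  # event id -> distinct sender IDs

    def threshold(self, neighbor_count):
        return max(self.min_senders, math.ceil(self.fraction * neighbor_count))

    def report(self, event_id, sender):
        """Record a warning; repeated warnings of one sender count once."""
        self.reporters[event_id].add(sender)

    def decide(self, event_id, neighbor_count):
        """Called once the vehicle must act: the event is legitimate only if
        the number of distinct senders exceeds the threshold."""
        return len(self.reporters[event_id]) > self.threshold(neighbor_count)


detector = ConsensusDetector()
for _ in range(10):  # one attacker repeating the same fake warning
    detector.report("fake_accident", "attacker")
for sender in ("v1", "v2", "v3", "v4"):
    detector.report("real_accident", sender)
print(detector.decide("fake_accident", 12), detector.decide("real_accident", 12))
# False True
```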
Figure 4: Estimation of a lower bound for $q_a$ for both traffic scenarios ($q_a$ over $p_{mbds}$ for perfect detection, with 90% confidence intervals, for the urban and highway traffic scenarios).
6.1 Attacker Implementation
We implement a simple fake message injection attack $A_{mi}$. The attacker's vehicle sends warnings for nonexistent accidents on the attacker's route every four seconds to convince other vehicles to avoid the route and thereby reduce traffic on his path. The attacker's vehicle follows a fixed route defined by the scenario. While this is a very simple strategy, our results show that it is very effective if no misbehavior detection is performed.
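The attacker's sending pattern can be sketched as a generator. This Python sketch assumes that each fake warning names the next segment on the attacker's fixed route; the schedule format and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeWarning:
    time: float  # send time in seconds
    sender: str
    edge: str    # road segment claimed to be blocked by an accident


def fake_warnings(schedule, start, interval=4.0, sender="attacker"):
    """Yield one fake accident warning every `interval` seconds.

    schedule: (entry_time, edge) pairs of the attacker's fixed route, sorted
    by entry time. Assumption: each warning names the next segment ahead;
    sending stops once no segment is left ahead.
    """
    t = start
    while True:
        ahead = [edge for entry, edge in schedule if entry > t]
        if not ahead:
            return
        yield FakeWarning(t, sender, ahead[0])
        t += interval


route = [(0.0, "e1"), (10.0, "e2"), (20.0, "e3")]
for warning in fake_warnings(route, start=0.0):
    print(warning.time, warning.edge)
```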
6.2 Simulation Parameters
The attack and detection mechanism are usable in the urban traffic scenario as well as in the highway traffic scenario. For our evaluation, we use both traffic scenarios. We use the implementation of behavioral applications as described in Section 5.4. We define a total of twelve simulation scenarios by using six different values for $p_{mbds}$ in both traffic scenarios. We use $R = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ in all scenarios, which results in a total of 100 simulation runs.
6.3 A Lower Bound for $q_a$
Our framework can be used to estimate the limits of what misbehavior detection can achieve. We implemented a detection mechanism $A_{perfect}$ that has access to the internal attacker state and therefore always classifies packets correctly.
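The following sketch shows one way a perfect detector and a penetration rate could interact in such a simulation. It makes two assumptions, both hypothetical rather than taken from the framework. First, each warning carries a ground-truth flag that only the simulator can see. Second, each vehicle runs the MDS independently with probability $p_{mbds}$. All names are also hypothetical. No detector can classify better than this oracle, so the $q_a$ it leaves for a given $p_{mbds}$ serves as the lower bound discussed in this section.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Set

@dataclass(frozen=True)
class V2XWarning:
    sender_id: str
    is_fake: bool  # ground truth: visible to the simulator, never to real receivers

def perfect_detector(warning: V2XWarning) -> bool:
    """Oracle in the spirit of A_perfect: accept exactly the genuine warnings."""
    return not warning.is_fake

def equip_vehicles(vehicle_ids: Iterable[str], p_mbds: float, rng: random.Random) -> Set[str]:
    """Assumption: every vehicle runs the MDS independently with probability p_mbds."""
    return {v for v in vehicle_ids if rng.random() < p_mbds}

def accepts(vehicle_id: str, warning: V2XWarning, equipped: Set[str]) -> bool:
    """Unequipped vehicles trust every warning; equipped ones ask the oracle."""
    if vehicle_id in equipped:
        return perfect_detector(warning)
    return True

rng = random.Random(42)
vehicles = [f"veh{i}" for i in range(100)]
equipped = equip_vehicles(vehicles, p_mbds=0.5, rng=rng)
fake = V2XWarning(sender_id="attacker", is_fake=True)
# Only unequipped vehicles are still influenced by the fake warning.
influenced = [v for v in vehicles if accepts(v, fake, equipped)]
print(len(equipped) + len(influenced))  # always 100
```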
Figure 4 shows the resulting $q_a$ for $A_{perfect}$ with different penetration rates. In the urban traffic scenario, the attack is very successful when no detection mechanism is used, but the attacker achieves a high $q_a$ value only in this case. We conclude that even low penetration rates of effective misbehavior detection systems can reduce the selfish attacker's advantage substantially. For $p_{mbds} \geq 0.5$, the attacker gains no noticeable advantage by performing the attack.
In some simulation runs, we measure $q_a < 0$, which means that the attack created a disadvantage for the attacker. This is not caused by the detection of the attack but by the simplicity of the attack: in some constellations, the false warnings sent by the attacker increase his own travel time.

In the highway traffic scenario, the attack is less effective at lower penetration rates, but the attacker still achieves some advantage even at higher penetration rates of misbehavior detection.

Figure 5: Evaluation result for the mechanism of Petit, Feiri and Kargl in both scenarios. (Plot of $q_a$ over $p_{mbds}$ with 90% confidence intervals; urban and highway traffic scenarios.)
6.4 Results
We evaluated both traffic scenarios with the mechanism of Petit, Feiri and Kargl. Figure 5 shows the resulting values of $q_a$ for different penetration rates. The measurements show that the mechanism can effectively reduce the attacker's advantage. In the highway traffic scenario, the mechanism's $q_a$ values are very close to those of the perfect detection mechanism.
The execution times of individual simulation runs depend on the traffic scenario and the penetration rate $p_{mbds}$. We executed the simulations on a machine with 8 cores at a 2.6 GHz clock rate and 16 GB of RAM, running eight simulation runs in parallel. Each run of the highway scenario finished in about 6 minutes, while the more complex urban traffic scenario took between 1.5 hours and over ten hours per run. Table 1 lists the execution times for some scenarios; they were measured using the perfect detection mechanism. We suspect that the variations in the urban scenario are caused by the reduced number of invocations of the traffic simulation for larger $p_{mbds}$. Surprisingly, our measurements also show that removing even a small percentage of vehicles from the attacker's influence can have a substantial impact on his advantage in some situations.
Table 1: Results and Performance.
$s_t$      $p_{mbds}$   $q_a$   Execution Time (s)
highway    0.00         0.75      356
highway    0.25         0.16      354
highway    0.80         0.07      361
highway    1.00         0.00      351
urban      0.00         0.22    37232
urban      0.25         0.18    22220
urban      0.80         0.13     9877
urban      1.00         0.11     5501
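For reference, the execution times in Table 1 convert to the units used in the text as follows. This is a plain unit conversion of the tabulated values. The last line gives the speed-up of the urban runs between $p_{mbds} = 0.00$ and $p_{mbds} = 1.00$.

```latex
\begin{aligned}
351\,\mathrm{s} &\approx 5.85\,\mathrm{min}, & 361\,\mathrm{s} &\approx 6.02\,\mathrm{min},\\
5501\,\mathrm{s} &\approx 1.53\,\mathrm{h}, & 37\,232\,\mathrm{s} &\approx 10.34\,\mathrm{h},\\
\frac{37\,232\,\mathrm{s}}{5501\,\mathrm{s}} &\approx 6.77.
\end{aligned}
```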
7 CONCLUSION
In this work, we propose a new metric for assessing the quality of misbehavior detection mechanisms in the presence of selfish attackers. To improve on state-of-the-art metrics based on the confusion matrix of packet classifications, we base our metric on measuring the attacker's success in achieving his goal.
We further present a simulation framework to measure $q_a$ and propose two scenarios for testing attacks and detection mechanisms. We analyze a simplified variant of the detection mechanism proposed by Petit, Feiri and Kargl in this framework and estimate a lower bound for $q_a$ by using a perfect detection mechanism. We discuss run-time performance and conclude that our metric can be used to evaluate practical detection mechanisms in realistic traffic scenarios.
Our metric $q_a$ allows a reliable assessment of MDS and gives a realistic indication of the effectiveness of an MDS, which is an improvement over metrics based on the confusion matrix, such as the precision-recall graph. Our proposal is able to assess MDS in the presence of selfish attackers. The first application of our framework showed promising results, and we hope to extend this analysis to other detection schemes and more sophisticated attacks. We show that metrics based on measuring the physical process can be used to successfully evaluate MDS.
We observed that the interpretation of $q_a$ could benefit from additional traffic measurements to better assess the negative effects that misbehavior detection has on the desired effects of the system.
VEHITS 2021 - 7th International Conference on Vehicle Technology and Intelligent Transport Systems
ACKNOWLEDGEMENTS
This work was supported by the Competence Center
for Applied Security Technology (KASTEL).
REFERENCES
Codeca, L., Frank, R., and Engel, T. (2015). Luxembourg SUMO Traffic (LuST) scenario: 24 hours of mobility for vehicular networking research. In 2015 IEEE Vehicular Networking Conference (VNC), pages 1–8.
Kamel, J., Wolf, M., van der Heijden, R. W., Kaiser, A., Urien, P., and Kargl, F. (2020). VeReMi extension: A dataset for comparable evaluation of misbehavior detection in VANETs. In IEEE International Conference on Communications (ICC), pages 1–6.
Lopez, P. A., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y.-P., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P., and Wießner, E. (2018). Microscopic traffic simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems, pages 2575–2582. IEEE.
Petit, J., Feiri, M., and Kargl, F. (2011). Spoofed data detection in VANETs using dynamic thresholds.
Riebl, R., Obermaier, C., and Günther, H.-J. (2019). Artery: Large scale simulation environment for ITS applications. In Recent Advances in Network Simulation, pages 365–406. Springer.
Riebl, R., Obermaier, C., Neumeier, S., and Facchi, C. (2017). Vanetza: Boosting research on inter-vehicle communication. In Proceedings of the 5th GI/ITG KuVS Fachgespräch Inter-Vehicle Communication (FG-IVC 2017), pages 37–40.
Samara, G., Al-Salihy, W. A., and Sures, R. (2010). Security issues and challenges of vehicular ad hoc networks (VANET). In 4th International Conference on New Trends in Information Science and Service Science, pages 393–398. IEEE.
Sommer, C., German, R., and Dressler, F. (2011). Bidirectionally Coupled Network and Road Traffic Simulation for Improved IVC Analysis. In IEEE Transactions on Mobile Computing (TMC), volume 10, pages 3–15. IEEE.
van der Heijden, R. W. (2018). Misbehavior Detection in Cooperative Intelligent Transport Systems. PhD thesis, Ulm, Germany.
van der Heijden, R. W., Dietzel, S., Leinmüller, T., and Kargl, F. (2016). Survey on misbehavior detection in cooperative intelligent transportation systems.
van der Heijden, R. W., Dietzel, S., Leinmüller, T., and Kargl, F. (2019). Survey on misbehavior detection in cooperative intelligent transportation systems. In IEEE Communications Surveys & Tutorials, volume 21, pages 779–811.
van der Heijden, R. W. and Kargl, F. (2017). Evaluating misbehavior detection for vehicular networks. In 5th GI/ITG KuVS Fachgespräch Inter-Vehicle Communication, page 5.
van der Heijden, R. W., Lukaseder, T., and Kargl, F. (2018). VeReMi: A dataset for comparable evaluation of misbehavior detection in VANETs. In Beyah, R., Chang, B., Li, Y., and Zhu, S., editors, Security and Privacy in Communication Networks, pages 318–337, Cham. Springer International Publishing.
Varga, A. (2001). Discrete event simulation system. In Proceedings of the European Simulation Multiconference (ESM'2001), pages 1–7.