A Practical Evaluation Method for Misbehavior Detection

in the Presence of Selﬁsh Attackers

Marek Wehmer

and Ingmar Baumgart

FZI Research Center for Information Technology, Karlsruhe, Germany

Keywords:

Evaluation Metric, Misbehavior Detection, VANET, Vehicular Network, Cyber Physical Systems Simulation.

Abstract:

With recent deployment activities of Vehicle-to-X systems, the need for practical misbehavior detection is

growing. The academic discussion on related topics has progressed in recent years and delivered new evaluation approaches. Most research, however, concentrates on evaluations based on the confusion matrix of packet classifications, such as the precision-recall graph. We show that this approach has fundamental limitations and does not permit valid statements about the real-world impact of a scheme. After reviewing the state of the art, we show that the physical manifestation of attacks must be considered when evaluating misbehavior

detection systems. We propose a shift of perspective towards an attacker-oriented evaluation and contribute a

new metric for selﬁsh attackers based on the physical impact of the attack. We further present a simulation

framework to practically evaluate misbehavior detection systems.

1 INTRODUCTION

The roll-out of Vehicle-to-X communications is gathering momentum. As a means of greatly reducing traffic accident numbers and improving traffic efficiency, expectations

are high. Inter-vehicular communication will impact

safety relevant control systems and must therefore be

considered a critical resource in vehicles. Security

considerations play an important role in the design

and implementation of such systems: Attacks and

mitigation strategies have been a research topic for

the last decade. The decentralized nature and latency

requirements imply particular challenges for the de-

sign of security mechanisms. Messages conform to a

small set of fully speciﬁed formats and convey a claim

about the physical world. Moreover, all messages are

signed by authorized participants, but not encrypted.

The signature implies a certain degree of trust in the

received message, but cannot be fully relied upon: As

in any public key infrastructure, secret keys can leak

and be abused by malicious actors to send mislead-

ing information. But even bugs and faulty sensors

can lead to wrong claims being signed and sent out

by legitimate participants of the network. In this set-

ting, misbehavior detection fulﬁlls a critical task: in-

correct information must be ﬁltered out, while cor-

rect information must be trusted to prevent accidents

and increase trafﬁc efﬁciency. One building block of


maintaining safety properties is to ﬁlter out packets

with untruthful information. Misbehavior detection

systems are designed to recognize such attacks and

mitigate their impact.

Evaluation of misbehavior detection schemes has

been a tedious task and proposed schemes have usu-

ally been assessed with speciﬁc simulation scenarios

or analytically, because real-world evaluation is dif-

ﬁcult and infeasible in many cases. Many contribu-

tions use the false-positive rate (FPR) and the false-

negative rate (FNR) of classiﬁcations of received

packets to evaluate the quality of schemes. Recent

advances in evaluation methodology have improved

the comparability between schemes by using a com-

mon reference dataset and also set the path towards

better metrics. Those metrics are however still based

on the confusion matrix, and therefore refer to the bi-

nary classiﬁcation performance of all received pack-

ets during a simulation run. This approach has fun-

damental limitations in some cases. For example, the

relevance of misclassiﬁcations is not necessarily dis-

tributed equally among participants. For some vehi-

cles, a speciﬁc packet sent by the attacker might just

not have any effect, while it can be critical for the

safety of another. Some attackers may also try to send a large number of packets to convince a specific vehicle of a single piece of misinformation. For their success, it does

not necessarily matter if their target accepts all the

packets or just one.

Wehmer, M. and Baumgart, I.

A Practical Evaluation Method for Misbehavior Detection in the Presence of Selﬁsh Attackers.

DOI: 10.5220/0010451005290537

In Proceedings of the 7th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2021), pages 529-537

ISBN: 978-989-758-513-5


In order to understand why it is impossible to

comprehensively assess misbehavior detection with

the confusion matrix, a new perspective on misbehav-

ior is needed. Attackers sending misleading or incor-

rect information about the state of the world are not

trying to get as many packets accepted as possible—

their goal is to alter the state of the physical world in

their favor. The success of their attack and in conse-

quence the ability of any misbehavior detection sys-

tem (MDS) to mitigate this attack can thus only be

measured by considering the effect of the communi-

cation on the physical world. This aspect has rarely

been considered and requires new tools to interac-

tively simulate cooperative intelligent transport sys-

tems (C-ITS) in the presence of attackers.

2 RELATED WORK

Attacks and misbehavior detection in vehicular net-

works have been studied extensively in the last

years (van der Heijden et al., 2019; van der Heijden,

2018). Many approaches to detecting attacks have

been published, but most focus on detecting very spe-

ciﬁc attacks. As such, results are difﬁcult to com-

pare and usually authors do not share a common eval-

uation approach or even metrics that allow for com-

parison. Moreover, the implementation is often not

available, which makes reproducing results difﬁcult.

Recently, the VeReMi dataset containing communi-

cation traces of different attacks being conducted in

the LuST (Codeca et al., 2015) trafﬁc scenario has

been published and extended to improve evaluation

quality in the ﬁeld and provide a basis for compari-

son (van der Heijden et al., 2018; Kamel et al., 2020).

The simulation code is based on the Veins vehicular

network simulator (Sommer et al., 2011) and publicly

available.

The choice of metrics for evaluating the quality

of mechanisms is a related topic that is being dis-

cussed in the community. The authors of VeReMi

propose to use the precision / recall graph of all re-

ception events (van der Heijden et al., 2018) instead

of the FPR / FNR metrics that have been used by

many works before. While such metrics are easy

to obtain in simulations, the numbers are difﬁcult to

interpret and even though they seem applicable for

most data-centric approaches and suggest comparison

to some extent, not all mechanisms can be measured

accurately. The evaluation depends on further sub-

tleties, such as the aggregation method and a deﬁni-

tion of when a message is to be considered as mali-

cious (van der Heijden and Kargl, 2017), which fur-

ther weakens comparability between mechanisms. To

take the dispersion of errors in detection performance

between participants into account, the Gini index of

the FPRs / FNRs of different vehicles has been pro-

posed (van der Heijden et al., 2018). While differ-

ences in classiﬁcation performance between vehicles

can be described with the Gini index, the underlying

question of the individual importance of packets and

vehicles remains open. The similarity between C-ITS

and cyber physical systems (CPS) has been noticed

and discussed before in the context of misbehavior

detection (van der Heijden et al., 2016). However,

the discussion did not consider the physical part and

mostly separated both domains. Application behavior

metrics have been used to analyze the physical im-

pact of attacks for speciﬁc applications like cooper-

ative adaptive cruise control (CACC) (van der Heij-

den, 2018), but have not been explored further due to

concerns about dependencies on speciﬁc implementa-

tions (van der Heijden and Kargl, 2017). This concept

is related to our approach, and we generalize its appli-

cation and systematically show why these metrics are

crucial for the assessment of MDS.

3 A NEW SECURITY MODEL

FOR C-ITS

In the literature, several attacker models are used and

attackers are usually described by their intention and

their capabilities. The intentions and capabilities of

possible attackers differ greatly, but a single attacker

can perform multiple attacks according to his capa-

bilities. While the individual misbehavior detection

mechanisms mostly focus on detecting speciﬁc at-

tacks, the discussion about evaluation metrics for mis-

behavior detection tries to ﬁnd a generic set of metrics

that are independent of the mechanism but also the

attacker model. One property common to most (but

not all) attackers is that they send packets, and this is

what recent evaluation metrics are based on: the clas-

siﬁcation of incoming packets into legitimate and ma-

licious. Metrics based on the confusion matrix of the

classiﬁcation have to assume the quality of a detection

mechanism is related to the number of correct classi-

ﬁcations in some form and that mechanisms with a

better classiﬁcation are better at detecting attacks.

We argue that this approach is too broad and while

such metrics are applicable to most relevant attacks

and detection mechanisms, the underlying assump-

tion does not hold in many cases and therefore the

effectiveness of mechanisms is not sufﬁciently re-

ﬂected. This is substantiated when thinking about

the success of attacks: an attack is successful if the

attacker's goal is achieved. The hypothesis that the degree to which the attacker's goal is achieved is strongly correlated to a function of the classification of received packets by other participants is at least not trivially justified. We call this hypothesis H. On the contrary, we already provided some indications of why H does not hold in practice in Section 1. In the remainder of this section, we present a new model to discern attackers for which H holds and attackers for which it does not.

3.1 Limitations of the Confusion Matrix

Confusion-matrix based metrics such as the FPR and

FNR have been used to evaluate network intrusion de-

tection systems (IDS). Traditional computer networks

and C-ITS share a number of attack vectors, i.e. at-

tackers might try to gain code execution privileges or

crash systems by sending packets that exploit bugs in

the parsing code. They might also try to exﬁltrate in-

formation or deny operation to the network itself by

attacking ﬂaws in application logic or network infras-

tructure. We refer to these attackers as network at-

tackers. Because traditional computer networks in-

volve many different applications running on general-

purpose computers communicating using many dif-

ferent protocols, assuming speciﬁc intentions or goals

of attackers is unhelpful and therefore not included in

the attacker model. In classic networks based on the

internet protocol stack, a one-to-one-communication

is possible, meaning the attacker only communicates

directly with his targets. In this setting, the relation

between a received network packet (or a series of re-

ceived network packets) and an attack is sufﬁciently

close and IDS performing better in the confusion matrix are therefore likely to be better at detecting attacks. For network attackers, we can assume H.

In C-ITS, network communication has a well-

deﬁned function. Network packets are broadcast over

geographic areas and use a very small set of publicly

speciﬁed protocols to distribute information about the

physical world. In contrast to classical networks, the

meaning of each packet is known. The implementa-

tion of involved applications may be proprietary, but

their purpose is well-deﬁned and as such assumptions

about the effect of received packets can be made. In

this setting, we face a different type of attacker that

we call a physical attacker. The physical attacker is

characterized by his intention of altering the state of

the world in some aspect. To reach his goal, he tries

to convince other vehicles to behave in a certain way

by sending packets that conform to the speciﬁcation,

but may contain false information. We argue that H does not hold for the physical attacker for two reasons.

1. The broadcast of packets in C-ITS implies that

irrelevant participants will receive the packet.

These participants do not contribute to the realiza-

tion of the attacker’s intention and may not even

act on the new information they receive, because

it only affects the intended target of the attacker.

Even though classiﬁcation performance for those

vehicles is irrelevant for both the recipient and the

attacker, it contributes to the confusion matrix.

2. When the same packet with false information is

received by a target of the attacker, the result-

ing impact can be different. Suppose an attacker

tries to provoke an accident by faking an end-of-

queue warning. Two vehicles erroneously clas-

sify the warning as legitimate, where one of them

is near the alleged end-of-queue situation and the

other is not. The vehicle near the fake end-of-queue will be more likely to trigger a dangerous emergency brake maneuver than the vehicle further away. Both packet reception events have the

same inﬂuence on the confusion matrix and both

belong to the same attack.

There is no obvious relation between the success of an attack and the number of packets received by other participants. When considering physical attackers, we require a new set of metrics that do not depend on H and instead reflect the ability of an MDS to mitigate an attack.
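The second point can be illustrated with a small sketch. It is not taken from the paper's framework: the `Reception` record, its field names and the distance values are hypothetical and serve only to show that a false-negative rate computed over reception events does not change when the fooled vehicle is the one near the alleged event instead of the one far away.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reception:
    """One packet reception event (all fields are illustrative assumptions)."""
    receiver: str
    is_malicious: bool           # ground truth assigned by the evaluator
    flagged: bool                # decision of the detection mechanism
    distance_to_event_m: float   # hypothetical proxy for physical relevance


def confusion_counts(events):
    """Return (tp, fp, tn, fn) over all reception events."""
    tp = sum(e.is_malicious and e.flagged for e in events)
    fp = sum((not e.is_malicious) and e.flagged for e in events)
    tn = sum((not e.is_malicious) and (not e.flagged) for e in events)
    fn = sum(e.is_malicious and (not e.flagged) for e in events)
    return tp, fp, tn, fn


def false_negative_rate(events):
    """FNR = fn / (tp + fn); defined as 0.0 if there are no malicious events."""
    tp, _, _, fn = confusion_counts(events)
    return fn / (tp + fn) if (tp + fn) else 0.0


# Two vehicles accept the same fake warning, a third one rejects it.
near = Reception("v1", True, False, 50.0)
far = Reception("v2", True, False, 5000.0)
caught = Reception("v3", True, True, 800.0)

# The metric ignores the distance field, so both cases look identical.
fnr_near_fooled = false_negative_rate([near, caught])
fnr_far_fooled = false_negative_rate([far, caught])
```

Both rates are equal although only the first case is likely to cause a dangerous maneuver, which is the gap a physically grounded metric is meant to close.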

3.2 Embracing the Physical World

Metrics based on the confusion-matrix still have a

valid application in C-ITS, since classical network at-

tackers can still be a threat. The main application do-

main of MDS however is the detection of attacks from

physical attackers and the discussion of MDS often

considers network attackers out-of-scope.

The attacker's intent and the undesirable consequences of the attack are both situated in the physical world. We therefore revisit the description of C-ITS as a cyber-physical system. C-ITS satisfy

the deﬁnition of a CPS in that vehicles are networked

software systems that control a physical process. We

detail our model in the next section and refer to the

set of applications as software systems and the phys-

ical world as the physical process in the remainder

of this section. The idea of performing misbehavior

detection over intrusion detection is rooted in the re-

alization that C-ITS are CPS. An important property

of CPS is that the physical process and the networked

software are interdependent and cannot be modeled

separately. Many detection schemes indeed exploit

the properties of the physical world to identify misbehavior on the network layer. To build new metrics without assuming H, we propose to give equal consideration to the properties of the physical world when assessing the effectiveness of MDS. We identify two problems that need to be solved.

1. The physical process is complex. Finding the

right abstraction is difﬁcult and problem-speciﬁc.

One contribution evaluated MDS by analyzing

a speciﬁc application and comparing the effects

of one selected attack (van der Heijden et al.,

2016). While this leads to very useful conclu-

sions, the comparability between mechanisms is

limited when varying the attack.

2. No evaluation framework is available. VeReMi

has provided a common dataset for non-

interactive evaluations, but cannot be used to ana-

lyze effects that MDS have on the physical world.

In the following section, we address the ﬁrst is-

sue by suggesting a metric for the selﬁsh attacker pro-

posed in (Samara et al., 2010; van der Heijden et al.,

2016). The selﬁsh attacker is a physical attacker with

the goal of gaining an advantage on the road. By fo-

cusing on the attacker’s goal, we can provide a metric

that is applicable to all attacks and MDS. We address

the second issue with our simulation framework that

we present in Section 5.

4 A METRIC FOR SELFISH

ATTACKERS

The selﬁsh attacker adheres to speciﬁed message for-

mats, but does not necessarily follow all protocol

specifications, e.g. he can send messages more often than allowed or suppress packets he should transmit or forward. We deem this attacker type highly relevant,

because most trafﬁc participants have a motivation for

this kind of attack and it seems plausible that such an

attack could be performed by individuals with little

costs, e.g. using methods of modifying the ﬁrmware

of certain vehicle models. In the scope of this work,

we do not consider cooperative attackers, i.e. we as-

sume that the attacker only controls a single vehicle

and does not cooperate with others when performing

his attacks. We assume that the attacker’s goal is to

reach his destination as fast as possible.

We previously discussed the limitations of cur-

rent metrics based on the confusion matrix and fur-

ther note there is another fundamental shortcoming

of such metrics in the context of selﬁsh attackers:

In real-world trafﬁc scenarios, maliciousness is not

a property of a message and cannot be captured by

a metric based on the confusion matrix of individ-

ual packet classiﬁcations. Measuring FPRs and FNRs

thus is a difﬁcult task by itself, because there is no

inherent deﬁnition of packet maliciousness. In some

cases, the decision is obvious: if a vehicle sends a

warning for an accident that did not happen, the in-

formation is clearly incorrect and the packet should

be treated as malicious. But this is not a function of

just the packet itself: the correctness depends on the

state of the world at this point in time, i.e. whether

an accident happened or not. Apart from this, the

classiﬁcation is gradual, especially when considering

sensor noise and inaccuracies. A message contain-

ing a vehicle’s location will always have some offset

due to GNSS and integration errors. There is no natu-

ral distance at which the message is clearly incorrect

and should be classiﬁed as malicious. A workaround

used in some evaluations is to deﬁne attacker vehicles

and assume all packets sent by them are malicious and

should be classiﬁed as incorrect. We note that in the

case of selﬁsh attackers, the maliciousness of a packet

also depends on the intention behind the transmission

of the packet, not only on the packet's content, which gives further weight to our argument that metrics should be closely coupled with the attacker's intention.

4.1 Modeling the Cyber Physical

System

In light of these observations, we propose a CPS-

based MDS-metric for selﬁsh attackers. The set of

applications running inside all vehicles forms the soft-

ware system interacting with the physical process. We

view the trafﬁc ﬂow itself as the physical process and

argue that this is what needs to be protected in the

presence of selﬁsh attackers: an attacker that is unable

to change the trafﬁc ﬂow will not gain an advantage

over the same scenario without the attack.

The MDS is part of the software system, i.e. the

set of components that form driving decisions inside

the individual vehicles, and therefore also inﬂuences

the trafﬁc. This model can take most inﬂuences into

account that are part of the real trafﬁc system, in-

cluding other components running inside the vehicle’s

software system.

4.2 Attacker’s Advantage

In order to measure the influence of the MDS on the traffic system, we first formulate a metric for the degree to which an attacker achieves his goal. We assume that the attacker controls a single vehicle with a start position $p_{\mathrm{start}}$ and a destination $p_{\mathrm{dest}}$. The vehicle covers the driving distance $s$ between $p_{\mathrm{start}}$ and $p_{\mathrm{dest}}$ in time $t_a$. To measure the advantage the attacker achieves with his attack, we measure the time $t_a$ and the time $t_{a,\mathrm{fair}}$, which is the driving time of the same vehicle when behaving like a legitimate network participant, i.e. not performing any attacks. We call this the attacker's advantage

$$q = 1 - \frac{t_a}{t_{a,\mathrm{fair}}}. \quad (1)$$

In our model, this is purely a property of the physical process, i.e. an effect that the attacker's communication has on the physical world. $q$ can take negative values if the attack leads to a longer travel time for the attacker.
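As a worked illustration of Equation (1), the following minimal sketch computes the attacker's advantage from two measured driving times. The function name and the example times are assumptions made for illustration; only the formula itself comes from the text.

```python
def attacker_advantage(t_attack: float, t_fair: float) -> float:
    """Attacker's advantage q = 1 - t_attack / t_fair.

    t_attack: driving time of the attacker's vehicle with the attack.
    t_fair:   driving time of the same vehicle without the attack.
    """
    if t_fair <= 0:
        raise ValueError("the fair driving time must be positive")
    return 1.0 - t_attack / t_fair


# Hypothetical measurements in seconds.
q_gain = attacker_advantage(t_attack=80.0, t_fair=100.0)   # attack saves time
q_loss = attacker_advantage(t_attack=120.0, t_fair=100.0)  # attack backfires
```

A positive value means the attack shortened the trip, zero means it made no difference, and a negative value means the attacker would have arrived earlier without attacking.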

We now propose to base the quality assessment of MDS on $q$: an optimal detection mechanism achieves $q \le 0$, i.e. the attacker is unable to gain an advantage with his attack. In realistic scenarios, this is not always possible, for example if the detection scheme is not running on all vehicles. It is thus desirable to obtain a lower bound to which a specific scheme can be compared in order to assess its quality. We later show how a lower bound for $q$ can be estimated.

Because $q$ is a property of the physical process, it can be measured independently of the performed attacks or detection mechanisms. To measure $q$, only two conditions are necessary:

1. the physical process must be observable, i.e. $t_a$ can be determined, and

2. the scenario must be repeatable.

Condition 2 is necessary to measure $t_{a,\mathrm{fair}}$ under the exact same conditions but without the attack. Since we see the MDS as a distributed system running inside the software system, evaluation decisions such as aggregation methods are not needed for determining $q$. The specific deployment parameters of the mechanism, such as the penetration rate, must however be specified and are seen as a parameter of the mechanism to be analyzed.

While $t_a$ can be measured universally, it does depend on all components that the physical process itself depends on, especially

• the traffic scenario, and

• the implementation of vehicles' applications influencing the driving decisions.

When comparing different MDS, these influences should be minimized, e.g. the measurements should be taken in the same traffic scenario and with the same implementations. We expect that $q$ is a useful metric to assess detection performance, even when comparing mechanisms with different traffic scenarios and implementations, as the value of $q$ specifies what an attacker can achieve in the presence of an MDS.

5 EVALUATION FRAMEWORK

Our metric $q$ cannot be determined using precomputed datasets such as VeReMi, since measuring $q$ depends on observing the attack's impact on the physical process. The simulation of interactions between vehicles' software and the traffic system requires that the simulator allows coupling a road traffic simulation with the networking and software simulation. This is usually called an interactive simulation. This coupling is necessary to allow the misbehavior detection mechanism to impact the physical process, i.e. the traffic simulation. Our framework uses the Artery simulation module (Riebl et al., 2019). We use this basis to provide an extensible framework for the analysis of misbehavior detection mechanisms. Our framework can be used to analyze attacks and detection mechanisms for several attacker types, but we limit our description in this work to the analysis of detection of selfish attacks and the measurement of $q$. We extend Artery to facilitate the simulative analysis of misbehavior detection mechanisms and attack scenarios.

Artery is an interactive simulation framework,

originally developed for the testing of applications in

the European ITS-G5 vehicular communications protocol stack. Artery is implemented as a package for

the OMNeT++ Network Simulator (Varga, 2001) and

couples the networking simulation modules with the

SUMO road trafﬁc simulator (Lopez et al., 2018).

The communication stack is implemented by the

vanetza project (Riebl et al., 2017), a standalone open

source implementation of the ITS-G5 standards.

5.1 Detection Model

To measure the impact of MDS on the physical pro-

cess, we assume a data-centric misbehavior detection

model, i.e. the MDS acts on individual packets. Each

received packet is passed to the misbehavior detection

algorithm and classiﬁed. Our framework, by default,

discards all packets that are classiﬁed as malicious.

Other packets are passed up in the stack and handled

by the application. Figure 1 shows the processing of received packets inside a vehicle's simulation module. We use this simple model to decouple the misbehavior detection from the application implementation. Detection mechanisms can be implemented independently of application behavior. This approach removes flexibility from the application but keeps the implementation minimal and does not, in our view, impose unrealistic limitations.
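The data-centric model described above can be sketched in a few lines. This is not the framework's actual interface: the packet representation, the function names and the toy plausibility rule are assumptions chosen only to show the default behavior of discarding packets classified as malicious and passing all others up to the application.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Packet = Dict[str, object]  # hypothetical packet representation


def process_received(packets: Iterable[Packet],
                     is_malicious: Callable[[Packet], bool],
                     stats: Counter) -> List[Packet]:
    """Classify each packet; drop malicious ones, forward the rest."""
    accepted = []
    for packet in packets:
        if is_malicious(packet):
            stats["discarded"] += 1   # default action: discard
            continue
        stats["accepted"] += 1
        accepted.append(packet)       # handed to the behavioral application
    return accepted


def implausible_speed(packet: Packet) -> bool:
    """Toy detection rule (an assumption, not a mechanism from the paper)."""
    return packet["speed"] > 100.0


stats = Counter()
received = [{"sender": "v1", "speed": 13.0},
            {"sender": "v2", "speed": 250.0}]
to_application = process_received(received, implausible_speed, stats)
```

Because the classifier is passed in as a plain callable, a different detection mechanism can be swapped in without touching the application side, which mirrors the decoupling argued for in the text.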


Figure 1: Overview of the processing of messages in a simulated vehicle. (Diagram labels: 802.11p NIC; Simulation; Middleware; Behavioral Application; Wrapper; MBD-Service; Misbehavior Detection Mechanism; statistics collection.)

5.2 Simulation Scenarios

A simulation scenario $s = (s_{\mathrm{traffic}}, B, A_{\mathrm{mbds}}, p_{\mathrm{mbds}}, A, R)$ in our framework is defined by

1. a traffic scenario $s_{\mathrm{traffic}}$,

2. the set of behavioral applications $B$ defining the vehicles' driving decisions,

3. a misbehavior detection mechanism $A_{\mathrm{mbds}}$,

4. the penetration rate $0 \le p_{\mathrm{mbds}} \le 1$ defining the share of vehicles equipped with an MDS,

5. the attacker definition $A$, and

6. the set of random seeds $R = \{r_0, \ldots, r_{N-1}\}$.

In order to analyze a detection mechanism, $A_{\mathrm{mbds}}$ and $A$ must be implemented in our framework. We designed the interfaces with a focus on extensibility to make analysis feasible for most data-centric mechanisms and expect that most implementations can be integrated without much effort. The parameter $p_{\mathrm{mbds}}$ is likely to be varied as part of the analysis.
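The six-part scenario definition maps naturally onto a small record type. The sketch below is an assumption about how such a definition could be represented, not the framework's configuration format; the class, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimulationScenario:
    """Hypothetical container for one simulation scenario."""
    traffic_scenario: str              # e.g. "highway" or "urban"
    behavioral_apps: Tuple[str, ...]   # applications defining driving decisions
    detection_mechanism: str           # name of the detection mechanism
    penetration_rate: float            # share of vehicles equipped with an MDS
    attacker: str                      # name of the attacker definition
    seeds: Tuple[int, ...]             # random seeds, one simulation run each

    def __post_init__(self):
        if not 0.0 <= self.penetration_rate <= 1.0:
            raise ValueError("penetration rate must lie in [0, 1]")
        if not self.seeds:
            raise ValueError("at least one random seed is required")

    def runs(self):
        """One (scenario, seed) pair per simulation run."""
        return [(self, seed) for seed in self.seeds]


scenario = SimulationScenario(
    traffic_scenario="urban",
    behavioral_apps=("fastest_route",),
    detection_mechanism="consensus",
    penetration_rate=0.4,
    attacker="fake_warning",
    seeds=tuple(range(10)),
)
```

Validating the penetration rate at construction time keeps an out-of-range value from silently producing a misleading run, and freezing the record makes it safe to reuse the same scenario object across seeds.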

The VeReMi-dataset provides a substantial bene-

ﬁt for the academic discussion by providing a com-

mon reference against which new detection mecha-

nisms can be evaluated. We see the need of comparing

similar detection methods directly and likewise aim

to provide a shared evaluation platform that is useful

for future research. To maximize comparability of $q$ measurements, we hope to find a common fixed set of parameters (traffic scenario and random seeds $R$) that can be used across publications for many mechanisms.

5.3 Trafﬁc Scenarios

We selected two traffic scenarios, highway and urban, to facilitate an effective evaluation. The design of traffic scenarios is not trivial, as it represents a trade-off between execution time, universality and attacker-specificity. A large-scale scenario such as the LuST simulation (Codeca et al., 2015) used by VeReMi contains numerous traffic situations and is suited for many attacks, but is very expensive in terms of run time. For this reason, we built two separate traffic scenarios, a simple and artificial highway segment and a medium-scale urban road network in the city of Karlsruhe, Germany. Figures 2 and 3 show the road network of both scenarios.

Figure 2: Highway segment scenario.

Figure 3: Urban scenario.

The highway scenario is simple and contains few junctions. Traffic flows from left to right and all vehicles try to reach the last segment of the main road. The main road is not speed-limited; other road segments are limited to $13.89\,\mathrm{m\,s^{-1}}$. Without external interference, no vehicle chooses to leave the main road. The traffic is composed mostly of passenger cars and buses with a maximum speed of $20\,\mathrm{m\,s^{-1}}$, but also includes 2.6% sports cars with a maximum speed of $20\,\mathrm{m\,s^{-1}}$. The attacker's maximum speed is set to $100\,\mathrm{m\,s^{-1}}$ to allow him to create an advantage out of a beneficial traffic situation. The total number of vehicles driving simultaneously varies between 112 and 117.

The urban scenario is more complex and mod-

els a realistic urban trafﬁc system with trafﬁc lights

and many junctions. Vehicles start at a random start-

ing position and follow a randomly selected route. In

this trafﬁc scenario, speed is limited and the trafﬁc is

dense. Without performing an attack, the route takes

38 minutes on average. Maximum speeds of vehicles

do not inﬂuence the driving time.


5.4 Behavioral Applications

The physical process measured by our approach is

highly dependent on the driving decisions made by

each individual trafﬁc participant. Many attacks aim

to inﬂuence the driving decisions of other participants

by sending misleading messages.

For our analysis, we keep the assumptions about

the implementation minimal. We assume all non-

attacker participants act rationally and always take the

route that appears to be the fastest towards their des-

tination. Once the application receives new informa-

tion about the world, it recalculates the path with the

shortest travel time to the destination and changes the

route if the new path is shorter than the current route.
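The rerouting rule described above can be sketched with a standard shortest-path search. The road graph, the travel times and all function names below are illustrative assumptions; the paper does not specify which routing algorithm the behavioral application uses, so Dijkstra's algorithm is simply one reasonable stand-in.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: believed travel time}


def fastest_route(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; returns (travel time, path) or (inf, [])."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


def route_time(graph: Graph, route: List[str]) -> float:
    """Believed travel time along a given route."""
    return sum(graph[a][b] for a, b in zip(route, route[1:]))


def maybe_reroute(graph: Graph, current_route: List[str]) -> List[str]:
    """Switch routes only if the new path is strictly faster."""
    new_time, new_route = fastest_route(graph, current_route[0], current_route[-1])
    if new_route and new_time < route_time(graph, current_route):
        return new_route
    return current_route


roads = {"A": {"B": 10.0, "C": 15.0}, "B": {"D": 10.0}, "C": {"D": 15.0}, "D": {}}
route_before = maybe_reroute(roads, ["A", "B", "D"])

# An accepted accident warning raises the believed travel time on B -> D.
roads["B"]["D"] = 100.0
route_after = maybe_reroute(roads, ["A", "B", "D"])
```

The example shows why this rational behavior is attractive to a selfish attacker: a single accepted warning is enough to move a vehicle from the route via B to the route via C.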

5.5 Random Seeds

Individual simulation runs in our framework are deterministic and reproducible, if the implementations of $A_{\mathrm{mbds}}$ and $A$ support it. All randomized elements in the scenario, such as the selection of vehicles equipped with a misbehavior detection mechanism according to $p_{\mathrm{mbds}}$, depend only on a single seed value. In order to obtain a stable value for $q$, multiple simulation runs are executed with different seed values. In this case, $t_{a,i}$ and $t_{a,\mathrm{fair},i}$ are measured for each simulation run $i$ and $q$ can be calculated as

$$q = 1 - \frac{1}{N} \sum_{i=0}^{N-1} \frac{t_{a,i}}{t_{a,\mathrm{fair},i}}, \quad (2)$$

where $N$ is the total number of simulation runs. For large $N$, the influence of the choice of $R = \{r_i\}_{0 \le i < N}$ diminishes. Individual traffic situations can however change substantially with $r_i$ and therefore $q$ can be different for deviating seed values.
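The following sketch shows one way the seed handling and the aggregation over runs could look. Two assumptions are made loudly here: Equation (2) is read as averaging the per-run time ratio over all $N$ runs, and the selection of equipped vehicles is modeled as a seeded random sample; neither detail is confirmed by the framework's code, and all names are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple


def select_equipped(vehicle_ids: Iterable[int],
                    penetration_rate: float,
                    seed: int) -> Set[int]:
    """Pick the vehicles that run the MDS, reproducibly for a given seed."""
    ordered = sorted(vehicle_ids)          # fixed order before sampling
    count = round(penetration_rate * len(ordered))
    rng = random.Random(seed)              # private generator, no global state
    return set(rng.sample(ordered, count))


def mean_advantage(runs: List[Tuple[float, float]]) -> float:
    """Average attacker's advantage over runs of (t_attack, t_fair).

    Assumes Equation (2) averages the ratio t_attack / t_fair over all runs.
    """
    if not runs:
        raise ValueError("at least one simulation run is required")
    mean_ratio = sum(t_attack / t_fair for t_attack, t_fair in runs) / len(runs)
    return 1.0 - mean_ratio


equipped_a = select_equipped(range(10), penetration_rate=0.5, seed=3)
equipped_b = select_equipped(range(10), penetration_rate=0.5, seed=3)

# Hypothetical driving times in seconds for two seeds.
q_mean = mean_advantage([(80.0, 100.0), (100.0, 100.0)])
```

Using a dedicated `random.Random` instance per run keeps the selection independent of any other random draws in the program, which is what makes a run repeatable from its seed alone.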

6 EVALUATION OF EXISTING

SCHEMES

We analyze a basic variant of the detection mecha-

nism proposed by Petit, Feiri and Kargl (Petit et al.,

2011) using our evaluation metric for different pene-

tration rates. The implemented scheme tries to create

a consensus about dangerous events reported by other

vehicles. As long as the vehicle is not forced to make

a decision, the mechanism waits for further messages

sent by other neighbors about this event. Events are

only classiﬁed as legitimate if the number of warnings

sent by different senders exceeds a dynamic thresh-

old.
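The consensus idea can be sketched as follows. This is a simplified illustration and not the scheme of Petit, Feiri and Kargl itself: the threshold is a fixed number here, whereas the text describes a dynamic one, and the class and identifiers are hypothetical.

```python
from collections import defaultdict


class ConsensusDetector:
    """Accept an event only after enough distinct senders have reported it."""

    def __init__(self, threshold: int):
        # Fixed threshold: a simplifying assumption for this sketch.
        self.threshold = threshold
        self._senders = defaultdict(set)   # event id -> set of sender ids

    def report(self, event_id: str, sender_id: str) -> None:
        """Record a warning; repeated warnings from one sender count once."""
        self._senders[event_id].add(sender_id)

    def is_legitimate(self, event_id: str) -> bool:
        """True if the number of distinct senders exceeds the threshold."""
        return len(self._senders.get(event_id, ())) > self.threshold


detector = ConsensusDetector(threshold=2)
for _ in range(5):
    detector.report("accident-1", "attacker")   # one sender, many packets
single_sender_accepted = detector.is_legitimate("accident-1")

detector.report("accident-1", "v7")
detector.report("accident-1", "v9")
three_senders_accepted = detector.is_legitimate("accident-1")
```

Counting distinct senders instead of packets means a single attacker cannot reach the threshold by repeating the same warning, which is the property that matters against the non-cooperative attacker considered here.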

Figure 4: Estimation of a lower bound for $q$ for both traffic scenarios. (Plot: Perfect Detection, 90% confidence intervals; horizontal axis $p_{\mathrm{mbds}}$; series: urban traffic scenario, highway traffic scenario.)

6.1 Attacker Implementation

We implement a simple fake message injection attack. The attacker's vehicle sends warnings for nonexistent accidents on the attacker's route every four seconds to convince other vehicles to avoid the route and thereby reduce traffic on his path. The attacker's vehicle follows a fixed route defined by the scenario. While this is a very simple strategy, our results show that it is very effective if no misbehavior detection is performed.

6.2 Simulation Parameters

The attack and detection mechanism are usable in the urban traffic scenario as well as in the highway traffic scenario. For our evaluation, we use both traffic scenarios. We use the implementation of behavioral applications as described in Section 5.4. We define a total of twelve simulation scenarios by using six different values for $p_{\mathrm{mbds}}$ in both traffic scenarios.

We use R = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} in all scenarios,

which results in a total of 100 simulation runs.

6.3 A Lower Bound for q

Our framework can be used to estimate the limits of what misbehavior detection can achieve. We implemented a detection mechanism $A_{perfect}$ that has access to the internal attacker state and always classifies packets correctly.
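Such an oracle can be sketched in a few lines. The following is an illustration under stated assumptions, not the framework's implementation: the class names and the use of integer message identifiers are hypothetical. The point is only that the detector reads the attacker's own record of forged messages instead of inspecting message content.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AttackerState:
    """Internal record of the attacker (hypothetical structure)."""
    forged_ids: Set[int] = field(default_factory=set)

    def forge(self, message_id: int) -> None:
        """Mark a message as forged at the moment the attacker sends it."""
        self.forged_ids.add(message_id)


class PerfectDetector:
    """Oracle detector: classifies by looking up the attacker's state."""

    def __init__(self, attacker_state: AttackerState):
        self._attacker = attacker_state

    def is_misbehavior(self, message_id: int) -> bool:
        # No heuristics involved, so there are neither false positives
        # nor false negatives by construction.
        return message_id in self._attacker.forged_ids


state = AttackerState()
state.forge(42)                     # attacker injects a fake warning
detector = PerfectDetector(state)
```

An oracle of this kind is only available in simulation, which is why it serves as a bound for what real detection mechanisms can achieve rather than as a deployable scheme.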

Figure 4 shows the resulting q for $A_{perfect}$ with different penetration rates. We see that this attack is very successful in the urban traffic scenario if no detection mechanism is used: only then is the attacker able to achieve a high q value. We conclude that even low penetration rates of effective misbehavior detection systems can reduce the selfish attacker's advantage substantially. For $p_{mbds} \ge 0.5$, the attacker gains no noticeable advantage by performing the attack. In some simulation runs, we measure q < 0, which means the attack created a disadvantage for the attacker. This is not caused by the detection of the attack, but by the simplicity of the attack: the false warnings sent by the attacker lead to an increase in the attacker's travel time in some constellations.

Figure 5: Evaluation result for the mechanism of Petit, Feiri and Kargl in both scenarios (90% confidence intervals; q plotted over $p_{mbds}$ for the urban and the highway traffic scenario).

In the highway traffic scenario, the attack is less effective for lower penetration rates, but the attacker still achieves some advantage with higher penetration rates of misbehavior detection.

6.4 Results

We evaluated the scenarios with the mechanism of Petit, Feiri and Kargl. Figure 5 shows the values of q for both traffic scenarios with different penetration rates. The measurements show that the mechanism can effectively reduce the attacker's advantage. In the highway traffic scenario, the mechanism's q values are very close to those of the perfect detection mechanism.
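The figures report 90% confidence intervals for q. How these intervals were computed is not stated here; one common choice, assumed in the sketch below, is a Student-t interval over the per-seed q values. The function name and the sample values are illustrative, and the table of critical values only covers a few sample sizes.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 90% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_90 = {4: 2.132, 9: 1.833, 19: 1.729}


def confidence_interval_90(samples):
    """Return (low, high) of a 90% t-interval for the mean of `samples`."""
    n = len(samples)
    df = n - 1
    if df not in T_CRIT_90:
        raise ValueError(f"no critical value tabulated for {n} samples")
    center = mean(samples)
    half_width = T_CRIT_90[df] * stdev(samples) / sqrt(n)
    return center - half_width, center + half_width


# Hypothetical per-seed q values for ten seeds, for illustration only.
q_per_seed = [0.0, 0.2, 0.0, 0.2, 0.0, 0.2, 0.0, 0.2, 0.0, 0.2]
low, high = confidence_interval_90(q_per_seed)
```

With only ten seeds per scenario, intervals of this kind can be wide, which is worth keeping in mind when comparing two mechanisms whose q values lie close together.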

The execution times of individual simulation runs depend on the traffic scenario and the penetration rate $p_{mbds}$. We executed the simulations on 8 cores with a 2.6 GHz clock rate and 16 GB of RAM. Eight simulation runs were executed in parallel. Simulation runs of the highway scenario finished in about six minutes each, while the complex urban traffic scenario took between 1.5 hours and over ten hours to complete. Table 1 lists the execution times for some scenarios. The execution times were measured using the perfect detection mechanism. We suspect that the variations in the urban scenario are caused by the reduced number of invocations of the traffic simulation for larger $p_{mbds}$. Our measurements surprisingly show that even removing a small percentage of vehicles from the attacker's influence can have a substantial impact on his advantage in some situations.

Table 1: Results and Performance.

Scenario   p_mbds   q        Execution Time (s)
highway    0.00     0.75     356
highway    0.25     0.16     354
highway    0.80     0.07     361
highway    1.00     0.00     351
urban      0.00     0.22     37232
urban      0.25     0.18     22220
urban      0.80     −0.13    9877
urban      1.00     −0.11    5501

7 CONCLUSION

In this work, we propose a new metric for assessing the quality of misbehavior detection mechanisms in the presence of selfish attackers. To improve over the state-of-the-art metrics based on the confusion matrix of packet classifications, we base our metric on measuring the attacker's success in achieving his goal.

We further present a simulation framework to measure q and propose two scenarios for testing attacks and detection mechanisms. We analyze a simplified variant of the detection mechanism proposed by Petit, Feiri and Kargl in this framework and estimate a lower bound for q by using a perfect detection mechanism. We discuss run-time performance and conclude that our metric can be used to evaluate practical detection mechanisms in realistic traffic scenarios.

Our metric q allows a reliable assessment of MDS in the presence of selfish attackers and gives a realistic indication of their effectiveness, which is an improvement over metrics based on the confusion matrix, such as the precision-recall graph. The first application of our framework showed promising results, and we hope to extend this analysis to other detection schemes and more sophisticated attacks. We show that metrics based on measuring the physical process can be used to successfully evaluate MDS. We observed that the interpretation of q could benefit from additional traffic measurements to better assess negative effects that misbehavior detection has on desired effects.

VEHITS 2021 - 7th International Conference on Vehicle Technology and Intelligent Transport Systems


ACKNOWLEDGEMENTS

This work was supported by the Competence Center

for Applied Security Technology (KASTEL).

REFERENCES

Codeca, L., Frank, R., and Engel, T. (2015). Luxembourg SUMO Traffic (LuST) scenario: 24 hours of mobility for vehicular networking research. In 2015 IEEE Vehicular Networking Conference (VNC), pages 1–8.

Kamel, J., Wolf, M., van der Heijden, R. W., Kaiser, A., Urien, P., and Kargl, F. (2020). VeReMi extension: A dataset for comparable evaluation of misbehavior detection in VANETs. In IEEE International Conference on Communications (ICC), pages 1–6.

Lopez, P. A., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y.-P., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P., and Wießner, E. (2018). Microscopic traffic simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems, pages 2575–2582. IEEE.

Petit, J., Feiri, M., and Kargl, F. (2011). Spoofed data detection in VANETs using dynamic thresholds.

Riebl, R., Obermaier, C., and Günther, H.-J. (2019). Artery: Large scale simulation environment for ITS applications. In Recent Advances in Network Simulation, pages 365–406. Springer.

Riebl, R., Obermaier, C., Neumeier, S., and Facchi, C. (2017). Vanetza: Boosting research on inter-vehicle communication. In Proceedings of the 5th GI/ITG KuVS Fachgespräch Inter-Vehicle Communication (FG-IVC 2017), pages 37–40.

Samara, G., Al-Salihy, W. A., and Sures, R. (2010). Security issues and challenges of vehicular ad hoc networks (VANET). In 4th International Conference on New Trends in Information Science and Service Science, pages 393–398. IEEE.

Sommer, C., German, R., and Dressler, F. (2011). Bidirectionally Coupled Network and Road Traffic Simulation for Improved IVC Analysis. In IEEE Transactions on Mobile Computing (TMC), volume 10, pages 3–15. IEEE.

van der Heijden, R. W. (2018). Misbehavior Detection in Cooperative Intelligent Transport Systems. PhD thesis, Ulm, Germany.

van der Heijden, R. W., Dietzel, S., Leinmüller, T., and Kargl, F. (2016). Survey on misbehavior detection in cooperative intelligent transportation systems.

van der Heijden, R. W., Dietzel, S., Leinmüller, T., and Kargl, F. (2019). Survey on misbehavior detection in cooperative intelligent transportation systems. In IEEE Communications Surveys & Tutorials, volume 21, pages 779–811.

van der Heijden, R. W. and Kargl, F. (2017). Evaluating misbehavior detection for vehicular networks. In 5th GI/ITG KuVS Fachgespräch Inter-Vehicle Communication, page 5.

van der Heijden, R. W., Lukaseder, T., and Kargl, F. (2018). VeReMi: A dataset for comparable evaluation of misbehavior detection in VANETs. In Beyah, R., Chang, B., Li, Y., and Zhu, S., editors, Security and Privacy in Communication Networks, pages 318–337, Cham. Springer International Publishing.

Varga, A. (2001). Discrete event simulation system. In Proceedings of the European Simulation Multiconference (ESM'2001), pages 1–7.
