Feature Selection for Anomaly Detection in Vehicular Ad Hoc Networks

Van Huynh Le, Jerry den Hartog and Nicola Zannone

Eindhoven University of Technology, Eindhoven, The Netherlands

Keywords:

Anomaly Detection, Vehicular Ad Hoc Network, Basic Safety Message, Crash Avoidance Systems.

Abstract:

An emerging trend to improve automotive safety is the development of Vehicle-to-Vehicle (V2V) safety applications. These applications use information gathered from the vehicle’s sensors and from surrounding vehicles to detect and prevent imminent crashes. Vehicles have been equipped with external communication interfaces to make these applications possible, but this also exposes them to security threats. If an attacker is able to feed safety applications with incorrect data, the applications might actually cause accidents rather than prevent them. In this paper, we investigate the application of white-box anomaly detection to detect such attacks. A key step in applying such an approach is the selection of the “right” behavioral features, i.e. features that allow the detection of attacks and provide an understanding of the raised alerts. By finding meaningful features and building accurate models of normal behavior, this work makes a first step towards the design of effective anomaly detection engines for V2V communication.

1 INTRODUCTION

Safety is one of the main concerns in the automotive industry. Over the last decades, several safety measures, such as airbags, anti-lock braking systems, and electronic stability control systems, have been developed and deployed in modern vehicles, significantly improving vehicle safety. Nevertheless, crashes still happen, causing fatalities, injuries, and property damage.

Collision avoidance technologies are nowadays attracting significant attention in the automotive industry as a means to further mitigate and even entirely avoid crashes (Harding et al., 2014). One emerging trend in crash avoidance is the development of Vehicle-to-Vehicle (V2V) safety applications. By sharing kinematics data, driver intents, and environment conditions among nearby vehicles, V2V safety applications aim to predict imminent crashes and warn drivers about them.

To enable V2V safety applications and, in general, automotive applications based on V2V communication, vehicles are equipped with an increasing number of external communication interfaces. This, however, raises several security concerns. In fact, the increasing connectivity of vehicles has enlarged the attack surface, and several attacks have already been demonstrated (Koscher et al., 2010; Foster et al., 2015; Miller and Valasek, 2015; Mazloom et al., 2016; Palanca et al., 2017). In particular, attackers may disrupt V2V safety applications by providing incorrect information to surrounding vehicles; by reacting to such forged information, these vehicles can unwittingly cause incidents. Our goal is to detect such attacks.

A common approach to attack detection is to monitor the network and find deviations from the normal behavior, i.e. the so-called anomalies. To enable appropriate handling of the alerts raised upon finding an anomaly, the detection method should provide an understanding of what causes the alerts (Sommer and Paxson, 2010). An additional challenge in anomaly detection for V2V safety applications is that we have to distinguish attacks from actual driving behaviors, not only under normal conditions, but also in ‘extreme’ situations such as pre-crashes. The risk is that pre-crashes and attacks get confused, although they require completely different handling: safety applications must quickly inform drivers of pre-crashes, while messages corresponding to attacks must be discarded.

To address the challenges above, we study how a semantics-aware white-box anomaly detection framework (Costante et al., 2017) can be applied to detect attacks that can disrupt the functioning of V2V safety applications. A key step in applying this anomaly detection approach is the selection of the “right” behavioral features to model normal behavior; features should allow the detection of attacks as well as provide an understanding of the raised alerts.

Le, V., Hartog, J. and Zannone, N.

Feature Selection for Anomaly Detection in Vehicular Ad Hoc Networks.

DOI: 10.5220/0006946804810491

In Proceedings of the 15th International Joint Conference on e-Business and Telecommunications (ICETE 2018) - Volume 1: DCNET, ICE-B, OPTICS, SIGMAP and WINSYS, pages 481-491

ISBN: 978-989-758-319-3

Driven by experiments on a real-life dataset of messages for V2V safety applications, we discuss the main challenges in feature selection for anomaly detection in V2V communication. By finding useful features and building models of normal behavior, this work provides a first step towards accurate anomaly detection engines for V2V communication.

The remainder of the paper is structured as follows. The next section introduces collision avoidance systems, along with the situations that they aim to avoid and the message format they use. Section 3 presents an attacker model along with sample attack scenarios. Section 4 presents the white-box anomaly detection framework and the dataset used for our experiments. Section 5 discusses the problem of feature selection, and Section 6 presents the results of the experiments. Section 7 discusses related work on detecting false information injected into V2V networks. Finally, Section 8 draws conclusions and identifies directions for future work.

2 COLLISION AVOIDANCE SYSTEMS

Collision avoidance systems (also called V2V safety applications) are designed to prevent collisions or attenuate their severity. In particular, they use information gathered from the vehicle’s sensors and from external sources (e.g., surrounding vehicles, roadside units) to detect an imminent crash, which is described by means of pre-crash scenarios (Najm et al., 2013). To ensure interoperability among vehicles, V2V safety applications use a standardized message format called Basic Safety Message (BSM) (SAE Motor Vehicle Council, 2008).

In this section, we present some relevant pre-crash scenarios typically addressed by V2V safety applications and the Basic Safety Message (BSM) format upon which those applications are built.

2.1 Pre-crash Scenarios

Rear-end, lane change, and opposite direction crashes are three common crash scenarios. They have been identified as priority targets to be addressed by V2V safety applications (Najm et al., 2013).

Rear-end Scenarios: A vehicle (V1) hits from behind a slower vehicle (V2) moving in the same lane (Figure 1(a)).

Lane-change Scenarios: A vehicle (V1) changes lanes, turns, or drifts out of its lane, crashing into another vehicle (V2) traveling in the same direction (Figure 1(b)).

Figure 1: Pre-crash scenarios. (a) Rear-end; (b) Lane change.

Opposite Direction Scenarios: A vehicle (V2) drifts out of its lane or passes a vehicle in front, colliding with another vehicle (V1) coming from the opposite direction (Figure 1(c)).

2.2 Basic Safety Message

V2V safety applications detect pre-crashes by sharing data, such as kinematics data, driver intents, and environment conditions, among neighboring vehicles. Basic Safety Message (BSM) (SAE Motor Vehicle Council, 2008) is a standardized message format designed to ensure interoperability among vehicles and, thus, support the exchange of data required by V2V safety applications.

A BSM consists of a mandatory part (BSM part I) and an optional part (BSM part II). BSM part I contains instantaneous status information of the sending vehicle, including its location, motion, brake system status, and size. An overview of the fields is provided in Table 1. BSM part II contains additional sensor data (e.g., sunlight level) and event records (e.g., flat tire, light status change, airbag deployment). In this paper, we only consider BSM part I, as it is mandatory and contains the essential information for V2V safety applications.

3 ATTACKS ON V2V COMMUNICATION

The increasing connectivity of modern vehicles has attracted considerable attention in the security field, and several attacks have already been demonstrated, some of them having a significant impact on vehicle safety. In particular, attacks can lead V2V safety applications to signal misleading warnings or take dangerous actions, thus undermining safety and hindering user acceptance of V2V safety applications. In this section, we introduce an attacker model and present sample attacks for the pre-crash scenarios presented in Section 2.1.

BASS 2018 - International Workshop on Behavioral Analysis for System Security

Table 1: Fields in BSM part I.

| Data element | Description | Available in dataset | Used for profiling |
|---|---|---|---|
| Standard elements | | | |
| DSRCmsgID | Identifier for message type | Yes | No |
| SecMark | A timestamp in milliseconds, modulo 1 minute | Yes | No |
| MsgCount | Sequence number for messages of the same type | Yes | Yes |
| Temporary ID | Random MAC/IP address, changed periodically to ensure anonymity | Yes | Yes |
| Latitude | Latitude of the vehicle | Yes | Yes |
| Longitude | Longitude of the vehicle | Yes | Yes |
| Speed | Speed of the vehicle | Yes | Yes |
| Heading | Current heading of the vehicle relative to North | Yes | Yes |
| Yaw Rate | The vehicle’s rotation rate in degrees per second | Yes | Yes |
| Longitudinal Acceleration | Acceleration along the vehicle longitudinal axis | Yes | Yes |
| Lateral Acceleration | Acceleration along the vehicle lateral axis | Yes | Yes |
| Vertical Acceleration | Acceleration along the vehicle vertical axis | Yes | No |
| Elevation | Elevation above sea level | Yes | No |
| Positional Accuracy | Semi-major accuracy at one standard deviation, semi-minor accuracy at one standard deviation, and orientation of semi-major axis relative to true North | Some | No |
| Brake System Status | Brake applied status, traction control state, antilock brake status, stability control status, and brake boost applied | Some | No |
| Vehicle Length | Length of the vehicle | No | No |
| Vehicle Width | Width of the vehicle | No | No |
| Metadata | | | |
| Recorder ID | ID of the vehicle recording the message | Yes | Yes |
| Sender ID | ID of the sending vehicle | Yes | Yes |
| Gentime | Timestamp when the message is created | Yes | Yes |

3.1 Attacker Model

Because V2V safety applications rely on data shared among neighboring vehicles, attackers can disrupt the correct functioning of these applications by injecting bogus information into V2V networks. Such bogus information can cause applications to signal false warnings or, in the case of autonomous vehicles, to make unnecessary and potentially dangerous maneuvers.

Several attacker models have been proposed in the context of V2V applications (Papadimitratos et al., 2006; Raya and Hubaux, 2007). For instance, Raya and Hubaux (2007) distinguish insider vs. outsider, active vs. passive, and local vs. extended attackers. In this work, we consider active insider attackers, which we further classify based on their capability to modify data fields in a message:

M1. Attackers can manipulate one or a limited set of sensor data. For example, they can inject bogus messages into the Controller Area Network (CAN), resulting in incorrect sensor data (Koscher et al., 2010), or they can spoof the GPS signal, resulting in false location data.

M2. Attackers have complete control over one vehicle and can create arbitrary BSMs. This is equivalent to an attacker who can manipulate all sensor data without affecting the operation of his own vehicle.

M3. Attackers control multiple vehicles, which can be either real or simulated. This class includes Sybil attackers, in which one attacker pretends to be multiple vehicles.

3.2 Attacks based on Pre-crash Scenarios

In this section, we present sample attacks based on the pre-crash scenarios discussed in Section 2.1.

Rear-end Pre-crash Scenario. The leading vehicle (V2) can make the vehicle behind it (V1) brake by creating the false impression of a rear-end pre-crash scenario (Figure 2(a)). To this end, V2 can report a location closer to V1, a lower speed, a negative longitudinal acceleration, or a combination of these.

In vehicular networks, several cryptographic mechanisms, such as digital signatures, have been proposed to ensure message integrity. Typically, these mechanisms rely on the use of secret keys. If an attacker has obtained such a key, he can craft and send any message.


Figure 2: Attacks based on simulating pre-crash scenarios. (a) Rear-end attack scenario; (b) Lane change attack scenario.

Lane Change Scenario. A vehicle (V2) can make another vehicle (V1) abort a lane change by creating the misconception that V2 is on a collision course with V1 (Figure 2(b)). To this end, V2 can report a false location, a higher speed, and a higher acceleration. Alternatively, V1 could make V2 brake by reporting that it is changing lanes while actually going straight ahead.

Opposite Direction Scenario. A vehicle (V2) can persuade another vehicle (V1) to take an evasive maneuver (e.g., turning right) by reporting a false location (Figure 2(c)).

All attack scenarios above can be performed by an attacker of type M1 that has control over a relevant sensor. An attacker of type M2 could strengthen the attack by reporting multiple incorrect values, for example, manipulating both the reported speed and position. The additional capabilities of M3 attackers do not substantially affect these scenarios. In the remainder, we focus on M1 attackers and leave higher-level attackers for future research.

4 METHODOLOGY

To defend against attacks, several preventive security mechanisms, such as digital signatures (Harding et al., 2014; European Telecommunications Standards Institute, 2012) and hardware security modules (Apvrille et al., 2010), have been proposed to ensure the integrity of V2V messages. Anomaly detection is a non-invasive approach that can complement such techniques, adding a layer of defense.

An anomaly detection method should not only have a high detection rate and a low false positive rate, but also provide useful information about the alerts that are raised. This information is necessary to make alerts actionable (Sommer and Paxson, 2010), i.e. to understand what caused an alert and to choose an appropriate response to it. Here, we have the added challenge of distinguishing between attacks and pre-crash situations, which, while part of the normal behavior, are already exceptional situations (Figure 3). Being able to understand an alert is important to distinguish false positives (e.g., a real pre-crash condition that looks like an attack) from real alerts (caused by attacks).

Figure 3: Behavior classification (normal behavior vs. attacks; pre-crashes, i.e. rear end, lane change, and opposite direction, are exceptional situations within normal behavior).

We apply a semantics-aware white-box anomaly detection approach (Costante et al., 2017), which can extract understandable models and alerts, to BSM-based V2V communication.

4.1 White-box Anomaly Detection

The framework (Figure 4) creates a model of normal behavior, called a profile, and detects anomalies, i.e. events not fitting this profile. Profiles capture the distribution of features of normal events happening in the system and are learned from a training set consisting of all feature values for a sequence of normal events. Here, events are (sequences of) BSMs being sent, while features typically capture (combinations of) fields in such messages. Multiple profiles can be used to capture behavior in different conditions that influence normal behavior, for example, different profiles for nighttime and daytime.
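To make the notion of a profile concrete, the following minimal Python sketch (not the implementation of Costante et al., 2017) assumes a 1D numeric feature discretized into fixed-width bins and represents the profile as the relative frequency of each bin in the training set. The function names, the bin width, and the sample values are hypothetical.

```python
import math
from collections import Counter


def bin_index(value: float, bin_width: float) -> int:
    """Map a feature value to the index of its fixed-width bin."""
    return math.floor(value / bin_width)


def build_profile(values, bin_width: float) -> dict:
    """Learn a 1D profile: the relative frequency of each bin in the training set."""
    counts = Counter(bin_index(v, bin_width) for v in values)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


# Hypothetical training values for Speed (m/s) taken from daytime BSMs.
day_speeds = [0.0, 0.2, 12.5, 13.1, 13.4, 25.0]
day_profile = build_profile(day_speeds, bin_width=5.0)
```

Here three of the six hypothetical speeds fall into the bin covering 10 to 15 m/s (index 2), so that bin receives a relative frequency of 0.5. Learning one profile per condition (e.g. daytime and nighttime) only requires partitioning the training values before calling `build_profile`.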

Profiles can be represented as histograms, a format easily understood by human operators. This enables an operator to perform tuning, in which a threshold for separating normal and anomalous behavior is chosen, resulting in a detection engine.

In the detection phase, the detection engine raises an alert for events where a feature takes a value whose likelihood is below the threshold. Responding to alerts is outside the scope of anomaly detection; however, since the alerts indicate which features exhibit unlikely values, they capture important information needed by other mechanisms to take appropriate actions. Obtaining meaningful alerts also enables human operators to provide feedback, i.e. adjust the detection engine to reduce false positives.

Figure 4: White-box anomaly detection framework. Raw data is turned into a training set (1: Feature extraction), from which profiles are learned (2: Profiling); the profiles yield a detection engine (3: Tuning), which raises alerts (4: Detection) that operators use to adjust the engine (5: Feedback).
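Tuning and detection can then be sketched as a likelihood lookup followed by a threshold comparison. In this sketch, which assumes profiles shaped as bin-to-frequency mappings, the threshold value, the names `detect` and `Alert`, and the choice to assign likelihood 0 to bins never seen in training are our own assumptions. Each alert names the offending feature, which is what keeps the output interpretable for an operator.

```python
import math
from dataclasses import dataclass


@dataclass
class Alert:
    feature: str       # name of the feature that took an unlikely value
    value: float       # the observed value
    likelihood: float  # relative frequency of that value's bin in the profile


def likelihood(profile: dict, value: float, bin_width: float) -> float:
    """Relative frequency of the bin containing value (0.0 if never seen in training)."""
    return profile.get(math.floor(value / bin_width), 0.0)


def detect(event: dict, profiles: dict, bin_widths: dict, threshold: float) -> list:
    """Raise one alert for each feature whose likelihood falls below the threshold."""
    alerts = []
    for feature, value in event.items():
        p = likelihood(profiles[feature], value, bin_widths[feature])
        if p < threshold:
            alerts.append(Alert(feature, value, p))
    return alerts


# Hypothetical profiles (bin index -> relative frequency) for two features.
profiles = {"speed": {0: 0.3, 2: 0.6, 5: 0.1}, "yaw_rate": {0: 0.9, -1: 0.1}}
bin_widths = {"speed": 5.0, "yaw_rate": 2.0}
alerts = detect({"speed": 40.0, "yaw_rate": 0.5}, profiles, bin_widths, threshold=0.05)
```

In the example, a reported speed of 40 m/s falls into a bin absent from the hypothetical profile and is flagged, while the yaw rate is not. Raising the threshold flags more events, trading a higher detection rate for more false positives; this is the trade-off an operator navigates during tuning.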

Selecting the right features is essential in creating a useful detection engine. As this is a data-driven process, we first present the dataset used in our analysis before detailing feature selection in the next section.

4.2 Dataset

For our analysis, we use a dataset comprising BSMs collected by on-board devices during the Safety Pilot Model Deployment (U.S. Department of Transportation, 2018). The data was collected in the Ann Arbor region, Michigan (Figure 5).

It is worth noting that the messages in this dataset do not fully comply with the standard BSM format (Table 1). In particular, the dataset lacks information on vehicle length and width. Moreover, only some messages contain positional accuracy and brake system status.

While missing some fields, the dataset contains metadata, including the message Sender ID, the message Recorder ID (ID of the vehicle recording the message), and the message Gentime (the timestamp when the message is generated). We use the IDs of message senders and recorders, in addition to Temporary ID and MsgCount, to differentiate vehicles. Moreover, we use Gentime to calculate additional features (as described in Section 5).

Figure 5: Distribution of 100 million BSM samples within the Ann Arbor area. Darker purple shades correspond to higher message densities. Map background: map tiles by CartoDB, which are derived from data by OpenStreetMap contributors.

This information is logged in separate files, which, however, do not follow the same time series format used in the rest of the dataset. Therefore, we did not consider this information in our experiments.

Note that the use of these metadata is not a problem when moving to standard BSMs, because the receiver of a message has context information that can replace them. Temporary ID is a random number assigned to each vehicle; although the chance that two vehicles share the same Temporary ID is very low, ID collisions are possible. Here, we used Sender ID and Recorder ID to distinguish vehicles; however, the same result could have been obtained using other information in BSMs, such as time and location. Moreover, knowing the time a message arrives, the receiver can infer Gentime from SecMark (timestamp modulo 1 minute).
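The inference of Gentime from SecMark can be sketched as follows. The sketch assumes that a message is received less than one minute after it was generated, that both timestamps are in milliseconds, and that SecMark lies in [0, 60000) (leap-second values are ignored); the function name is hypothetical.

```python
MINUTE_MS = 60_000


def infer_gentime(secmark_ms: int, arrival_ms: int) -> int:
    """Reconstruct the full generation timestamp (ms) from SecMark.

    Assumes the message was generated less than one minute before it arrived,
    so Gentime is the latest instant <= arrival_ms whose value modulo one
    minute equals secmark_ms.
    """
    if not 0 <= secmark_ms < MINUTE_MS:
        raise ValueError("SecMark must lie in [0, 60000)")
    minute_start = arrival_ms - arrival_ms % MINUTE_MS
    candidate = minute_start + secmark_ms
    if candidate > arrival_ms:
        # The SecMark lies "ahead" of the arrival time: it belongs to the previous minute.
        candidate -= MINUTE_MS
    return candidate
```

For example, a message with a SecMark of 59,990 ms that arrives 40,050 ms into a minute must have been generated in the previous minute, so the function subtracts one minute from the naive candidate.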

To sum up, the following data elements are available in the dataset:

• Data about the motion of the vehicle generating the message, including Speed, Longitudinal/Lateral/Vertical Acceleration, and Yaw Rate.
• Timing data, including SecMark and Gentime.
• Location data, including Latitude, Longitude, Elevation, and Heading.
• Data about message sequences, including Temporary ID, Recorder ID, Sender ID, and MsgCount.

In the next section, we discuss how we extracted features from these data elements.

5 FEATURE SELECTION

An important step in applying the anomaly detection framework of Section 4.1 to the automotive safety setting is to extract relevant features from BSMs. Although the BSM format has a fixed number of data elements, there are numerous potential features. For example, a BSM contains several dependent data fields, such as speed and acceleration, or heading and yaw rate. In addition, the BSMs from the same vehicle naturally form a time series. On the other hand, data elements such as longitude and latitude can be fairly noisy. The challenge thus is to define features that are robust to noise and exploit the available information to reveal attacks.

Below, we first consider the most obvious features, extracted from a single field of a message. Motivated by the M1 attacker model, we also look at features that check the consistency of the reported values, for example comparing the reported speed to the speed computed from reported positions. To address noise in the data, we consider features computed over sequences of messages rather than single messages.

Features for Message Fields. By finding the normal values for BSM fields, we can detect simple attacks that set a field to an unusual value to induce a safety application to misbehave. Rather than just reporting a field value, a feature (e.g. road type) could be computed from the field value (e.g. location), possibly using some external information (e.g. a road map).

While, as we will see in the next section, most normal values fall within a limited range, overall a wide range of values is possible, so setting thresholds too high will likely yield many false positives. Conversely, with thresholds low enough to keep false positives in check, a more sophisticated attack could create dangerous situations by using values that are not abnormal by themselves. Using different profiles for different situations can potentially create more accurate models (e.g., daytime driving may be different from nighttime driving; driving on highways may be different from driving in an urban area).

Compound Features. For many attacks, single fields will not contain enough information for their detection. We thus also consider relations between features. Several features will naturally show some correlation. For example, turning will impact both yaw rate and lateral acceleration, and strong turns at high speed are unlikely. The combined features Yaw Rate-Lateral Acceleration and Yaw Rate-Speed, respectively, capture these relations. We expect such combined features to be effective against M1 type attackers that can control one of the values but not the other. Even attackers of higher types would need to carefully construct their attacks to ensure that the combination of reported fake values is believable.
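To illustrate how a compound feature differs from two separate 1D features, the sketch below discretizes (Yaw Rate, Lateral Acceleration) pairs into 2D bins. The bin widths, function names, and sample values are hypothetical; they are chosen only to show the effect on an M1 attacker.

```python
import math
from collections import Counter


def compound_bin(yaw_rate: float, lat_acc: float,
                 yaw_width: float = 2.0, acc_width: float = 0.5) -> tuple:
    """Discretize a (Yaw Rate [deg/s], Lateral Acceleration [m/s^2]) pair into a 2D bin."""
    return (math.floor(yaw_rate / yaw_width), math.floor(lat_acc / acc_width))


def build_2d_profile(pairs) -> dict:
    """Relative frequency of each 2D bin over the training pairs."""
    counts = Counter(compound_bin(y, a) for y, a in pairs)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


# Hypothetical training pairs: when vehicles turn, both values grow together.
train = [(0.1, 0.05), (0.3, 0.1), (5.0, 1.2), (5.5, 1.3), (-5.2, -1.1)]
profile = build_2d_profile(train)

# An M1 attacker who spoofs only the yaw rate produces a pair whose 2D bin
# never occurs in training, although the yaw rate alone falls into a common bin.
spoofed = compound_bin(5.2, 0.05)
```

Here the spoofed yaw rate lands in a yaw-rate bin that holds 40% of the hypothetical training pairs, yet the pair as a whole lands in an empty 2D bin and would therefore have likelihood 0 under the compound profile.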

Features for Consistency Check. Modifications of data elements in BSMs can also be detected by checking their consistency within one message as well as across messages. In the latter case, we need to consider sequences of BSMs coming from single vehicles. In our dataset, we can find such sequences by looking at the Temporary ID. While the Temporary ID of a vehicle changes regularly (e.g. every 5 minutes), on the time scale we are looking at (less than a second) it is mostly constant.
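A possible way to form such per-vehicle sequences is sketched below, assuming each decoded BSM is available as a record carrying the identifiers listed in Section 4.2. The record type `BSM`, its field names, and the grouping key are our own illustrative choices; MsgCount could additionally be used to spot gaps or resets within a sequence, which the sketch does not do.

```python
from collections import defaultdict
from typing import NamedTuple


class BSM(NamedTuple):
    recorder_id: int
    sender_id: int
    temporary_id: str
    msg_count: int
    gentime_ms: int
    speed: float


def split_into_sequences(messages) -> dict:
    """Group BSMs per (Recorder ID, Sender ID, Temporary ID) and order each group by Gentime."""
    sequences = defaultdict(list)
    for m in messages:
        sequences[(m.recorder_id, m.sender_id, m.temporary_id)].append(m)
    for seq in sequences.values():
        seq.sort(key=lambda m: m.gentime_ms)
    return dict(sequences)


msgs = [
    BSM(1, 7, "a1", 2, 200, 13.0),
    BSM(1, 7, "a1", 1, 100, 12.9),
    BSM(1, 8, "b2", 5, 150, 0.0),
]
sequences = split_into_sequences(msgs)
```

With these hypothetical records, the two messages from sender 7 end up in one sequence ordered by Gentime, and the message from sender 8 forms a separate sequence.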

To check consistency, we can use basic physical laws; for example, if we take the position and time from two messages, we can compute the average speed and compare it to the speed reported in the messages. Similarly, we can compare the reported yaw rate with the change in heading. Computing over a small timescale amplifies measurement noise. To reduce this effect, we compute over a window of n ≥ 2 messages, going back n − 1 messages rather than comparing to the previous message.
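The window-based estimate of speed and heading (the two consistency checks used in Section 6) can be sketched as follows. This is an illustration under our own assumptions: positions are latitude/longitude in degrees, distance is approximated with the haversine formula on a spherical Earth, Gentime is expressed in seconds, Speed is read as m/s following the dataset documentation, and the function names are hypothetical. For a window of n messages, the estimate at message i uses messages i − (n − 1) and i.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from North."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(x, y)) % 360.0


def consistency_features(seq, n):
    """Pair reported Speed/Heading with estimates over a window of n messages.

    seq: time-ordered list of (gentime_s, lat, lon, speed_mps, heading_deg) for one vehicle.
    For every message i >= n - 1, the estimate uses messages i - (n - 1) and i.
    Returns (reported_speed, est_speed, reported_heading, est_heading) tuples.
    """
    if n < 2:
        raise ValueError("window size n must be at least 2")
    features = []
    for i in range(n - 1, len(seq)):
        t0, lat0, lon0, _, _ = seq[i - (n - 1)]
        t1, lat1, lon1, speed, heading = seq[i]
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        est_speed = haversine_m(lat0, lon0, lat1, lon1) / dt
        est_heading = bearing_deg(lat0, lon0, lat1, lon1)
        features.append((speed, est_speed, heading, est_heading))
    return features
```

A larger n spreads the same position noise over a longer distance, which is why the estimate stabilizes as the window grows. When reported and estimated headings are later compared, their difference should be taken modulo 360 degrees, so that 359 and 1 degrees count as close.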

In the next section, we select several features covering the different types above and use them to build a detection engine.

6 EXPERIMENTS

We conduct a number of experiments to evaluate the impact of feature selection on the ability of a detection engine to identify potential attacks. In our experiments, we use the dataset described in Section 4.2 and build profiles of normal behavior using the features described in Section 5. These profiles are used to build detection engines for the white-box anomaly detection framework described in Section 4.1.

The dataset contains many BSMs with the lateral acceleration field set to 20.01 m/s² (> 2g). We believe that this specific value is due to equipment or data recording errors. Thus, we remove all messages with this lateral acceleration value from the dataset.

The resulting dataset contains more than 1.2 billion messages. We use 80% of this dataset to build profiles and 20% for testing.
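The cleaning and splitting steps can be sketched as below. The paper does not state whether the 80/20 split is random, temporal, or per vehicle; the seeded per-message random split here is purely an illustrative assumption, and the dictionary key `lat_acc` is hypothetical. For window-based features, splitting whole per-vehicle sequences instead would keep a window from straddling both sets.

```python
import random

# Lateral acceleration value (m/s^2) that we attribute to recording errors.
SENTINEL_LAT_ACC = 20.01


def clean_and_split(messages, train_fraction: float = 0.8, seed: int = 0):
    """Drop messages carrying the sentinel lateral acceleration, then split them.

    Exact float comparison is intended here: the sentinel is a single recorded value.
    """
    kept = [m for m in messages if m["lat_acc"] != SENTINEL_LAT_ACC]
    rng = random.Random(seed)  # seeded for reproducibility
    rng.shuffle(kept)
    cut = int(len(kept) * train_fraction)
    return kept[:cut], kept[cut:]
```

Calling `clean_and_split` on a list of decoded messages returns a training list holding 80% of the retained messages and a testing list holding the rest.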

We generate an attack dataset that simulates rear-end pre-crash scenarios. To this end, we selected from the testing dataset BSMs that have a speed between 11.4 and 30.0 m/s. This speed range is based on the operational speed for which forward collision warning is applicable (Toma et al., 2013). We randomly select half a million of such BSMs and set their Speed field to 0. Each of them simulates a tampered BSM claiming that the attacker has stopped, received by a target vehicle that follows the attacker in the same lane at a similar speed. Although the considered attack is very simple, it already demonstrates the effectiveness of different types of features in detecting potential attacks.
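The generation of the attack dataset can be sketched as follows. We assume inclusive bounds for the speed range and represent messages as dictionaries with a hypothetical `speed` key; the function name and the seed are also our own. Each forged message is a copy, so the original test set is left unchanged.

```python
import random

# Operational speed range (m/s) for forward collision warning, as cited above.
FCW_MIN_SPEED, FCW_MAX_SPEED = 11.4, 30.0


def make_rear_end_attacks(test_messages, k: int, seed: int = 0):
    """Sample up to k BSMs within the FCW speed range and forge copies with Speed set to 0."""
    eligible = [m for m in test_messages
                if FCW_MIN_SPEED <= m["speed"] <= FCW_MAX_SPEED]
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(k, len(eligible)))
    # Copy each message so the original test messages are not modified.
    return [{**m, "speed": 0.0} for m in chosen]
```

In the experiments described above, k would be half a million; the cap `min(k, len(eligible))` only guards small inputs.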

Another possible issue is the unit of measurement for Speed; the dataset documentation states that it is expressed in m/s, but mph seems more likely given the data. Nonetheless, we keep the interpretation given by the documentation.

Figure 6: Distribution of BSMs over the day.

To evaluate the quality of a detection engine, we draw its receiver operating characteristic (ROC) curve, which plots the trade-off between the false positive rate (FPR) and the detection rate (DR) it induces. FPR is the ratio between the number of BSMs wrongly categorized as alerts (false positives) and the total number of BSMs. DR is the ratio between the number of attacks correctly categorized as alerts and the total number of attacks. Formally:

$$\mathrm{FPR} = \frac{\#\,\text{false alerts}}{\#\,\text{messages}} \qquad \mathrm{DR} = \frac{\#\,\text{true alerts}}{\#\,\text{attacks}}$$
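Given per-message anomaly scores, FPR, DR, and the ROC curve can be computed as in the sketch below. As our own simplification, we assume that each message is reduced to one score (for instance, the lowest likelihood across the engine's features) and that an alert is raised when this score falls below the threshold; FPR is computed over unmodified test BSMs and DR over tampered ones. All names and scores are hypothetical.

```python
def rates(normal_scores, attack_scores, threshold):
    """Return (FPR, DR) when an alert is raised for every score below threshold.

    Scores are likelihoods: the lower the score, the more anomalous the message.
    """
    false_alerts = sum(s < threshold for s in normal_scores)
    true_alerts = sum(s < threshold for s in attack_scores)
    return false_alerts / len(normal_scores), true_alerts / len(attack_scores)


def roc_curve(normal_scores, attack_scores):
    """Trace the ROC curve by sweeping the threshold over every observed score."""
    thresholds = sorted(set(normal_scores) | set(attack_scores)) + [float("inf")]
    return [rates(normal_scores, attack_scores, t) for t in thresholds]


normal = [0.5, 0.4, 0.3, 0.05]  # hypothetical scores of unmodified test BSMs
attacks = [0.01, 0.02, 0.3]     # hypothetical scores of tampered BSMs
curve = roc_curve(normal, attacks)
```

The curve starts at (0, 0), where no message is flagged, and ends at (1, 1), where every message is flagged. A curve lying below the diagonal FPR = DR means that, at a given false positive rate, the engine flags attacks less often than random guessing would.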

In the remainder of this section, we first present the obtained profiles and then evaluate the corresponding detection engines.

6.1 Profiling

We build profiles from the training dataset using the features described in Section 5. One hypothesis we consider is that different driving patterns may occur during daytime and nighttime. As building a profile from heterogeneous behaviors can lead to misleading conclusions (Alizadeh et al., 2018), we partitioned the training dataset into two parts and built two profiles, one for daytime and one for nighttime. Daytime and nighttime were identified based on the distribution of BSMs over the time of day (Figure 6). In the figure, we can observe two peaks corresponding to the morning and afternoon rush hours, between 6AM and 8AM and between 5PM and 6PM, respectively. On the other hand, we observed a low message frequency between 7PM and 6AM. Accordingly, we considered this time interval as nighttime and the interval from 6AM to 7PM as daytime.
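The day/night partition can be sketched as below. We assume that Gentime has already been converted to local time in the Ann Arbor area and that each message is a dictionary with a hypothetical `gentime` key; daytime is taken as the half-open interval [6AM, 7PM).

```python
from datetime import datetime

DAY_START_HOUR, NIGHT_START_HOUR = 6, 19  # 6AM and 7PM


def is_daytime(local_time: datetime) -> bool:
    """Daytime is [6AM, 7PM); the remaining hours count as nighttime."""
    return DAY_START_HOUR <= local_time.hour < NIGHT_START_HOUR


def partition_day_night(messages):
    """Split messages into (daytime, nighttime) lists using their local Gentime."""
    day, night = [], []
    for m in messages:
        (day if is_daytime(m["gentime"]) else night).append(m)
    return day, night
```

Each of the two resulting lists can then be profiled separately, for example with a histogram per feature.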

Features for Message Fields. Our first set of features makes use of single data elements in BSMs. Figures 7 and 8 present the profiles for daytime and nighttime, respectively, with respect to the data elements Speed, Longitudinal Acceleration, Lateral Acceleration, and Yaw Rate. The differences between the two profiles turn out to be very small, indicating that driving patterns are in fact not influenced much by the time of day. Accordingly, using separate profiles for daytime and nighttime would not improve the effectiveness of the detection engine. Note that the profile over the entire dataset can be obtained as the sum of the profiles in Figures 7 and 8. For the sake of space, we do not visualize this profile, which is again very similar.
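The remark that the overall profile is the sum of the daytime and nighttime profiles holds when profiles are kept as raw bin counts, as in the sketch below with hypothetical counts. If profiles were stored as relative frequencies instead, each would first have to be weighted by the number of messages in its partition before summing.

```python
from collections import Counter


def merge_profiles(*count_profiles: Counter) -> Counter:
    """Combine per-condition histograms of raw bin counts into one overall histogram."""
    total = Counter()
    for counts in count_profiles:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total


def normalize(counts: Counter) -> dict:
    """Turn raw bin counts into relative frequencies."""
    n = sum(counts.values())
    return {b: c / n for b, c in counts.items()}


day_counts = Counter({0: 30, 2: 60})           # hypothetical daytime Speed bins
night_counts = Counter({0: 20, 2: 10, 5: 1})   # hypothetical nighttime Speed bins
overall = merge_profiles(day_counts, night_counts)
```

Normalizing only after merging keeps the larger daytime partition appropriately weighted in the overall profile.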

We expect that these features can only help detect very simple attacks in which data elements in BSMs are set to very unusual values (e.g., a negative value for Speed).

Compound Features. To enable more fine-grained detection capabilities, we also built profiles using compound features. In particular, we built profiles using pairwise combinations of Speed, Longitudinal Acceleration, Lateral Acceleration, and Yaw Rate to capture the relations between these data fields. We visualize these profiles using heatmaps to give insight into 2D features (Figure 9). From this, we can observe the following patterns:

• Higher values of Speed often correspond to lower variation in both Longitudinal Acceleration and Lateral Acceleration.
• The combinations (Speed, Longitudinal Acceleration), (Speed, Lateral Acceleration), (Longitudinal Acceleration, Lateral Acceleration), and (Longitudinal Acceleration, Yaw Rate) do not typically have large values at the same time. This means that drivers usually maintain a constant speed when they travel fast, and do not accelerate or decelerate while turning or changing lanes.
• The data elements Lateral Acceleration and Yaw Rate exhibit a linear/inverse-linear correlation.

Despite these patterns, all features are spread out, indicating diversity in the normal behavior. Therefore, we expect that detection engines based on these features will have limited detection capabilities.

Features for Consistency Check. The final set of features is built over sequences of BSMs. In our experiments, we considered various message window sizes (i.e., n = 2, 3, 5, 7). Figures 10 and 11 visualize the Speed and Heading reported in the BSMs compared to their values estimated over varying window sizes using location and timing data.

The figures show a close correlation between the values of Speed and Heading reported in BSMs and the values estimated from location and timing data. Comparing the different window sizes, we see that increasing the window size, which reduces the effect of noise on the position data, further strengthens this correlation.

As the obtained profiles exhibit clear patterns, these features are promising for detecting anomalies characterized by inconsistencies between location data, speed, and heading.


Figure 7: Profiles of 1D features for message fields (daytime): (a) Speed, (b) Longitudinal Acceleration, (c) Lateral Acceleration, (d) Yaw Rate.

Figure 8: Profiles of 1D features for message fields (nighttime): (a) Speed, (b) Longitudinal Acceleration, (c) Lateral Acceleration, (d) Yaw Rate.

Figure 9: Profiles of 2D compound features: (a) Speed-Longitudinal Acceleration, (b) Speed-Lateral Acceleration, (c) Speed-Yaw Rate, (d) Longitudinal-Lateral Acceleration, (e) Longitudinal Acceleration-Yaw Rate, (f) Lateral Acceleration-Yaw Rate.

6.2 Anomaly Detection

We combine the features above into different detection engines and show their effectiveness using ROC curves (Figure 12). We consider one engine (D1) using only message field features, one (D2) using compound features, and four (D3-D6) using the consistency check features with different window sizes.

Figure 10: Profiling Speed reported by vehicles and speed estimated from latitude and longitude using different window sizes: (a) window size = 2, (b) window size = 3, (c) window size = 5, (d) window size = 7.

Figure 11: Profiling Heading reported by vehicles and heading estimated from latitude and longitude using different window sizes: (a) window size = 2, (b) window size = 3, (c) window size = 5, (d) window size = 7.

Figure 12: ROC curves of six detection engines using different feature sets (x-axis: False Positive Rate; y-axis: Detection Rate).

Features for Message Fields.

We first consider a detection engine D1 that uses the following features: Speed, Longitudinal Acceleration, Lateral Acceleration, and Yaw Rate. As already expected from the spread-out profiles of these features (Figures 7 and 8), this detection engine performs poorly (Figure 12); in fact, it performs worse than random guessing. One reason is that the attack we consider sets Speed to 0, which is a common value. In addition, over the speed range at which the attacks occur, Longitudinal Acceleration, Lateral Acceleration, and Yaw Rate tend to take low, and thus common, values (see Section 6.1).
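To see why a spoofed Speed of 0 can slip past D1, consider a simple histogram profile per message field. This is a minimal sketch under our own assumptions: values are binned with a fixed width, and a value's anomaly score is one minus the relative frequency of its bin. The paper's actual profile construction (Section 6.1) may differ, and the class name `FieldProfile` and the toy speeds are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


class FieldProfile:
    """Hypothetical 1D profile of a single BSM field (e.g. Speed) as a binned histogram."""

    def __init__(self, bin_width: float) -> None:
        self.bin_width = bin_width
        self.counts: Counter = Counter()
        self.total = 0

    def _bin(self, value: float) -> int:
        return math.floor(value / self.bin_width)

    def fit(self, values: Iterable[float]) -> "FieldProfile":
        for v in values:
            self.counts[self._bin(v)] += 1
            self.total += 1
        return self

    def score(self, value: float) -> float:
        """Anomaly score in [0, 1]: rare bins score close to 1, frequent bins lower."""
        return 1.0 - self.counts[self._bin(value)] / self.total


if __name__ == "__main__":
    # Toy training speeds in m/s: stopped vehicles (Speed = 0) are frequent.
    speeds = [0.0, 0.0, 0.0, 0.0, 5.0, 10.0, 12.0, 15.0, 20.0, 25.0]
    profile = FieldProfile(bin_width=1.0).fit(speeds)
    print(profile.score(0.0))   # spoofed Speed = 0: a frequent value, low score
    print(profile.score(25.0))  # a legitimate but rarer speed: higher score
```

Under these assumptions the injected Speed of 0 scores 0.6 while a genuine 25 m/s scores 0.9, illustrating how a 1D profile can rank an attack below ordinary traffic.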

Compound Features.

Detection engine D2 uses compound features that are pairwise combinations of Speed, Longitudinal Acceleration, Lateral Acceleration, and Yaw Rate. Figure 12 shows that this detection engine can detect a small number of attacks, but beyond a detection rate of about 10% it behaves much like random guessing. As can be seen from Figure 9, some combinations of (longitudinal or lateral) acceleration with a Speed of 0 are unusual, which allows the attack to be detected. However, only a small percentage of the attacks have such acceleration values. Further investigation showed that a compound feature combining all three of these values (Speed, Longitudinal Acceleration, and Lateral Acceleration) slightly improves the detection rate, but only up to around 17% of the attacks.
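A compound feature profiles pairs of values jointly, so a value that is common on its own can become rare in combination. The following minimal sketch extends the histogram idea to two fields, again assuming fixed-width bins and a one-minus-frequency score; the class name `PairProfile`, the bin widths, and the toy (Speed, Longitudinal Acceleration) samples are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Pair = Tuple[float, float]


class PairProfile:
    """Hypothetical 2D profile over (field_a, field_b), e.g. (Speed, Longitudinal Acceleration)."""

    def __init__(self, widths: Pair) -> None:
        self.widths = widths
        self.counts: Counter = Counter()
        self.total = 0

    def _bin(self, pair: Pair) -> Tuple[int, int]:
        return (math.floor(pair[0] / self.widths[0]), math.floor(pair[1] / self.widths[1]))

    def fit(self, pairs: Iterable[Pair]) -> "PairProfile":
        for p in pairs:
            self.counts[self._bin(p)] += 1
            self.total += 1
        return self

    def score(self, pair: Pair) -> float:
        """Anomaly score in [0, 1]: one minus the relative frequency of the joint bin."""
        return 1.0 - self.counts[self._bin(pair)] / self.total


if __name__ == "__main__":
    # Toy training data: (speed m/s, longitudinal acceleration m/s^2).
    normal = [(0.0, 0.0)] * 4 + [(5.0, 2.0), (10.0, 1.0), (12.0, -1.0),
                                 (15.0, 0.5), (20.0, 0.0), (25.0, 0.2)]
    profile = PairProfile(widths=(1.0, 0.5)).fit(normal)
    # Spoofed Speed = 0 while the vehicle really accelerates at 2 m/s^2: unseen pair.
    print(profile.score((0.0, 2.0)))
    # Spoofed Speed = 0 while cruising (acceleration near 0): looks like a stopped car.
    print(profile.score((0.0, 0.0)))
```

Under these assumptions the first spoofed message gets the maximum score of 1.0, whereas the second scores 0.6, the same as a genuinely stopped vehicle; this is consistent with D2 catching only the attacks where the real acceleration is unusual for Speed 0.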

Features for Integrity Check.

Detection engines D3–D6 check the consistency between the speed and heading reported in BSMs and the values estimated from location and timing data. D3–D6 use the features Speed-EstimatedSpeed and Heading-EstimatedHeading, with the estimated values obtained from windows of sizes 2, 3, 5, and 7, respectively. These detection engines perform much better than those built using message fields or compound features. While estimation errors with small windows may still cause some misclassifications, using window size 5 results in a nearly perfect detector. From this, we can conclude that this window size is sufficient to reduce noise. Moreover, as these features capture the consistency of values rather than the values themselves, they are very well suited for detecting attacks like the one we considered, where a single field is changed to a value that is common in general.
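The consistency features compare the reported Speed and Heading with values derived from consecutive positions. The paper's estimator is described in Section 6.1; the sketch below is our own minimal version, assuming that the estimate uses the first and last message of a window of w BSMs, great-circle (haversine) distance on a spherical Earth, and the initial great-circle bearing measured clockwise from north. The function names and the message tuple layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

# One BSM as (time s, latitude deg, longitude deg, reported speed m/s, reported heading deg).
Bsm = Tuple[float, float, float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, degrees clockwise from north in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360.0


def consistency_features(window: Sequence[Bsm]) -> Tuple[float, float]:
    """Return (Speed - EstimatedSpeed, Heading - EstimatedHeading) for the newest BSM.

    `window` holds the last w messages of one vehicle, oldest first (w >= 2).
    The heading difference is wrapped to [-180, 180) degrees.
    """
    if len(window) < 2:
        raise ValueError("window size must be at least 2")
    t0, lat0, lon0, _, _ = window[0]
    t1, lat1, lon1, speed, heading = window[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase within the window")
    est_speed = haversine_m(lat0, lon0, lat1, lon1) / (t1 - t0)
    est_heading = bearing_deg(lat0, lon0, lat1, lon1)  # meaningless if the vehicle is stationary
    d_heading = (heading - est_heading + 180.0) % 360.0 - 180.0
    return speed - est_speed, d_heading


if __name__ == "__main__":
    metres_per_deg_lat = math.radians(1.0) * EARTH_RADIUS_M
    # Toy trace: a car driving due north at 10 m/s, one BSM every 0.1 s, window size 5.
    honest = [(0.1 * i, 52.0 + i / metres_per_deg_lat, 5.0, 10.0, 0.0) for i in range(5)]
    spoofed = honest[:-1] + [honest[-1][:3] + (0.0, 0.0)]  # last BSM claims Speed = 0
    print(consistency_features(honest))   # both differences close to 0
    print(consistency_features(spoofed))  # speed difference close to -10 m/s
```

In this toy trace the spoofed message yields a Speed-EstimatedSpeed value of about -10 m/s, far from the near-zero values of honest traffic, even though Speed = 0 on its own is common.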

7 RELATED WORK

Several approaches to identify false information injected into V2V networks have been proposed, yet detecting such information remains an open challenge. Golle et al. (2004) propose a general approach that exploits the redundancy in sensor data shared among neighboring vehicles. Each vehicle uses a model of the Vehicular Ad Hoc Network (VANET) to check the validity of sensor data. When an inconsistency is detected, an adversarial model is used to explain it. The VANET model, the adversarial model, and the algorithm used to explain inconsistencies are all determined by the sensor data. However, the use of real sensor data was not studied, and thus no concrete models were given.

Lo and Tsai (2007) propose a rule-based approach to detect false or outdated information in VANETs. This approach employs the following rules: duplicate messages must be dropped; the broadcast range must be reasonable; the timestamp must be checked; and the velocity must not be too high compared to the speed limit. However, only sensor-cheating attacks have been considered (attacker model M1). As suggested by the results in Section 6.2, when individual data elements are used for detection, stronger attackers may craft messages that bypass all of these rules.
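The four rules of Lo and Tsai (2007) are simple per-message plausibility checks. The following is a minimal sketch under our own assumptions: positions are given in a local planar frame in metres, and the thresholds `MAX_RANGE_M`, `MAX_AGE_S`, and `SPEED_TOLERANCE` are hypothetical values, not those of the original work. It shows how a message whose only change is a Speed set to 0 satisfies every rule.

```python
import math
from dataclasses import dataclass
from typing import Set, Tuple

MAX_RANGE_M = 300.0    # hypothetical upper bound on a plausible broadcast range
MAX_AGE_S = 1.0        # hypothetical freshness bound on the message timestamp
SPEED_TOLERANCE = 1.2  # hypothetical factor allowed above the speed limit


@dataclass(frozen=True)
class Message:
    msg_id: int
    timestamp: float                # seconds
    position: Tuple[float, float]   # sender position, metres in a local planar frame
    speed: float                    # reported speed, m/s


def passes_rules(msg: Message, receiver: Tuple[float, float], now: float,
                 speed_limit: float, seen_ids: Set[int]) -> bool:
    """Apply the four rule types in order; return False as soon as one fails."""
    if msg.msg_id in seen_ids:                           # rule 1: drop duplicates
        return False
    seen_ids.add(msg.msg_id)
    if math.dist(msg.position, receiver) > MAX_RANGE_M:  # rule 2: plausible range
        return False
    if not 0.0 <= now - msg.timestamp <= MAX_AGE_S:      # rule 3: timestamp check
        return False
    if msg.speed > speed_limit * SPEED_TOLERANCE:        # rule 4: speed vs. limit
        return False
    return True


if __name__ == "__main__":
    seen: Set[int] = set()
    # A BSM sent 50 m away whose Speed field was overwritten with 0.
    spoofed = Message(msg_id=1, timestamp=10.0, position=(50.0, 0.0), speed=0.0)
    print(passes_rules(spoofed, (0.0, 0.0), now=10.1, speed_limit=27.8, seen_ids=seen))
    # Replaying the same message is caught by the duplicate rule.
    print(passes_rules(spoofed, (0.0, 0.0), now=10.1, speed_limit=27.8, seen_ids=seen))
```

The spoofed message is accepted on first receipt; only its replay is rejected, since none of the rules relates Speed to the sender's motion.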

Schmidt et al. (2008) propose several checks to evaluate the trust level of vehicles. In particular, this work proposes to verify whether: average speed, acceleration, and heading belong to valid ranges; vehicles have moved during a certain time (in contrast to stationary road-side attackers); the position claimed by vehicles is on the road; vehicles do not suddenly appear; and the frequency of messages from a vehicle does not exceed a threshold. Given our attacker models, these checks can be easily bypassed; for example, setting Speed to a lower value does not violate any of these rules. Another test proposed by Schmidt et al. (2008) is the sensor-proofed position check, which requires an additional sensor, such as radar, to verify the distance to a neighboring vehicle.

In addition to simple threshold checks of velocity, message frequency, distance between message senders and receivers, and timestamp, Stübing et al. (2010) propose predicting the movement of a vehicle using a Kalman filter. The predicted movement and the movement reported by messages are then checked for inconsistencies. Our detection engines may benefit from this prediction method, which could be more precise than our current estimation in Section 6.1.
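For comparison with the window-based estimate, a Kalman filter tracks position and velocity jointly and smooths GPS noise. The sketch below is a generic constant-velocity filter along a single axis, not the model of Stübing et al. (2010), whose details are not given here; the class name `ConstantVelocityKF`, the noise parameters `q` and `r`, and the idea of comparing the filtered velocity with the reported Speed are our own assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConstantVelocityKF:
    """1D constant-velocity Kalman filter; state = (position m, velocity m/s)."""
    p: float
    v: float
    P: List[List[float]]  # 2x2 state covariance [[a, b], [b, d]]
    q: float = 0.5        # hypothetical acceleration noise variance, (m/s^2)^2
    r: float = 4.0        # hypothetical GPS position noise variance, m^2

    def predict(self, dt: float) -> float:
        """Propagate the state with F = [[1, dt], [0, 1]]; return the predicted position."""
        self.p += self.v * dt
        (a, b), (_, d) = self.P
        # P <- F P F^T + Q, with Q from a white-noise acceleration model
        a2 = a + 2 * dt * b + dt * dt * d + self.q * dt ** 4 / 4
        b2 = b + dt * d + self.q * dt ** 3 / 2
        d2 = d + self.q * dt ** 2
        self.P = [[a2, b2], [b2, d2]]
        return self.p

    def update(self, z: float) -> float:
        """Correct with a position measurement z (H = [1, 0]); return the innovation."""
        (a, b), (_, d) = self.P
        s = a + self.r         # innovation variance
        k0, k1 = a / s, b / s  # Kalman gain
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [b - k1 * a, d - k1 * b]]
        return innovation


def speed_inconsistency(kf: ConstantVelocityKF, reported_speed: float) -> float:
    """Gap between the Speed claimed in a BSM and the filter's velocity estimate."""
    return abs(reported_speed - kf.v)


if __name__ == "__main__":
    kf = ConstantVelocityKF(p=0.0, v=0.0, P=[[10.0, 0.0], [0.0, 100.0]])
    dt = 0.1
    for i in range(1, 101):
        kf.predict(dt)
        kf.update(10.0 * dt * i)  # noiseless positions of a car driving at 10 m/s
    print(kf.v)
    print(speed_inconsistency(kf, 10.0), speed_inconsistency(kf, 0.0))
```

With noiseless positions the velocity estimate approaches the true 10 m/s, so a BSM claiming Speed = 0 stands out; with real GPS data, `q` and `r` would have to be tuned to the observed noise.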

Another direction towards intrusion detection in V2V networks is to exploit properties of the physical communication layer, such as the signal strength (Xiao et al., 2006), to verify a sender's location. When such data is available, it can be used as an additional feature in our framework.

8 DISCUSSION AND CONCLUSION

In this paper, we discuss the problem of feature selection in the context of white-box anomaly detection for V2V safety applications. Driven by a real dataset of BSMs, we investigate how to build accurate profiles of normal behavior. To this end, we exploit features corresponding to BSM data elements as well as more complex features, including compound features and features that check consistency.

Although the results of our experiments are promising, a number of challenges still need to be addressed.

• The collected data are not always accurate. For instance, the GPS data in our dataset do not have the precision required by the National Highway Traffic Safety Administration (NHTSA) (2017).

• Driving behavior is influenced by several factors. In this work, we have investigated whether the time of day has an impact on driving behavior; however, this does not appear to be the case in the analyzed dataset. Nonetheless, other factors may be relevant, e.g. driving style (Eboli et al., 2017), road type, and weather conditions.

• The dataset does not distinguish BSMs corresponding to a normal situation from those corresponding to a pre-crash scenario. This contributes to the heterogeneity of the data used to build profiles of normal behavior.

These issues can affect the quality of the built profiles and, thus, their detection capabilities.

In future work, we plan to address these challenges by constructing profiles specific to driving conditions (e.g., driver behavior, pre-crash scenarios, road type). For instance, one could exploit V2V safety applications to recognize BSMs that correspond to a given pre-crash scenario, or use position data (Figure 5) to identify the road type. In addition, other BSM data elements (e.g., brake status) and the fields defined in BSM Part II (e.g., path prediction data) can be used to classify driving conditions.

We tested the defined detection engines with a simple attack. Although this already provides insights into the impact of feature selection on their detection capabilities, more complex attacks should be considered. In this respect, we plan to carry out more extensive experiments with higher-level attacks as well as attacks targeting other pre-crash scenarios.

ACKNOWLEDGMENT

This work is partially supported by Rijkswaterstaat under the TU/e Smart Mobility programme, by ITEA3 through the APPSTACLE project (15017), and by ECSEL through the SECREDAS project.

REFERENCES

Alizadeh, M., Peters, S., Etalle, S., and Zannone, N. (2018). Behavior Analysis in the Medical Sector: Theory and Practice. In Proceedings of ACM/SIGAPP Symposium on Applied Computing. ACM.

Apvrille, L., El Khayari, R., Henniger, O., Roudier, Y., Schweppe, H., Seudié, H., Weyl, B., and Wolf, M. (2010). Secure automotive on-board electronics network architecture. In Proceedings of FISITA 2010 World Automotive Congress. EURECOM.

Costante, E., den Hartog, J., Petković, M., Etalle, S., and Pechenizkiy, M. (2017). A white-box anomaly-based framework for database leakage detection. Journal of Information Security and Applications, 32:27–46.

Eboli, L., Mazzulla, G., and Pungillo, G. (2017). How drivers' characteristics can affect driving style. Transportation Research Procedia, 27:945–952.

European Telecommunications Standards Institute (2012). ETSI TS 102 940 V1.1.1 - Intelligent Transport Systems (ITS); Security; ITS communications security architecture and security management.

Foster, I. D., Prudhomme, A., Koscher, K., and Savage, S. (2015). Fast and vulnerable: A story of telematic failures. In Proceedings of USENIX Workshop on Offensive Technologies. USENIX Association.

Golle, P., Greene, D., and Staddon, J. (2004). Detecting and correcting malicious data in VANETs. In Proceedings of the 1st ACM International Workshop on Vehicular Ad Hoc Networks, pages 29–37. ACM.

Harding, J., Powell, G., Yoon, R., Fikentscher, J., Doyle, C., Sade, D., Lukuc, M., Simons, J., and Wang, J. (2014). Vehicle-to-Vehicle communications: Readiness of V2V technology for application.

Koscher, K., Czeskis, A., Roesner, F., Patel, S., Kohno, T., Checkoway, S., McCoy, D., Kantor, B., Anderson, D., Shacham, H., et al. (2010). Experimental security analysis of a modern automobile. In Proceedings of IEEE Symposium on Security and Privacy, pages 447–462.

Lo, N.-W. and Tsai, H.-C. (2007). Illusion attack on VANET applications - a message plausibility problem. In Proceedings of Globecom Workshops, pages 1–8.

Mazloom, S., Rezaeirad, M., Hunter, A., and McCoy, D. (2016). A security analysis of an in vehicle infotainment and app platform. In Proceedings of USENIX Workshop on Offensive Technologies. USENIX Association.

Miller, C. and Valasek, C. (2015). Remote exploitation of an unaltered passenger vehicle. Black Hat.

Najm, W. G., Toma, S., and Brewer, J. (2013). Depiction of priority light-vehicle pre-crash scenarios for safety applications based on vehicle-to-vehicle communications. Technical report, National Highway Traffic Safety Administration (NHTSA).

National Highway Traffic Safety Administration (NHTSA) (2017). Federal motor vehicle safety standards; V2V communications. Proposed Rule.

Palanca, A., Evenchick, E., Maggi, F., and Zanero, S. (2017). A stealth, selective, link-layer denial-of-service attack against automotive networks. In Proceedings of Conference on Detection of Intrusions and Malware & Vulnerability Assessment, pages 185–206. Springer.

Papadimitratos, P., Gligor, V., and Hubaux, J.-P. (2006). Securing Vehicular Communications – Assumptions, Requirements, and Principles. In Proceedings of Workshop on Embedded Security in Cars, pages 5–14.

Raya, M. and Hubaux, J.-P. (2007). Securing vehicular ad hoc networks. Journal of Computer Security, 15(1):39–68.

SAE Motor Vehicle Council (2008). Draft SAE J2735 - Dedicated Short Range Communications (DSRC) Message Set Dictionary.

Schmidt, R. K., Leinmüller, T., Schoch, E., Held, A., and Schäfer, G. (2008). Vehicle behavior analysis to enhance security in VANETs. In Proceedings of the 4th IEEE Vehicle-to-Vehicle Communications Workshop (V2VCOM2008).

Sommer, R. and Paxson, V. (2010). Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of IEEE Symposium on Security and Privacy, pages 305–316. IEEE.

Stübing, H., Jaeger, A., Schmidt, C., and Huss, S. A. (2010). Verifying mobility data under privacy considerations in Car-to-X communication. In Proceedings of the 17th ITS World Congress.

Toma, S., Swanson, E., and Najm, W. G. (2013). Light vehicle crash avoidance needs and countermeasure profiles for safety applications based on vehicle-to-vehicle communications. Technical report, National Highway Traffic Safety Administration (NHTSA).

U.S. Department of Transportation (2018). Safety Pilot Model Deployment data. https://data.transportation.gov/Automobiles/Safety-Pilot-Model-Deployment-Data/a7qq-9vfe. Accessed: 2018-03-01.

Xiao, B., Yu, B., and Gao, C. (2006). Detection and localization of sybil nodes in VANETs. In Proceedings of the 2006 Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks, pages 1–8. ACM.
