Learning Classiﬁer Systems for Road Trafﬁc Congestion Detection

Matthias Sommer and Jörg Hähner

Organic Computing Group, University of Augsburg, Eichleitner Str. 30, 86159 Augsburg, Germany

Keywords:

Road Trafﬁc, Congestion Detection, Learning Classiﬁer Systems, Machine Learning, Reinforcement Learn-

ing.

Abstract:

The increase in mobility leads to a higher number of kilometres driven per vehicle and more delay due to congestion, which poses a current and future problem. Congestion generates growing environmental pollution and

more car accidents. We apply machine learning concepts to the task of congestion detection in road trafﬁc. We

focus on the extended classiﬁer system XCSR, an evolutionary rule-based on-line learning classiﬁer system.

Experiments with real-world detector data demonstrate high accuracy of XCSR for congestion detection on

interstates.

1 INTRODUCTION

According to several reports (Schrank et al., 2012;

Lenz et al., 2010), the number of kilometres driven

and the delay due to congestion increased over the

last decades, and this trend is expected to continue. Rising

traffic volumes promote growing air pollution, a

greater number of car accidents, and more trafﬁc con-

gestion. Intelligent incident management systems try

to mitigate the negative effects of congestion. This

includes the collection of sensor data, the detection

or prediction of congestion, and the execution of ac-

tions for the congestion management. The detection

process is often performed by automatic incident de-

tection (AID) algorithms, e.g. by processing video

image material from trafﬁc surveillance cameras or

by pattern recognition on sensor data. These classic

approaches work well under certain conditions, but

their performance is often strongly dependent on pre-

deﬁned thresholds and they are not able to adapt to

new and previously unknown patterns at runtime. In

this work, we apply machine learning concepts to the

task of congestion detection on interstates. Learning classifier systems (LCS), such as the extended classifier system XCSR, are evolutionary rule-based machine learning techniques that have been shown to work well for classification tasks (Bull, 2004). XCSR

evolves new rules at runtime with the help of a ge-

netic algorithm (GA), while also improving its accu-

racy over time via reinforcement of the existing rule

set. Experiments with real-world detector data were

carried out to investigate the performance of XCSR

under real-world conditions. Support vector machines

(SVM) have proven to be accurate classiﬁers for traf-

ﬁc congestion detection (Diamantopoulos et al., 2014;

Singliar and Hauskrecht, 2006). Consequently, we compare our approach to several SVM variants.

The remainder of this work is structured as fol-

lows. First, we provide a brief overview of the related

work in this ﬁeld. We move on, mapping the formal

concept of congestion detection to machine learning

problems. We introduce the reader to the fundamen-

tals of learning classiﬁer systems in general, and the

XCSR in particular. Based on the theoretical concept,

we present how LCSs can be practically used to tackle

the congestion detection problem. Another machine learning concept, namely SVMs, serves as a reference solution for the later evaluation. We

conclude this work with a summary of our ﬁndings

and an outlook on future work.

2 RELATED WORK

Incident detection is one of many components of

advanced trafﬁc management systems. (Ozbay and

Kachroo, 1999) deﬁne it as “the process of identifying

the spatial and temporal coordinates of an incident”.

It is executed by automatic algorithms or by manual

evaluation. Reliable detection mechanisms and fast

clearance are important for mitigating the negative ef-

fects of incidents and congestion. Figure 1 depicts

the typical ﬂow of the incident management process

(Deniz et al., 2012).

First, data from surveillance systems (e.g. CCTV


Sommer, M. and Hähner, J.

Learning Classiﬁer Systems for Road Trafﬁc Congestion Detection.

DOI: 10.5220/0006214101420150

In Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2017), pages 142-150

ISBN: 978-989-758-242-4

[Figure 1: flow chart with the boxes Data collection, Detection, Verification, Response, Information dissemination, and Clearance; the branches are labelled Incident and No incident.]

Figure 1: Typical ﬂow chart of an AID system. The veriﬁ-

cation step is optional.

cameras or loop detectors) provides a situation de-

scription of the current trafﬁc condition. Second, this

data is usually sent to a central control centre where

it is processed. The data analysis is often executed by

automated incident detection algorithms. An incident

is described by its type (recurrent, blockage, etc.), its

exact location, its severity, and the time of occurrence.

Third, the incident alarm can be veriﬁed by an oper-

ator, e.g. via surveillance cameras. Fourth, the edited

information has to be disseminated among the trafﬁc

participants. The congestion management is usually

done by trafﬁc experts. Its strategies range from adap-

tation of signal plans, to re-routing of trafﬁc by means

of route recommendations via variable message signs,

and radio broadcasts. Finally, clearance procedures

are initiated to restore the undisturbed conditions as

before the incident. In this work, we focus on the sec-

ond step, presenting how learning classiﬁer systems

can be used to analyse trafﬁc data to determine the

presence of congestion.

Comprehensive reviews of congestion detection algorithms and detector technology are given by

(Parkany and Xie, 2005) and (Mahmassani et al.,

1999). The family of point-based algorithms is usu-

ally deployed on freeways (Yang et al., 2004). It

can be separated into comparative algorithms, statis-

tical processing, trafﬁc modeling and theoretical al-

gorithms, and advanced machine learning algorithms.

Spatial measurement-based algorithms make use of

CCTV cameras and image processing algorithms and

are also used in urban trafﬁc networks (Zhang and

Xue, 2010). Congestion patterns are detected based

on temporal and spatial differences of trafﬁc param-

eters monitored by trafﬁc sensors. The performance

of many of these algorithms is strongly dependent on

thresholds set during design time by trafﬁc experts

based on historic data. Furthermore, they are prone

to obsolescence in case of changing trafﬁc demands

and are not able to learn new behaviour. In contrast,

learning classiﬁer systems evolve their knowledge at

runtime, being able to adapt to a changing environ-

ment by learning new rules. Additionally, they can be

trained upfront based on labelled training data.

3 CONGESTION DETECTION AS

MACHINE LEARNING TASK

In its simplest form, congestion detection is a binary classification problem in which the two classes represent the presence or absence of congestion. Typically, the classes are imbalanced: the class representing free-flowing traffic has many more instances than the congested class. This imbalance, and the resulting lack of congested instances, makes the learning process more difficult. The two-class problem can be expressed more formally as

$$f(\vec{x}) \rightarrow c_i, \quad i \in \{1, 2\} \tag{1}$$

A feature vector $\vec{x} = \{x_1, \ldots, x_n\}$ is processed by a function $f$ which maps the input variables to a specific class $c_i$. In case of congestion detection, $\vec{x}$ contains a defined set of traffic parameters describing the current traffic conditions. The goal is to fit a model that relates the observations in $\vec{x}$ to the correct class label $c_i$. Machine learning techniques use artificial

intelligence to deduce a process, model, or function

from observations to describe a certain behaviour (Al-

paydın, 2008). A number of authors applied differ-

ent machine learning algorithms to the problem of

congestion detection, such as SVMs (Diamantopou-

los et al., 2014;

Singliar and Hauskrecht, 2006), ar-

tiﬁcial neural networks (Srinivasan et al., 2004), and

fuzzy logic algorithms (Brumback, 2009). Instead of

just relying on one single technique, some researchers

combine several methods, e.g. (Liu et al., 2014) use

multiple naïve Bayes classifiers. Some of these algorithms are able to learn new patterns and to improve

their model at runtime (reinforcement learning), e.g.

learning classiﬁer systems. These algorithms choose

and execute actions in reaction to the observations and

adjust their parameters, and their internal and external

model according to the feedback received.

4 LEARNING CLASSIFIER

SYSTEMS IN A NUTSHELL

The learning classifier system (LCS) (Bull and Kovacs, 2005) is founded on Holland's initial framework for classifier systems (Holland, 1986). It is an evolutionary on-line machine learning technique designed for both single-step and multi-step problems. LCS combines ideas from evolutionary computing, reinforcement learning, supervised or unsupervised learning, and heuristics. This adaptive,

rule-based system builds a descriptive model for the

underlying observations. The knowledge base con-

sists of a population of rules (or classiﬁers) which


map situation descriptions to actions. An evolution-

ary algorithm evolves these rules in order to explore

the problem space. The existing rules are rated according to their influence on the system or task, so rules that have achieved good results have a higher probability of being chosen again in later executions.

In 2000, Wilson proposed the eXtended Classi-

ﬁer System for Real-valued inputs (XCSR) (Wilson,

2000). Its advantage over the traditional LCS is its

ability to take real-valued inputs which are usually

found in real-world environments. XCSR tries to

evolve accurate and maximally general classiﬁers that

cover the state-action space of the underlying problem. A single XCSR classifier cl comprises five attributes: 1) the condition C, which determines a certain subspace of the problem space by encoding a geometric structure, 2) an action a that defines a reaction that can be executed on the environment, 3) the payoff prediction p, estimating the payoff of the system in case C matches the current situation and its action was chosen, 4) an error estimate ε that reflects the mean absolute prediction error, and 5) a fitness value φ that can be roughly interpreted as an inverse of ε and represents the accuracy of the prediction.
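The five attributes listed above can be sketched as a small data structure. This is a minimal illustration, not the implementation used in the experiments: the class and field names are hypothetical, the condition is assumed to be one half-open interval per feature, and the default values are simply the initial settings listed in Table 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Classifier:
    """One XCSR rule; all names here are illustrative assumptions."""
    condition: List[Tuple[float, float]]  # one (lower, upper) interval per feature
    action: int                           # class label proposed by this rule
    prediction: float = 10.0              # payoff prediction p
    error: float = 0.0                    # prediction error epsilon
    fitness: float = 0.01                 # fitness phi

    def matches(self, x: List[float]) -> bool:
        """True if every feature value lies inside its interval."""
        return all(lo <= xi < hi for (lo, hi), xi in zip(self.condition, x))


# Hypothetical rule: low normalised speed and raised occupancy -> congested (action 0).
rule = Classifier(condition=[(0.0, 0.5), (0.3, 1.0)], action=0)
print(rule.matches([0.25, 0.6]))  # slow traffic, high occupancy
print(rule.matches([0.80, 0.1]))  # fast traffic, low occupancy
```

Representing the condition as explicit intervals keeps each rule readable, which is the property the interpretability argument for XCSR relies on.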

The learning process is as follows. First, the classifier system receives an input $\vec{x}$ representing the current environmental state. Second, based on this situation description, the classifiers matching $\vec{x}$ are selected. Because different classifiers can represent different actions, an action selection process has to be executed. Within our scenario, an action corresponds to the class prediction for the current traffic condition. If a classifier cl was chosen and the execution of its action had a positive influence on the environment, cl receives a positive reward. Otherwise, the rating of this classifier is reduced. This reinforcement process leads to better system performance over time. Additionally, the problem space is explored by creating new classifiers for previously unknown situations at runtime.

5 XCSR FOR CONGESTION

DETECTION

Classiﬁer systems have been successfully applied in

a variety of real-world applications (Bull, 2004). To

the best of the authors’ knowledge, the distributed,

adaptive control of signalisation is their only applica-

tion within the trafﬁc domain (Bull et al., 2004; Pro-

thmann et al., 2008). In the following, we explain

in detail how we adapt XCSR to the classiﬁcation of

traffic conditions. In the context of AID, the situation description is a vector $\vec{x} = (x_1, \ldots, x_n)$ of continuous, real-valued traffic parameters monitored by sensors, e.g. loop detectors or CCTV cameras. Therefore, $\vec{x}$ represents the current traffic condition at an intersection or section. The action a is the estimated congestion classification (0 for congested and 1 for free-flowing). The underlying task is mapped to a single-step problem, as the corresponding reward (the actual state of traffic) for a is returned in the next time step.

5.1 The Main Loop

The classification process for the current traffic condition works as follows. In every time step t, values monitored by a traffic detector are retrieved. We simulate the sensor input by reading the next line from a data set. This input is then converted into a feature vector $\vec{x}$. If we do not want to use every available sensor value from the input, we define which features to include and which ones to omit (we may only be interested in the average speed and occupancy). XCSR requires the input values to be in the range [0, 1), thus all components of $\vec{x}$ have to be normalised to this value range. This does not limit its applicability, since the upper limits for the traffic parameters of a given road, such as occupancy or speed, can be estimated. The resulting vector $\vec{x}$ is then given to XCSR for classification.
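The feature selection and normalisation step could look roughly as follows. This is a sketch under stated assumptions: the function names, the dictionary-based reading format, and the upper limits of 100 mph and 100 % are hypothetical choices for illustration, not values from the experiments.

```python
def normalise(value: float, upper_limit: float) -> float:
    """Scale a non-negative sensor reading into the half-open range [0, 1)."""
    scaled = max(value, 0.0) / upper_limit
    # Clamp just below 1.0 so readings at or above the estimated limit stay in range.
    return min(scaled, 1.0 - 1e-9)


# Assumed upper limits per traffic parameter (hypothetical values).
LIMITS = {"speed": 100.0, "occupancy": 100.0}


def to_feature_vector(reading: dict, features=("speed", "occupancy")) -> list:
    """Keep only the selected features and normalise each of them."""
    return [normalise(reading[name], LIMITS[name]) for name in features]


# The volume is present in the reading but omitted from the feature vector.
x = to_feature_vector({"speed": 45.0, "occupancy": 12.0, "volume": 900})
print(x)
```

Clamping instead of rejecting out-of-range readings is one possible design choice; it keeps the classifier running when a detector reports an implausibly high value.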

First, the condition of each classifier cl of the population [P] is compared to the current input $\vec{x}$. If cl matches the external input, it is added to the match set [M]. If [M] contains too few classifiers (compared to a predefined threshold), the GA is triggered to create new, random classifiers matching $\vec{x}$. This covering process is executed to cover previously unknown situations and to enable XCSR to offer a prediction for the current situation. The newly created classifiers are initialised according to pre-defined values for the prediction p, the prediction error ε, and the initial fitness F. Afterwards, a fitness-weighted average $P_a$ of the predictions $p_{cl}$ for each action a represented in [M] is computed as

$$P_a = \Big(\sum_{cl \in [M],\ cl.a = a} p_{cl} \cdot F_{cl}\Big) \Big/ \Big(\sum_{cl \in [M],\ cl.a = a} F_{cl}\Big) \tag{2}$$

$P_a$, the system prediction of action a, is added to the prediction array [P]. Here, [P] consists of two entries, one for each possible classification. Then, an action is chosen from [P] based on the action selection regime. This selection can either be random (Exploration), probabilistic, or deterministic (Exploitation).

Finally, the action set [A] consists of the subset of clas-

siﬁers of [M] having the chosen action. After execut-

ing a, the prediction, prediction error, and ﬁtness of

each classiﬁer in [A] are updated based on the received


reward (Reinforcement). In our scenario, the reward is

either 1000 for a correct classiﬁcation or 0 for a false

classiﬁcation.
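The fitness-weighted system prediction per action and the deterministic choice of the best action can be sketched as below. The tuple layout and function names are assumptions made for this illustration; the reward scale of 0 to 1000 follows the text.

```python
from collections import defaultdict


def prediction_array(match_set):
    """Fitness-weighted mean prediction per action.

    match_set: iterable of (action, prediction, fitness) tuples.
    """
    weighted = defaultdict(float)
    fitness_sum = defaultdict(float)
    for action, prediction, fitness in match_set:
        weighted[action] += prediction * fitness
        fitness_sum[action] += fitness
    # Actions whose classifiers have zero total fitness get no entry.
    return {a: weighted[a] / fitness_sum[a] for a in weighted if fitness_sum[a] > 0}


def best_action(pa):
    """Deterministic (exploitation) choice: the action with the highest prediction."""
    return max(pa, key=pa.get)


# Hypothetical match set: two rules vote for action 0, one rule for action 1.
match_set = [(0, 900.0, 0.8), (0, 500.0, 0.2), (1, 100.0, 0.5)]
pa = prediction_array(match_set)
print(pa, best_action(pa))
```

In this example the fitter of the two rules for action 0 dominates the average, so the system prediction for action 0 is closer to 900 than to 500.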

5.2 Training and Rule Discovery

XCSR is an on-line learning algorithm. For a valid

comparison with other algorithms, we simulate an

off-line training phase before testing. During train-

ing, XCSR explores the situation space. Afterwards,

the gained knowledge is exploited. The training is supervised; thus, each situation vector of the training set has to be labelled with one of two classes, congested or free flowing. The training examples are given in the form $\{(\vec{x}_i, y_i)\}$ such that $\vec{x}_i$ is the feature vector and $y_i$ its class label. Amongst other factors, the exploration

rate strongly depends on the execution rate of the GA

and the chosen action selection regime.

In certain intervals, defined by a parameter θ_GA, XCSR tries to explore the search space by creating new rules. This discovery process is executed by the GA. Two classifiers are chosen probabilistically from the latest action set based on their fitness. Two offspring are created by crossing the two parents with a two-point crossover of their conditions, and then mutating the condition and action with a certain probability. A classifier is mutated by adding a random offset to, or subtracting it from, its condition representation. The prediction is set to the mean of the parents' prediction values. The fitness and the prediction error are also set to the mean of the parents' values, multiplied by the reduction factors α and $p_{red}$, respectively. Finally, the two new offspring classifiers are added to the population.
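The rule discovery step described above can be illustrated with the following sketch. It assumes that a condition is stored as a flat list of values in [0, 1] and that classifiers are plain dictionaries; the helper names, the maximum mutation offset, and the way the action is mutated are hypothetical. The reduction factors use the values from Table 1.

```python
import random

ALPHA = 0.1   # fitness reduction factor, value from Table 1
P_RED = 0.25  # reduction factor for the prediction error, value from Table 1


def two_point_crossover(cond_a, cond_b, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(cond_a) + 1), 2))
    return (cond_a[:i] + cond_b[i:j] + cond_a[j:],
            cond_b[:i] + cond_a[i:j] + cond_b[j:])


def mutate(cond, rng, prob=0.05, max_offset=0.1):
    """Shift each value by a random offset with probability prob, clamped to [0, 1]."""
    return [min(max(v + rng.uniform(-max_offset, max_offset), 0.0), 1.0)
            if rng.random() < prob else v
            for v in cond]


def make_offspring(parent_a, parent_b, rng, n_actions=2, prob=0.05):
    """Create two children from two parents selected by the GA."""
    conds = two_point_crossover(parent_a["condition"], parent_b["condition"], rng)

    def mean(key):
        return (parent_a[key] + parent_b[key]) / 2.0

    children = []
    for cond, parent in zip(conds, (parent_a, parent_b)):
        action = parent["action"]
        if rng.random() < prob:  # assumed action mutation: redraw the action at random
            action = rng.randrange(n_actions)
        children.append({
            "condition": mutate(cond, rng, prob),
            "action": action,
            "prediction": mean("prediction"),
            "error": P_RED * mean("error"),
            "fitness": ALPHA * mean("fitness"),
        })
    return children


rng = random.Random(42)
a = {"condition": [0.1, 0.4, 0.2, 0.9], "action": 0,
     "prediction": 1000.0, "error": 20.0, "fitness": 0.8}
b = {"condition": [0.3, 0.6, 0.0, 0.5], "action": 1,
     "prediction": 0.0, "error": 40.0, "fitness": 0.4}
kids = make_offspring(a, b, rng)
```

Discounting the inherited fitness means that a new rule has to prove itself through reinforcement before it influences the system prediction strongly.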

Usually, the action selection method during the explo-

ration phase is random. For faster convergence, we

utilise a ﬁtness-proportionate action selection, also

known as roulette wheel selection. After the execu-

tion of the chosen action, its reward is returned, and a

reinforcement of the selected classiﬁers takes place.
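Roulette wheel selection can be sketched in a few lines. The function below is a generic illustration: which quantity serves as the weight of an action (for example its entry in the prediction array) is an assumption, as is the fallback to a uniform choice when all weights are zero.

```python
import random


def roulette_wheel(weights: dict, rng: random.Random):
    """Pick a key with probability proportional to its non-negative weight."""
    total = sum(weights.values())
    if total <= 0:
        # Assumed fallback: no information yet, so choose uniformly.
        return rng.choice(list(weights))
    threshold = rng.random() * total
    cumulative = 0.0
    for action, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return action
    return action  # guard against floating-point round-off


rng = random.Random(0)
picks = [roulette_wheel({0: 9.0, 1: 1.0}, rng) for _ in range(10000)]
print(picks.count(0), picks.count(1))
```

Compared with purely random exploration, this scheme visits promising actions more often while still giving weaker actions a chance to be tried.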

5.3 Testing

After completing the off-line training phase, XCSR

relies on its previously learned knowledge. For the

on-line application, we switch the action selection

regime to a deterministic best-action selection. The

best action is represented by the classiﬁer with the

highest ﬁtness-weighted score in [P]. The GA is no

longer executed in certain intervals. However, cov-

ering is still executed in case of unknown situations

or missing actions in [M]. We always want at least

one classiﬁer in [M] representing one of the two traf-

ﬁc classiﬁcations. Thereby, new classiﬁers can still be

created and added to the population.

5.4 Parameter Study

The commonly used settings for the learning parameters of XCS are given by (Butz and Wilson, 2002). Starting with these initial parameter settings, a small parameter study for the most important learning parameters was conducted: β ∈ {0.1, 0.2, 0.5}, θ_GA ∈ {1, 5, 15, 25, 50}, the centre spread factor s ∈ {0.1, 0.2, 0.5}, the mutation probability for the centre spread representation m ∈ {0.1, 0.2}, the mutation probability for the (un)ordered bound representation m ∈ {0.1, 0.2, 0.4}, the covering probability r ∈ {0.1, 0.2, 0.4}, the crossover probability ∈ {0.2, 0.3, 0.5}, and the mutation probability ∈ {0.04, 0.05, 0.06}. The best performance was achieved with the following settings (see Table 1): β = 0.2, θ_GA = 5, a crossover probability of 0.3, a mutation probability of 0.05, and the unordered bound representation with m = 0.2 and r = 0.2. An increased learning rate β allows the system to adjust classifiers faster but makes it more sensitive to temporary peaks. Decreasing the number of executions of the GA leads to a rising variance of the results.

Table 1: Initial parameter settings for the most important

learning parameters of XCSR.

Symbol   Description                             Value
N        Max. number of micro-classifiers        800
β        Learning rate for p, ε, and φ           0.2
φ_init   Initial classifier fitness              0.01
ε_init   Initial classifier prediction error     0.0
p_init   Initial classifier prediction value     10.0
δ        Classifier fitness deletion threshold   0.1
         Classifier accuracy threshold           10
θ_sub    Classifier subsumption threshold        20
θ_del    Class. experience deletion threshold    20
θ_GA     GA application interval                 5
         Crossover probability                   0.3
         Mutation probability                    0.05
α        Fitness reduction factor                0.1
p_red    Prediction reduction factor             0.25
         Centre spread factor                    0.2
         Mutation prob. for centre spread        0.1
         Mutation prob. for (un)ordered bound    0.2
         Covering prob. for (un)ordered bound    0.5

A visualisation of the state space helps to estimate

the complexity of the underlying problem. Figure 2

depicts the dependency between volume and speed

(Figure 2(a)), and speed and occupancy (Figure 2(b))

for a random day. The black line illustrates a possible linear separation between states that

are categorised as congested or not congested. The

separation between congested and free-ﬂowing trafﬁc

conditions is rather clear for most situations. Con-

gested conditions can be assumed in case the average

speed falls below a certain threshold while the occu-

pancy or the number of vehicles increases.


[Figure 2 scatter plots: (a) Volume and speed, plotting count (vehicles) against average speed (mph); (b) Speed and occupancy, plotting occupancy (%) against average speed (mph).]

Figure 2: Scatter plots showing the state space for different

trafﬁc variables measured at highway I35E, detector 2447

(red squares depict congested, blue circles free ﬂowing con-

ditions).

Further performance can be gained by adjusting

the number of trafﬁc parameters within the feature

vector. On the one hand, more features can describe

the underlying dynamics of trafﬁc more precisely. On

the other hand, a higher number of features expands

the search space drastically. This leads to longer ex-

ploration durations while needing more classiﬁers to

describe the respective feature space.

6 EVALUATION

6.1 Experimental Setup: Trafﬁc Data

The evaluation was done with ten real-world data sets provided by the Minnesota Department of Transportation (MDoT). The data was recorded by inductive

loop detectors in the vicinity of Minneapolis (Fig-

ure 3), averaged over ﬁve minute intervals, resulting

in 2016 data points per week. Each data point con-

tains the time of recording, average speed, volume,

occupancy, and density. The congestion labels were annotated by hand, with a sudden drop in speed and rise in occupancy indicating the presence of congestion.

[Figure 3 map; labelled roads: I35E, TH5, I94.]

Figure 3: Locations of the selected detector stations in Min-

neapolis, U.S.

http://dot.state.mn.us/tmc/trafficinfo/developers.html

The monitoring locations and dates were chosen randomly: Interstate I35E, June/July 2013 (detectors

2442, 2443, 2444, 2447, 2448); Interstate TH5, De-

cember 2015/January 2016 (detector 1577); and In-

terstate I94, May/June 2015 (detectors 569, 365, 366,

367). Trafﬁc on I35E and I94 shows some typical

seasonal behaviour. Congestion usually occurs dur-

ing rush hours in the morning and evening on work

days. In contrast, trafﬁc on TH5 exhibits stop-and-go

behaviour in the early hours during work days. MDoT

deﬁnes congestion as trafﬁc ﬂowing at speeds below

45 miles per hour. Our data sets exhibit congested

conditions between 2% and 29% of the time. This

class imbalance (between congested and free-ﬂowing

data points) is commonly found in data from real-

world environments. Each data set was split into three

weeks for training (6048 data points) and one week

for testing (2016 data points).
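The data set sizes follow directly from the five-minute aggregation interval, as the short sketch below shows. The function name and the use of a plain list in place of detector records are assumptions for illustration.

```python
# 12 five-minute intervals per hour, 24 hours, 7 days.
POINTS_PER_WEEK = 7 * 24 * (60 // 5)


def split_weeks(rows, train_weeks=3, test_weeks=1):
    """Chronological split: the first weeks for training, the following for testing."""
    n_train = train_weeks * POINTS_PER_WEEK
    n_test = test_weeks * POINTS_PER_WEEK
    return rows[:n_train], rows[n_train:n_train + n_test]


rows = list(range(4 * POINTS_PER_WEEK))  # stand-in for four weeks of detector records
train, test = split_weeks(rows)
print(POINTS_PER_WEEK, len(train), len(test))
```

Splitting chronologically rather than at random keeps the test week strictly after the training weeks, which mirrors how the system would be used on-line.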

[Figure 4 plot: speed (mph) over time (hour of day), with the annotated regions Faulty measurements and Congestion.]

Figure 4: Trafﬁc ﬂow proﬁle for arterial I35E, detector sta-

tion 2447, 2013-06-12.

Figure 4 shows a representative trafﬁc ﬂow proﬁle

on I35E, measured by detector 2447 on Wednesday,

2013-06-12. Usually, the monitored speed ﬂuctuates

around the recommended limit of 60 mph. The plot

depicts a typical weekday morning rush hour from

7.30 a.m. to 9.30 a.m. where the average speed is

reduced to 20 mph. The detector station recorded

faulty measurements from 7 p.m. to midnight, a typ-

ical problem of real-world data from inductive loop

detectors (Parkany and Xie, 2005). We did not re-

move faulty measurements to evaluate how XCSR

and SVM deal with this problem.

6.2 Support Vector Machines for

Congestion Detection

Support vector machines (SVM) (Ben-Hur and We-

ston, 2010) have proven to provide good generalisa-

tion and convergence for classiﬁcation tasks. Their

goal is to approximate the optimal hyperplane, ac-

curately separating the state space into two distinct

classes. Their performance is dependent on a coefﬁ-

cient C deﬁning the margin between classes and the

kernel hyper-parameter γ of the Gaussian kernel handling the non-linear classification. Large values for γ and C give a low bias and high variance because misclassification is penalised more strongly. In


contrast, small values result in higher bias and lower

variance.

The following SVM implementations from the

JKernelMachines framework (Picard et al., 2013)

were chosen as references for the evaluation: LaSVM,

LaSVM-I, and SDCA. LaSVM (Bordes et al., 2005)

is an efﬁcient SVM solver that uses on-line approx-

imation. It is able to handle noisy data sets using

less memory than other state-of-the-art SVM solvers.

LaSVM-I (Ertekin et al., 2011) is an optimisation

of LaSVM. It ﬁlters outliers based on approximat-

ing non-convex behaviour in convex optimisation.

LaSVM-I proves to be faster in terms of training

times, needing fewer support vectors, and offering

only slightly worse accuracy. Stochastic dual coor-

dinate ascent (SDCA) (Shalev-Shwartz and Zhang,

2013) is a method for solving large-scale supervised

learning problems formulated as minimisation of con-

vex loss functions. It executes iterative, random coor-

dinate updates.

We conducted a small parameter study for C, γ,

and the type of kernel. The Gaussian chi-squared kernel was chosen as it proved faster than the Gaussian kernel with L2 distance while providing similar accuracy. It is calculated as

$$k(x, y) = 1 - \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{\frac{1}{2}(x_i + y_i)} \tag{3}$$
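As an illustration, one common textbook form of the chi-squared kernel for non-negative feature vectors can be written as follows. This is a sketch, not the JKernelMachines implementation: the function name is hypothetical, the scaling of the denominator is an assumption, and terms in which both components are zero are skipped to avoid a division by zero.

```python
def chi_squared_kernel(x, y):
    """Chi-squared kernel for non-negative vectors of equal length."""
    total = 0.0
    for xi, yi in zip(x, y):
        denom = 0.5 * (xi + yi)
        if denom > 0:  # skip terms where both components are zero
            total += (xi - yi) ** 2 / denom
    return 1.0 - total


# Two normalised (speed, occupancy) vectors that differ only in speed.
print(chi_squared_kernel([0.2, 0.4], [0.4, 0.4]))
print(chi_squared_kernel([0.2, 0.4], [0.2, 0.4]))
```

Identical inputs give a value of 1, and the value decreases as the two vectors move apart.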

Test runs with C = {1,10,100, 500,1000} and γ =

{0.01,0.1, 0.5,1} indicated that C = 10 and γ = 0.01

yield good results without overﬁtting the model.

6.3 Experimental Results

All following experimental results are averaged over ten runs, executed on the previously introduced data sets. According to (Parkany and Xie, 2005), speed and occupancy are the parameters chosen for congestion detection about 80% of the time in traffic management centres in the U.S. Following this practice, we present the experimental results for the feature vector comprising these two parameters.

The choice between several approaches is often

dependent on multiple factors. One aspect is their

performance on historical data which we evaluate and

discuss in the following section. Other factors include

the runtime, the interpretability by humans, and the

convenience to conﬁgure the respective technique.

Runtime. Table 2 shows the mean runtime and

standard deviation averaged over ten execution runs.

The evaluation was done on an Intel i7 dual-core with

2.6 GHz and 8 GB RAM. The SVM variants are used

as is from the JKernelMachines framework (Picard

Table 2: Average runtime (and standard deviation) in sec-

onds for the training phase and the test phase.

Method     Training      Testing
LaSVM      14.2 (10.3)   0.8 (0.3)
LaSVM-I    9.5 (14.3)    1.7 (1.5)
SDCA       5.6 (0.5)     7.6 (0.2)
XCSR       2.6 (0.4)     0.5 (0.2)

et al., 2013). Considering execution times, XCSR

has a clear beneﬁt over SVM, having much lower run-

time, both for training and testing (each data set has

4032 data points). To speed up the training process

of the SVM variants, the number of training epochs

E can be reduced. The following speed-up can be

achieved by reducing E from ﬁve (default value) to

one: LaSVM (8.4 sec.), LaSVM-I (2.6 sec.), SDCA

(2.0 sec.). However, these results have to be inter-

preted with caution. Each data point of the data set

was given one-by-one to XCSR (on-line learning),

whereas the SVMs were given the training set as a

whole, speeding up the learning process drastically

as the model is computed only once. In fact, if the

SVMs are trained using on-line learning, adjusting

the internal model after every time step, the execution

times are signiﬁcantly longer, e.g. LaSVM-I needs

12.5 minutes and SDCA 5.5 minutes for a single train-

ing run (4032 data points, E = 5).

Configuration. XCSR mostly offers fairly good performance out of the box using its standard parameter settings. Still, a fair amount of parameter tuning is needed to find the optimal settings. In this

aspect, SVM is quite simple to conﬁgure since it only

requires hyper-parameters C and γ and the number of

epochs E to be set, as well as the kernel to be chosen.

Interpretability. Another aspect is the understandability of the model. SVMs are a very flexible method. Still, their interpretability is very low as the

support vectors are difﬁcult to analyse or to visualise

(James et al., 2013). XCS is designed to be inter-

pretable by humans, while still being ﬂexible. Clas-

siﬁers can be added and adapted during runtime, and

their respective values, action, and condition are eas-

ily understandable.

Number of Classes. We formulate congestion de-

tection as a binary classiﬁcation problem. Consider-

ing XCSR, increasing the number of classes (e.g. ten-

tative congestion or faulty detector data) is no prob-

lem as only the number of distinct classiﬁer actions

has to be adjusted. The complexity of XCSR’s imple-

mentation stays the same. In contrast, SVMs are usu-


ally two-class classiﬁers. For multi-class tasks, de-

composition methods such as one-against-all or one-

against-one are used. Solving a multi-class SVM in

one step results in a much larger optimisation prob-

lem (Hsu and Lin, 2002).

6.3.1 Measuring Classiﬁer Accuracy

Given a labelled data set, the following four basic

measures are usually used for statistical analysis of

classiﬁers:

• True positives (TP): Number of correct results

where roads are predicted to be congested.

• True negatives (TN): Number of correct results

where roads are classiﬁed as free ﬂowing and

there is actually no congestion.

• False positives (FP): Number of falsely predicted

congested roads, whereas trafﬁc ﬂows freely.

• False negatives (FN): Number of falsely predicted

free ﬂowing roads, whereas the road is actually

congested.

In other words, TP and TN describe the accuracy of

the classiﬁer (the predicted class label matches the ac-

tual classiﬁcation). FP and FN measure the error rate

of the evaluated classifier. Naturally, high detection rates and a minimal number of false alarms are desired. However, these two performance measures are not independent. The number of false alarms can easily be reduced by decreasing the sensitivity of the detection algorithm, but this results in poor detection rates. Conversely, increasing the detection rate DR (DR = TP/(TP + FN)) typically also increases the false alarm rate FAR (FAR = FP/(TN + FP)).

As shown in Table 3, XCSR has a low FAR of

0.26% and a high DR of 95.5%. The average num-

ber of FP and FN is rather low. The FAR and

DR of the SVM variants are as follows: LaSVM

(FAR = 0.21%, DR = 90.0%), LaSVM-I (FAR =

0.43%, DR = 74.3%), and SDCA (FAR = 1.4%,

DR = 91.5%).

The following metrics are described in terms of TP, TN, FN and FP. The accuracy A specifies the fraction of correct results as

$$A = \frac{TP + TN}{N} \tag{4}$$

where N is the total number of classified situations. A classifier that simply classifies all situations as free flowing achieves high accuracy since congested traffic is generally much less probable than free-flowing traffic. The precision P is calculated as

$$P = \frac{TP}{TP + FP} \tag{5}$$

An algorithm that predicts little or no congestion may achieve high precision since the number of FP is minimised. In general, high precision means that most situations predicted as congested are actually congested. The recall R measures the proportion of positives that are correctly identified by

$$R = \frac{TP}{TP + FN} \tag{6}$$

High recall can easily be achieved by classifying all situations as congested. The F-measure considers both recall R and precision P. It is calculated as

$$F = \frac{2PR}{P + R} \tag{7}$$

Finally, the specificity SP measures the proportion of negatives that are correctly identified by

$$SP = \frac{TN}{FP + TN} \tag{8}$$
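The measures above can be computed from the four basic counts as in the following sketch. The function names are hypothetical; the example counts are the averaged XCSR values from Table 3, so the resulting figures are computed from averaged counts and may differ slightly from averages of per-data-set ratios. Undefined ratios (for instance precision when no situation is predicted as congested) are returned as NaN, which is an assumed convention.

```python
import math


def ratio(numerator, denominator):
    """Division that yields NaN instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-measure, specificity and false alarm rate."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to the detection rate DR
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, fp + tn),
        "false_alarm_rate": ratio(fp, tn + fp),
    }


m = metrics(tp=248, fp=5, fn=12, tn=1751)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A classifier that never predicts congestion would obtain a high accuracy and specificity here but a recall of zero, which is why the measures are reported together.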

Figure 5 presents the results for these measures. The

box plots show the statistical distribution of the av-

erage classiﬁcation accuracy. The bottom and top

of the box represent the ﬁrst and third quartiles, and

the band inside the box represents the median. Out-

liers are indicated by separate points. In general, all

approaches had very high accuracy (an average of

97% and above), classifying most of the congested

situations as congested, and most of the not con-

gested situations as free ﬂowing. Figure 5(d) indicates

that LaSVM-I misclassiﬁed too many situations as

congested. In general, LaSVM-I performed slightly

worse compared to the other algorithms. Most of the outliers are caused by the TH5 data set, which has relatively few congested situations. All machine learning techniques seem to struggle to differentiate its feature space, due to the lack of instances belonging to the congested class. Although XCSR reaches its lowest values for recall (0.84), precision (0.86), and the F-measure (0.85) here, it still offers fair performance for the TH5 data set. LaSVM and

LaSVM-I were not able to learn the task for this data

set as they simply classify all situations falsely as not

congested. On average, XCSR has better results for

accuracy, recall, and F-measure than the SVMs, offer-

ing similar performance for precision and speciﬁcity.
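The effect of class imbalance on these measures can be illustrated with a short sketch. It assumes a hypothetical data set with very few congested situations (the counts below are invented for illustration and are not the TH5 figures) and a classifier that labels every situation as free flowing. The helper `safe_ratio` is likewise an illustrative assumption.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the ratio is undefined."""
    return numerator / denominator if denominator else None


# Hypothetical class counts for an imbalanced data set (not the TH5 figures).
congested, free_flowing = 20, 1996

# A classifier that labels every situation as free flowing.
tp, fp = 0, 0
fn, tn = congested, free_flowing

accuracy = safe_ratio(tp + tn, tp + fp + fn + tn)
recall = safe_ratio(tp, tp + fn)
precision = safe_ratio(tp, tp + fp)  # undefined: no positive predictions

print(f"accuracy={accuracy:.4f}, recall={recall}, precision={precision}")
```

Under these assumptions the accuracy stays above 0.99 although no congested situation is detected, which is why recall and the F-measure are reported alongside accuracy.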

6.3.2 Learning Behaviour of XCSR

In the following, we evaluate the learning behaviour

of XCSR. We measure the system error, the popula-

tion size, and the fraction of correct classiﬁcations in

every execution. Figure 6 shows how XCSR is able to

improve its performance over time. The vertical dotted line marks the end of the training phase after 6000 time steps.

Table 3: Confusion matrix reporting the average number (and standard deviation σ) of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) of XCSR for the ten test data sets.

                        Actual class
Prediction      Congested             Free flowing           Total
Congested       248 (σ = 170) (TP)    5 (σ = 5) (FP)         253
Free flowing    12 (σ = 8) (FN)       1751 (σ = 170) (TN)    1763
Total           260                   1756                   2016

[Figure 5 panels: (a) Accuracy, (b) Recall, (c) F-measure, (d) Precision, (e) Specificity; each panel shows box plots for XCSR, LaSVM, LaSVM-I, and SDCA.]

Figure 5: The box plots show the statistical distribution of the average classification accuracy for XCSR, LaSVM, LaSVM-I, and SDCA (left to right) considering occupancy and speed.

The points of each curve are the fraction

of the last 50 executions, averaged over ten runs. The

fraction correct is the percentage of correct classiﬁ-

cations in the last 50 executions. The system error

is calculated as the absolute difference between the

actual reward and the system prediction P of the selected action, divided by the maximal reward (1000).

The population curve shows the average number of

micro classifiers normalised to the range [0, 1]. The

population size was roughly between 3400 and 5400.

Accordingly, we chose 6000 as the maximum number

of micro classiﬁers within the population. The curve

shows how the number of classiﬁers continuously in-

creases during the training phase. XCSR applied cov-

ering between 15 and 35 times (average: 24.4). The

number of GA executions ranges from 325 to 777 (av-

erage: 494.5), which translates to one GA execution

every 12th step during the training phase. Due to the

explorative behaviour of XCSR during training, the fraction of correctly classified instances and the system error fluctuate more during this phase.
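The curve values described above can be sketched as follows. The snippet assumes that each curve point is a plain moving average over the last 50 executions; the class `SlidingMean`, the function `system_error`, and the three example executions are hypothetical and only serve to illustrate the calculation.

```python
from collections import deque

MAX_REWARD = 1000  # maximal reward stated in the text
WINDOW = 50        # number of most recent executions per curve point


def system_error(reward, prediction, max_reward=MAX_REWARD):
    """Absolute prediction error of the selected action over the max. reward."""
    return abs(reward - prediction) / max_reward


class SlidingMean:
    """Mean over the most recent `size` values."""

    def __init__(self, size=WINDOW):
        self.values = deque(maxlen=size)

    def add(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


# Hypothetical executions: (actual reward, system prediction, correct?).
executions = [(1000, 700, True), (0, 250, False), (1000, 950, True)]

error_curve, correct_curve = SlidingMean(), SlidingMean()
for reward, prediction, correct in executions:
    mean_error = error_curve.add(system_error(reward, prediction))
    fraction_correct = correct_curve.add(1.0 if correct else 0.0)

# Average GA interval implied by the reported numbers (6000 training steps,
# 494.5 GA executions on average).
steps_per_ga = 6000 / 494.5
```

The last line reproduces the reported rate of roughly one GA execution every 12th training step.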

7 CONCLUSION

We applied the extended classiﬁer system XCSR to

the task of detection of congestion patterns on inter-

states. The evaluation was done with real-world data

monitored by inductive loop detectors located in Min-

neapolis. In conclusion, it can be noted that XCSR

is able to evolve accurate classiﬁers, offering reliable

accuracy for the classiﬁcation of trafﬁc conditions.

Compared to three different types of support vector

machines, XCSR offers competitive performance.

[Figure 6 plot: fraction correct, population size / 6000, and system error / max. payoff (y-axis, 0 to 1) over the time step (x-axis, 0 to 8000), averaged over 10 runs.]

Figure 6: Average fraction correct, system error, and population size for XCSR (feature vector: occupancy and speed).

Furthermore, XCSR is signiﬁcantly faster in terms of

runtime for training and testing. In contrast to the

representation of support vectors, XCSR’s rule base of classifiers is easily interpretable by humans. We

plan to extend the binary classiﬁcation problem by in-

troducing additional classes, such as tentative conges-

tion or continuing congestion. Instead of classifying

the current situation, XCSR can be adapted to pre-

dict the upcoming trafﬁc conditions. Furthermore, we

want to investigate the performance of XCSR for ur-

ban congestion detection at intersections and sections,

following ideas from (Klejnowski, 2008).


ACKNOWLEDGEMENT

The authors would like to thank Roman Sraj for his

contribution in the scope of his bachelor’s thesis.

REFERENCES

Alpaydın, E. (2008). Maschinelles Lernen. Oldenbourg.

Ben-Hur, A. and Weston, J. (2010). Data Mining Tech-

niques for the Life Sciences, chapter A User’s Guide

to Support Vector Machines, pages 223–239. Humana

Press.

Bordes, A., Ertekin, S., Weston, J., and Bottou, L. (2005).

Fast kernel classiﬁers with online and active learning.

Journal of Machine Learning Research, 6:1579–1619.

Brumback, T. E. (2009). A Mathematical Model for

Freeway Incident Detection and Characterization: A

Fuzzy Approach. PhD thesis, University of Alabama.

Bull, L., editor (2004). Applications of Learning Classiﬁer

Systems. Springer.

Bull, L. and Kovacs, A. (2005). Foundations of Learn-

ing Classiﬁer Systems, volume 183, chapter Founda-

tions of Learning Classiﬁer Systems: An Introduction,

pages 1–17. Springer.

Bull, L., Sha’Aban, J., Tomlinson, A., Addison, J. D., and

Heydecker, B. (2004). Towards distributed adaptive

control for road trafﬁc junction signals using learning

classiﬁer systems. In Applications of Learning Clas-

siﬁer Systems, pages 276–299. Springer.

Butz, M. and Wilson, S. (2002). An algorithmic description

of XCS. Soft Computing - A Fusion of Foundations,

Methodologies and Applications, 6:144 – 153.

Deniz, O., Celikoglu, H. B., and Gurcanli, G. E. (2012). Overview to some incident detection algorithms: A comparative evaluation with Istanbul freeway data. Proc. of 12th Int. Conf. Reliability and Statistics in Transportation and Communication, pages 274–284.

Diamantopoulos, T., Kehagias, D., König, F. G., and Tzovaras, D. (2014). Use of density-based cluster analysis and classification techniques for traffic congestion prediction and visualisation. In Transp. Research Arena Proceedings.

Ertekin, S., Bottou, L., and Giles, C. (2011). Nonconvex on-

line support vector machines. IEEE Trans. on Pattern

Analysis and Machine Intelligence, 33(2):368–381.

Holland, J. H. (1986). A mathematical framework for

studying learning in classiﬁer systems. Phys. D, 2(1-

3):307–317.

Hsu, C.-W. and Lin, C.-J. (2002). A comparison of methods

for multiclass support vector machines. Trans. Neural

Networks, 13(2):415–425.

James, G., Witten, D., Hastie, T. J., and Tibshirani, R. J.

(2013). An Introduction to Statistical Learning.

Springer.

Klejnowski, L. (2008). Design and implementation of an

algorithm for the detection of disturbances in trafﬁc

networks. Master’s thesis, University of Hannover,

Institute for Systems Engineering.

Lenz, B., Nobis, C., Köhler, K., Mehlin, M., Follmer, R., Gruschwitz, D., Jesske, B., and Quandt, S. (2010). Mobilität in Deutschland 2008. DLR-Forschungsbericht.

Liu, Q., Lu, J., Chen, S., and Zhao, K. (2014). Multiple naïve Bayes classifiers ensemble for traffic incident detection. Mathematical Problems in Engineering, 2014.

Mahmassani, H. S., Haas, C., Zhou, S., and Peterman, J.

(1999). Evaluation of incident detection methodolo-

gies. Technical report, Center for Transportation Re-

search, University of Texas.

Ozbay, K. and Kachroo, P. (1999). Incident Management in

Intelligent Transportation Systems. Artech House ITS

library. Artech House.

Parkany, E. and Xie, P. C. (2005). A complete review of in-

cident detection algorithms & their deployment: What

works and what doesn’t. Technical report, Fall River,

MA: New England Transp. Consortium.

Picard, D., Thome, N., and Cord, M. (2013). Jkernelma-

chines: a simple framework for kernel machine. Jour-

nal of Machine Learning Research, 14(1):1417–1421.

Prothmann, H., Rochner, F., Tomforde, S., Branke, J., Müller-Schloer, C., and Schmeck, H. (2008). Organic control of traffic lights. In Proc. of the 5th Int. Conf. on Autonomic and Trusted Computing, volume 5060 of LNCS, pages 219–233. Springer.

Schrank, D., Eisele, B., and Lomax, T. (2012). 2012 Ur-

ban mobility report. Texas A&M Transp. Institute.

http://mobility.tamu.edu/ums/.

Shalev-Shwartz, S. and Zhang, T. (2013). Stochastic dual

coordinate ascent methods for regularized loss. Jour-

nal of Machine Learning Research, 14(1):567–599.

Singliar, T. and Hauskrecht, M. (2006). Towards a learning incident detection system. In Proc. of the Workshop on Machine Learning for Surveillance and Event Detection at the 23rd Int. Conf. on Machine Learning.

Srinivasan, D., Jin, X., and Cheu, R. L. (2004). Evalua-

tion of adaptive neural network models for freeway

incident detection. In IEEE Transaction on Intelligent

Transp. Systems, volume 5, pages 1–11.

Wilson, S. (2000). Get real! XCS with continuous-valued inputs. In Learning Classifier Systems, volume 1813 of LNCS, pages 209–219. Springer.

Yang, X., Sun, Z., and Sun, Y. (2004). A freeway traf-

ﬁc incident detection algorithm based on neural net-

works. In Yin, F.-L., Wang, J., and Guo, C., editors,

Advances in Neural Networks - ISNN 2004, volume

3174 of LNCS, pages 912–919. Springer.

Zhang, K. and Xue, G. (2010). A real-time urban trafﬁc de-

tection algorithm based on spatio-temporal OD matrix

in vehicular sensor network. Wireless Sensor Network,

2(9):668–674.

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems
