Short-Term Traffic Prediction under Both Typical and Atypical

Traffic Conditions using a Pattern Transition Model

Traianos-Ioannis Theodorou

1,2

, Athanasios Salamanis

1

, Dionysios D. Kehagias

1

, Dimitrios Tzovaras

1

and Christos Tjortjis

2

1

Information Technologies Institute, Centre for Research & Technology Hellas, GR 57001 Thermi-Thessaloniki, Greece

2

International Hellenic University, GR 57001 Thermi-Thessaloniki, Greece

Keywords: Short-Term Traffic Prediction, Atypical Conditions, Automatic Incident Detection, Support Vector Machines,

K-Nearest Neighbour, Autoregressive Integrated Moving Average.

Abstract: One of the most challenging goals of the modern Intelligent Transportation Systems comprises the accurate

and real-time short-term traffic prediction. The achievement of this goal becomes even more critical when the

presence of atypical traffic conditions is concerned. In this paper, we propose a novel hybrid technique for

short-term traffic prediction under both typical and atypical conditions. An Automatic Incident Detection

(AID) algorithm, based on Support Vector Machines (SVM), is utilized to check for the presence of an

atypical event (e.g. traffic accident). If such an event occurs, the k-Nearest Neighbors (k-NN) non-parametric

regression model is used for traffic prediction. Otherwise, the Autoregressive Integrated Moving Average

(ARIMA) parametric model is activated for the same purpose. In order to evaluate the performance of the

proposed model, we use open real world traffic data from Caltrans Performance Measurement System

(PeMS). We compare the proposed model with the unitary k-NN and ARIMA models, which represent the

most commonly used non-parametric and parametric traffic prediction models. Preliminary results show that

the proposed model achieves larger accuracy under both typical and atypical traffic conditions.

1 INTRODUCTION

Nowadays, the interest in developing Intelligent

Transportation Systems has grown significantly with

respect to the need for providing qualitative

transportation services, either for individuals or fleets

of vehicles. In this context, the ability to accurately

predict traffic in various steps ahead in time is of

paramount importance.

The main reason, for which the traditional traffic

prediction models fail to accurately predict traffic in

real conditions is the presence of atypical conditions.

These may include severe weather conditions, car

accidents, road maintenance works and traffic

congestion, due to special cultural events (e.g.

concerts or sport games). These atypical conditions

often result in steep spikes in the traffic time series

that the standard traffic prediction models fail to

accurately represent, as these models are based on big

traffic data with insignificant abnormalities.

Also, atypical events are difficult to classify

because they vary in type, duration, severity, effect on

the state of the traffic network, etc. On the other hand,

incidents may occur that do not cause observable

effects on traffic. Similarly, the occurrence of spikes

in a traffic time series does not necessarily correspond

to atypical conditions. These cases render the

problem of traffic prediction in atypical conditions as

a non-trivial one.

In this paper, we present a novel pattern transition

model for short-term traffic prediction for typical, as

well as (and more importantly) atypical conditions.

We use an SVM-based automatic incident detection

model to automatically detect the presence of an

atypical situation. When this case occurs, the non-

parametric k-NN regression model is fetched to

calculate the predicted traffic value. Otherwise, the

ARIMA parametric model is activated.

In summary, our main contributions can be outlined

as follows:

1. We propose a novel pattern transition model

for short-term traffic prediction under both

typical and atypical conditions. Our model

automatically recognizes the presence of an

atypical situation and activates the most

Theodorou, T-I., Salamanis, A., Kehagias, D., Tzovaras, D. and Tjortjis, C.

Short-Term Trafﬁc Prediction under Both Typical and Atypical Trafﬁc Conditions using a Pattern Transition Model.

DOI: 10.5220/0006293400790089

In Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2017), pages 79-89

ISBN: 978-989-758-242-4

Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

79

appropriate prediction model, based on the

outcome of an incident detection algorithm.

2. The proposed model incorporates the

incident information for more accurate

prediction.

3. We evaluate the functionality and

performance of our model against real data

that includes both traffic and incident

information.

The rest of the paper is organized as follows.

Section 2 summarizes related work. Section 3

describes the data used for training our prediction

model and for its evaluation, whereas Section 4

provides a detailed description of the implemented

model. Section 5 presents the evaluation framework,

including the process of setting up the various

experiments, the selection of the various datasets, the

metrics used for the evaluation of both the incident

detection and traffic prediction models and also the

experimental results. Finally, Section 6 concludes the

paper, reviewing the main contributions and

suggesting future directions.

2 RELATED WORK

The research problem of short-term traffic prediction

has been extensively studied in the last ten years. The

various relevant techniques can be roughly classified

into the following four major categories: naïve,

parametric, non-parametric and hybrid.

The naïve methods are the most cost-effective

prediction models and are mainly used as benchmark

against more sophisticated methods. They are

characterized by the absence of any advanced

mathematical model. Some of the most common

naïve methods for traffic prediction include the use of

the last observed value, the simple moving average

with a predefined time window T and the cumulative

moving average of all past traffic values.

Parametric models are the ones, which involve the

estimation of predefined parameters using historical

traffic data. These methods mainly originate from

time series analysis. Most of the works in this class

are based on the classic Box & Jenkins

Autoregressive Integrated Moving Average model

(Box and Jenkins, 1971). In their work, Stathopoulos

and Karlaftis presented a multivariate state-space

ARIMA approach for modelling and predicting

traffic flow, showing that different model

specifications are more appropriate for different

periods of the day (Stathopoulos and Karlaftis, 2003).

Moreover, Kamarianakis and Prastacos developed a

Space-Time ARIMA model with robust behavior

(Kamarianakis and Prastacos, 2005) which was

extended by Min and Wynter in an effort to deal with

the supposed stationarity of the process and the

constant relationship between the neighbor road

segments in a traffic network (Min and Wynter,

2011). More recently, an Auto-Regressive Moving

Average with an eXogenous input (ARMAX) model

with an optimal multiple-step-ahead predictor of

traffic demand was proposed by Wu et al. (Wu at al,

2014). In the same class, Mu et al. proposed a method

that utilizes heterogeneous delay embedding (HDE)

to extract an informative feature space for regression

analysis of traffic data (Mu et al., 2012). Additional

similar approaches include the works of (Guo and

Williams, 2010, Kamarianakis et al., 2012, Ghosh et

al., 2009).

The non-parametric models are mainly originated

from the machine learning field and are based on k-

NN regression, Artificial Neural Networks (ANN)

and Support Vector Regression (SVR) techniques.

The k-NN in short-term traffic prediction was

introduced by Smith and Demetsky who claimed that

it performs better than both the historical average and

parametric ARIMA model in terms of robustness

against variable data sets (Smith and Demetsky,

1996). The k-NN non-parametric regression

algorithm was utilized by several other researchers

for building accurate traffic prediction models (De

Fabritiis et al., 2008, Kindzerske and Ni, 2007,

Myung et al., 2012, Zheng and Su, 2014). Regarding

the use of ANNs, Vlahogianni et al. introduced the

auto- and cross-correlated effect of the traffic flow

time series in a neural network model in the form of

external information (Vlahogianni et al., 2003).

Finally, Wu et al. (Wu et al., 2003) and Hu et al. (Hu

et al., 2015) used the SVR algorithm for increasing

the accuracy of prediction.

Noticeable research effort has been given on the

development of hybrid traffic prediction techniques

that try to exploit the strong characteristics of both

parametric and non-parametric approaches. These

include e.g. a model that combines ARIMA and ANN

processes (Zhang, 2003), but also the combination of

Non-linear Autoregressive Moving Average with

exogenous inputs (NARMAX) that involves fuzzy

systems with ANN (Gao and Er, 2005). Similarly,

Quek et al. presented a special case of a fuzzy neural

network for short-term traffic prediction that shows

high adaptation to the input and high prediction

capacity (Quek et al., 2006).

Despite the multitude of proposed models for

short-term traffic prediction, very few of them deal

with the problem of traffic prediction under atypical

traffic conditions, such as rapid weather changes,

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems

80

traffic incidents, road maintenance works, and special

events (e.g. concerts or sport events) etc. These

abnormalities lead to traffic conditions that the

traditional traffic predictions models are difficult to

capture. To this end, relevant research efforts are

quite limited. Amongst those, Castro-Neto et al.

proposed the Online Support Vector Regression (OL-

SVR) model for short-term traffic prediction under

both typical and atypical conditions (Castro-Neto et

al., 2009). They compared their model with well-

known models including Gaussian Maximum

Likelihood (GML), Holt exponential smoothing and

ANN and have proved that even if the GML model

shows the best performance in terms of prediction

accuracy under typical traffic conditions, the OL-

SVR model performs even better under non-recurring

atypical traffic conditions. Another example is the use

of three different prediction models, each with a

different configuration of the explanatory traffic

variable (Guo et al., 2010).

In this approach, it is shown empirically that k-

NN in conjunction with the third configuration of the

explanatory variable outperforms the ANN under all

conditions. Also in an extension of the previous work,

it is proven that the k-NN and SVR non-parametric

regression models have similar prediction accuracy

under typical traffic conditions but k-NN outperforms

SVR during atypical ones (Guo et al., 2012). By

enhancing the previous k-NN model with data

smoothing and de-noising components an even better

accuracy can be achieved (Guo et al., 2014). Hybrid

approaches have been also developed, such as the

Online Boosting Non-Parametric Regression

(OBNR), consisting of two parts: (a) a typical non-

parametric regression model for typical conditions,

and (b) a boosting part activated when atypical traffic

conditions occur and deactivated when the traffic

state turns back to normal. Real data experiments

prove that the OBNR model performs better than the

classic non-parametric regression and SVR models

during atypical traffic conditions (Wu et al., 2012).

Finally, an alternative approach was proposed by

Ni et al. which, in addition to traffic, it also uses data

from social networks (Twitter) in order to predict

traffic, prior to major sport game events. By fusing

both tweet rate and semantic features into the typical

prediction model, improved prediction accuracy can

be achieved (Ni et al., 2014).

A closer look on the current literature, does reveal

that in none of the aforementioned models traffic data

with atypical incidents is used for training. On the

contrary, training is based solely on data from typical

1

Available at: http://pems.dot.ca.gov/

conditions, whereas data from both typical and

atypical conditions is used for testing. Hence, the key

characteristic that distinguishes our work from the

current literature, is that in our model we incorporate

traffic data with atypical condition into the training

process of our proposed model. This is expected to

produce more accurate traffic prediction models.

3 DESCRIPTION OF DATA

The Caltrans Performance Measurement System

(PeMS

1

), was used for building and evaluating our

model. PeMS is an Archived Data User Service that

collects over ten years of data for historical analysis.

The traffic data is coming from over 39,000 Vehicle

Detection Stations (VDS) scattered on the freeway

system of all major metropolitan areas of the State of

California, USA. They include flow, occupancy and

speed values, as well as meta-information about the

VDS, e.g. the identification numbers of the district

and the freeway, in which the VDS is located, the

coordinates of the VDS, etc. Traffic data is sampled

every 30 seconds and aggregated into 5-minute and

1-hour time intervals. The user can select to acquire

the data either in raw or aggregated format.

Figure 1: PeMS Caltrans map.

PeMS also provides incident data collected by the

California Highway Patrol (CHP). This dataset

contains information about the incidents occurred on

the Caltrans network, such as location of the incident

(latitude, longitude), timestamp, type (e.g. car

accident, road maintenance works etc.), duration (in

minutes), etc. The incidents are reported by network

users to CHP, which maintains logs. The map of the

overall area that provides traffic and incident data in

PeMS is shown in Figure 1.

For the purpose of our research we have used a

small part of the above dataset for training and

evaluating our model. In particular, our dataset

Short-Term Trafﬁc Prediction under Both Typical and Atypical Trafﬁc Conditions using a Pattern Transition Model

81

includes speed probes recorded in the areas of San

Jose, Oakland, California (district 4 in Figure 1) and

covers a total time period of 123 days, from May 1 to

August 31, 2015. We acquired data in their

aggregated form in 5-minute intervals. Also, the

dataset contains only incident data in the same area

and time period.

4 PATTERN TRANSITION

ALGORITHMS

In this section, we present the pattern transition model

we have developed for traffic prediction under both

typical and atypical conditions. We use a SVM-based

AID model to detect the occurrence of atypical

conditions. On detection of an atypical situation by

the AID, the k-NN non-parametric regression model

is activated. Otherwise, the ARIMA parametric

model is used. The flow chart of the Figure 2 shows

the whole process.

Figure 2:Flow Chart of the proposed method.

In the following subsections, we present all

modules that comprise the proposed model.

4.1 Automatic Incident Detection

In order to create our AID model, we used a

supervised machine learning algorithm. Specifically,

2

https://en.wikipedia.org/wiki/Support_vector_machine

we chose the Support Vector Machines algorithm,

which is fairly robust to irrelevant features (Gakis et

al. 2014). The basic idea of SVM is to generate a

hyperplane that divides the data set into classes. Our

problem is a binary classification one, thus we have

two classes, which represent the presence or absence

of an incident at a specific time interval and road of

the traffic network.

In the linear SVM, we are given a training data set

with n points of the form (x

1

, y

1

), …, (x

n

, y

n

) where y

i

indicates the class and takes either 1 or -1 as a value,

and x

i

is a p-dimensional real vector, called feature

vector. In our case the number of features is five,

hence x

i

is a 5-dimensional vector. The objective is to

find the maximum-margin hyperplane that divides the

group of points x

i

for which y

i

= 1, from the group of

such points that y

i

= -1, so that the distance between

the hyperplane and the nearest point x

i

from either

group is maximized. Any hyperplane can be written

as the set of all vectors x that satisfy:

0wx b⋅− =

(1)

where w is the normal vector to the hyperplane and

b/|w| a parameter that defines the offset of the

hyperplane from the origin, along the normal vector

w as shown in Figure 2.

Figure 3: Maximum-margin hyperplane and margins on

linear SVM kernel

2

.

In the feature extraction process, we tested both

speed and occupancy values in order to select the best

features. Initially we used only speed values to create

the features, and then we added features derived from

occupancy values. When the occupancy values

were

included to the feature extraction process, the

accuracy of the AID model was reduced and as a

consequence the traffic prediction accuracy. Finally,

different types of data (e.g. weather data) could be

used in the feature extraction process, but this remains

to be examined as future work.

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems

82

For this reason, we used only speed in order to

detect the incidents occurred in a highway. Therefore,

we extracted two features based on the speed of the

road of interest and its adjacent roads. In addition to

the current time interval, the speed values of previous

intervals are also taken into account.

The first extracted feature F

1

is taken as the

difference between the speed of the road of interest

and the average speed of its adjacent roads, in the

direction that the vehicles travel, at current time. This

value was normalized by the speed of the road of

interest at the same time.

,(),

1

1

,

1

k

roi t ar j t

j

roi t

SS

k

F

S

=

−

=

(2)

In (2), S represents speed, whereas index ar refers to

the adjacent road, roi refers to the road of interest, k

is the total number of adjacent roads and t is the

current interval.

We also extracted the following three features

(based on F

1

) for three time intervals prior to the

current one:

,1 (),1

1

2

,1

1

k

roi t ar j t

j

roi t

SS

k

F

S

−−

=

−

−

=

(3)

,2 (),2

1

3

,2

1

k

roi t ar j t

j

roi t

SS

k

F

S

−−

=

−

−

=

(4)

,3 (),3

1

4

,3

1

k

roi t ar j t

j

roi t

SS

k

F

S

−−

=

−

−

=

(5)

The selection of the optimal number of previous

intervals, was made after experimentation with

various numbers.

The selection of the above features is based on the

observation that when an incident occurs on a road,

the average speed of this road and its neighbouring

ones, in the same direction, decreases. However,

taking into account only the values of these features,

results in a biased model, prone to error, as it becomes

capable of detecting low speeds, and especially much

lower than the speed of the adjusted roads. Therefore,

the selection of one more feature was necessary. To

this end, we used as an extra feature the average

absolute deviation of the real speed of the road of

interest at current time with respect to its average

value of all previous intervals up to the current one

(including this).

,

0

5

,

0

|()|

,

1

()

1

p

roi t j roi

j

p

roi t k

k

roi

SmS

F

p

S

mS

p

−

=

−

=

−

=

+

=

+

(6)

where S is the speed of the road of interest, p is the

number of past intervals, over which we calculate the

average value m(S). As a fifth feature, we also tested

the squared deviation from the mean of the speed.

This resulted in reduced classifier’s accuracy. These

are the five feature that comprise the vector space

model for each road of interest. Based on the feature

vectors produced in this way, a different SVM-based

AID model is built for each road of interest.

Finally, we experimented with various values for

the C parameter of the SVM algorithm, using one-out

cross validation, in order to estimate those that fit

better to our case. Using a grid search on C = 2

-5

, 2

-3

,

…, 2

15

with step 2, we concluded that the most

appropriate value is C = 1.1.

4.2 Traffic Prediction

For the task of traffic prediction, we used two models:

(a) the ARIMA parametric model and (b) the k-NN

model, in order to predict traffic under typical and

atypical conditions, respectively. Based on the

relevant literature regarding traffic prediction under

atypical conditions (Section 2), the time series models

fail to capture the abnormalities on the values of the

examined traffic variable, that are generated during a

traffic incident. On the other hand, the non-parametric

models and specifically the non-parametric

regression (e.g. k-NN regression) can follow these

abnormalities especially when these models have

been fitted using data from similar past abnormal

conditions.

4.2.1 Autoregressive Integrated Moving

Average

The Auto Regressive Integrated Moving Average

(ARIMA) family of models is the most widely

deployed approach for vehicular traffic prediction

and for time series prediction in general. ARIMA is a

generalisation of the Auto-Regressive Moving

Average (ARMA) model, which is applied strictly to

stationary time series.

An ARIMA (p, d, q) process is expressed as:

Short-Term Trafﬁc Prediction under Both Typical and Atypical Trafﬁc Conditions using a Pattern Transition Model

83

()

11

11 1

pq

d

ii

itit

ii

LLX L

ϕ

θε

==

−⋅−⋅=+⋅

(7)

where p is the order of the autoregressive model, d is

the degree of differencing and q is the order of the

moving average model. In our case, we used an

ARIMA (3, 1, 0) model with three previous terms and

1

st

degree of differencing for reaching stationarity.

The resulted model is show in equation:

'' ' '

,1,12,23,3roi t roi t roi t roi t

SS S S

ϕϕ ϕ

−− −

=⋅ +⋅ +⋅

(8)

where

'

,,,1roit roit roit

SSS

−

=−

(9)

is the differenced S

’

process, which is wide-sense

stationary. According to Pfeifer and Deutsch, the best

estimate of parameters φ are the maximum likelihood

estimates (Pfeifer and Deutsch, 1980). As without a

priori knowledge of their initial values, these

estimates cannot be exactly computed, a close

approximation via ordinary least squares (OLS) is

used. In particular, for every training sample an

equation of the form of (9), is constructed where φ are

the unknown parameters. This forms a linear

overdetermined system of equations of the form:

yX

β

=⋅

(10)

The system given by the aforementioned equation can

be re-written by the use of normal equations, as:

()

^

TT

X

XXy

β

⋅= ⋅

(11)

Using the OLS method we take the following

solution.

()

^

1

TT

X

XXy

β

−

=⋅⋅⋅

(12)

When the model is built (the φ parameters have been

estimated) we use the following equation for

calculating the predicted value:

''''

12132th t t t

SSSS

ϕϕ ϕ

+−−

=⋅+⋅ +⋅

(13)

where, h is the prediction horizon.

4.2.2 k-Nearest Neighbors

For the prediction of the speed values under atypical

conditions we have chosen the k-NN regression

model which appears to be a suitable algorithm for

atypical traffic prediction, using an atypical historical

dataset. k-NN is a non-parametric algorithm that

stores all available cases and predicts the numerical

target based on a similarity measure and an averaging

scheme. The k-NN algorithm has been used in

statistical estimation and pattern recognition tasks,

already since the beginning of 1970’s as a non-

parametric technique.

k-NN prediction is based on the current state

vector (at current time interval t), of the form:

,,,1,2,

,, ,,

roi t roi t roi t roi t roi t p

ySSS S

−− −

=

(14)

where S is the traffic variable (in our case speed) and

p the number of past intervals. As shown, the current

state vector of a road of interest depends on the values

of speed at the current and previous p time intervals.

In order to make prediction, the k-NN algorithm

creates vectors of the form (14), y

1,t

, y

2,t

, … , y

N,t

for N

other roads of the network. When a prediction for the

road of interest for h intervals ahead in time is

requested, the algorithm compares y

roi,t

, with y

1,t

, y

2,t

,

… , y

N,t

using a distance metric (usually Euclidean

distance) and keeps the k vectors with the shortest

distances.

Then, it calculates the value S

roi,t+h

using an averaging

scheme on the estimated k neighbors, which in the

simplest form is given by (15).

,

1

,

k

it h

i

roi t h

S

S

k

+

=

+

=

(15)

In our implementation, we used the inverse

distance weighted average as the averaging scheme,

as shown in (16).

,

1

,

k

iith

i

roi t h

wS

S

k

+

=

+

=

,

(16)

where:

()

2

,

,,

0

11 1

i

p

iroii

roi t j i t j

j

w

dd

SS

−−

=

== =

−

(17)

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems

84

We chose the optimal value for k via one-out cross

validation in our data set, based on the prediction

accuracy results. By this process, we concluded that

the most optimal value of k in our case was 6.

5 EVALUATION

In this section, we present the set-up of the evaluation

framework, including the construction of the traffic

time series and their enrichment with incident

information, the choice of a specific part of the

Caltrans road network as case study and the

separation of the training and test data. Finally, the

preliminary evaluation results are presented.

5.1 Constructing Traffic Time Series

with Incident Information

In order to build and evaluate our model the first step

was to pre-process the initial data (both traffic and

incidents) in order to create traffic time series that will

include incident information. For this reason, we

discretized time into 5-minute intervals and we

aggregated the speed values that belong to each

interval. We used this formulation in order to both fit

our model and to make predictions for a number of

steps ahead in time. In the case of short-term traffic

prediction, the predictions are made for up to 1 hour

ahead in time, i.e. 12 5-minutes intervals.

In the examined area, there are 350 VDS in total,

from which 112 were not taken into account because

they provided only zero values. The traffic data from

the remaining 238 VDS were matched to road

segments of the Caltrans network (based on their

coordinates). This process resulted into 55 road

segments having traffic data. As we described above,

the features of our AID model take into account not

only the speed of the road of interest, but also the

speed of its adjacent roads. For this reason, we kept

only the road segments for which, their spatial

neighbors traffic data exist.

Concerning incident data, there were 4,193

incidents in the area and time period concerned.

These incidents where matched to the aforementioned

55 road segments, for which traffic data is available.

For each day of the total examined period and each of

these 55 road segments, a speed time series was

constructed from speed values occurred in the

specific 5-minute interval of this day and road

segment. In this way 6,765 (55 road segments times

123 days of traffic data for each road segment) speed

were constructed. These time series, in addition to

traffic information, include typical and atypical

intervals, indicated by 0 and 1, respectively. The

value 1 indicates presence of an incident in the

specific time interval and 0 its absence. For instance,

for the road segment with identification number 76 on

May 21, 2015 on time interval 01:10-01:15 the

corresponding value of the speed time series is

‘66.55;0’. This means that the speed was 66.55 mph

and no incident situation was present.

5.2 Case Study: A Part of US101

Highway

One of the main difficulties when trying to predict

traffic under atypical conditions, is that the effect of

abnormalities on traffic time series is not easily

observable and interpretable. For instance, there may

be an incident with specific characteristics (type,

duration, severity, etc.) that caused a steep fall on the

traffic time series of a road network, and another

incident with exactly the same characteristics that

happened on the same road at a different time of the

day and had no effect on the traffic time series. On the

other hand, there may be observable discrepancies

from the typical pattern of the traffic time series that

do not necessarily correspond to the presence of an

incident. These situations may confuse the AID

model. In order to overcome these difficulties, we had

to choose road segments with observable effects on

their traffic time series due to occurring incidents.

5.3 Training Versus Testing Data

The traffic time series of the aforementioned road

segment consist the main data set. From this, the one

that corresponds to 25 August, 2015 was selected as

a test time series, which has both typical and atypical

intervals. This time series was selected because it has

spikes that corresponds to the occurrence of incidents.

The preceding 116 time series formed the training

data set. From this, we created three separated data

sets in order to fit our model and the benchmarking

methods in different traffic conditions. The first data

set includes only the ones without atypical intervals

(incident-free), whereas the second data set includes

those with both typical and atypical (incident).

Finally, the third one contains all time series (total).

We trained the AID and the k-NN models using

the total training data set, whereas for the ARIMA

model the incident-free data set was used. Hence, we

incorporate incident information to the fitting process

of our model, as opposed to the current related work.

Short-Term Trafﬁc Prediction under Both Typical and Atypical Trafﬁc Conditions using a Pattern Transition Model

85

5.4 Benchmarks and Accuracy Metrics

For the evaluation of the AID model we calculated a

number of metrics using one-out cross validation in

the total training data set. The first metric that we

calculated was the accuracy:

TP TN

Accuracy

TP FP TN FN

+

=

+++

(18)

where TP is the true positive, TN the true negative,

FP the false positive and FN the false negative

predicted classes. However, accuracy is not really a

reliable metric for the real performance of a classifier

when the number of samples in different classes vary

greatly (unbalanced target) because it will yield

misleading results. In our case, from the total number

of 288 intervals in the traffic time series, only in 30

or less intervals an incident was occurred. For this

reason, in order to evaluate our model accurately, we

calculated two additional metrics. The first one is

sensitivity, a measurement of the proportion of

positives that are correctly identified, whose formula

is shown below:

TP

Sensitivity

TP FN

=

+

(19)

The second additional metric is specificity, which

measures the proportion of negatives that are

correctly identified. Its formula is given by the

following equation:

TN

Specificity

TN FP

=

+

(20)

Using the sensitivity and specificity we created the

Receiver Operating Characteristic (ROC) curve,

which illustrates the performance of our classifier.

For benchmarking we used the unitary ARIMA

and k-NN models. These models were initially fitted

using only the incident-free training data set, as

happens in most of the works on traffic prediction

under atypical conditions in current literature, and

then using different combinations of all three datasets

(incident-free, incident and total). We assessed the

resulted accuracies by the means of two metrics: (a)

the Root Mean Square Error (RMSE) and (b) the

Symmetric Mean Absolute Percentage Error

(SMAPE).

RMSE is given by the following formula:

2

1

()

n

tt

t

PA

RMSE

n

=

−

=

,

(21)

where n is the number of predictions, A

t

the actual

values and P

t

the predicted values.

SMAPE gives a percentage error that has both a

lower and an upper bound of 0% and 100%,

respectively. This makes its values more easily

interpretable. The formula of SMAPE is the

following:

1

||

100%

||||

n

tt

t

tt

PA

SMAPE

nAP

=

−

=⋅

+

,

(22)

where n is the number of predictions, A

t

the actual

values and P

t

the predicted values.

5.5 Experimental results

The evaluation results of the AID model are shown in

Table 1. Additionally, the ROC curve of the classifier

is shown in Figure 1.

Table 1: The evaluation results of the AID model.

AID evaluation metrics

Accurac

y

0.8986

Sensitivit

y

0.6364

Specificit

y

0.9091

Figure 4: ROC curve of the proposed AID schema.

We can see that although the proposed model is

quite above the line of no-discrimination (the

diagonal line), it is also quite far from the upper left

corner of the ROC space (best possible classification

prediction). This mainly happens due to the

imbalance of the records of the classification classes

(incident, non-incident) in the examined data set. In

any case, the curve shows that there is enough room

for improvement for the proposed AID model.

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems

86

The results of the experiments regarding the

prediction accuracy of our model are shown in Figure

5 and Figure 6.

Figure 5: Prediction accuracy results in RMSE.

Figure 6: Prediction accuracy results in SMAPE.

As shown in the aforementioned figures, in total,

the proposed model outperforms its competitors. In

particular, our model presents almost similar

prediction accuracy with the ARIMA model under

typical conditions, but it exhibits the best

performance under atypical conditions.

As already mentioned, the benchmarking models

were initially trained by incident-free data.

Subsequently, we conducted a series of experiments,

in which the two unitary benchmarking models were

trained using different combinations of the incident-

free, incident and total data sets. In this way, we

incorporated the incident information not only in the

data fitting process of the proposed model, but also in

the fitting process of its competitors. As shown in

Figure 7 and Figure 88, again the proposed model

presents superior accuracy for all intervals.

Figure 7: Prediction accuracy results in RMSE, for different

benchmarking combinations and for all intervals.

Figure 8: Prediction accuracy results in SMAPE, for

different benchmarking combinations and for all intervals.

Figure 9: Real and predicted speed time series.

In Figure 9 is shown both the actual and the predicted

time series of speed. It is obvious that the proposed

model fits the actual values of speed.

In order to evaluate the statistical significance of

the improvement that our model introduces we run a

t-test. To this end, we examine the null hypothesis

23

33

43

53

63

73

0 50 100 150 200 250

Speed (mph)

5-minute inteval

real predicted

87

that the proposed model has equal accuracy with the

ARIMA model. Since there is no indication that the

predicted values have normal distributions, we used

the Wilcoxon signed-rank test. The test showed that

at significance level of 0.05 the null hypothesis could

be rejected for all the aforementioned benchmarking

cases. Therefore, we can claim that the proposed

model presents statistically significantly better

accuracy from the ARIMA model in all cases.

6 CONCLUSIONS

In this paper we introduced a novel hybrid method for

short-term traffic prediction under both typical and

atypical traffic conditions. We introduced a SVM-

based AID model that identifies the presence of

atypical conditions. We use the ARIMA parametric

model or the k-NN non-parametric regression model

if the AID identifies typical or atypical conditions,

respectively. We evaluated our model using real open

data from the Caltrans PeMS and showed that it

outperforms the benchmarking models in terms of

prediction accuracy under both typical and atypical

conditions.

The proposed model can be implemented using

either speed or flow data. In this work, we selected

speed data because speed is a traffic variable that

provides clearly interpretable results regarding the

traffic state of a network and also it can be easily

converted to travel time, which is a useful metric for

many ITS applications like vehicle routing.

Future work involves experimenting with

additional feature extraction techniques for

improving the accuracy of the proposed AID model.

Furthermore, more extensive comparison of the

proposed model against additional prediction models

using larger data sets is essential for further

investigating the conditions under which the

proposed model provides the best performance.

ACKNOWLEDGMENT

This work has been partially supported by the

European Commission through the project

RESOLUTE (ID: 653460), funded by Horizon 2020.

The opinions expressed in this paper are those of the

authors and do not necessarily reflect the views of the

European Commission.

REFERENCES

Abdulhai, B., Porwal H., Recker, W., 1997. Short term

freeway traffic flow prediction using genetically

optimized time delay based neural net-works. Proc.,

78

th

Transportation Research Board Annual Meeting,

Washington D. C., USA.

Box, G. E. P. and Jenkins, G. M., 1971. Time series analysis

forecasting and control. Operational Research

Quarterly, 22(2), June, pp.199-201.

Castro-Neto, M., Jeong, Y. S., Jeong, M. K., Han, L. D.,

2009. Online-SVR for short-term traffic flow prediction

under typical and atypical traffic conditions. Expert

Systems with Applications, 36(3), pp.6164-6173.

Clark, S., 2003. Traffic Prediction Using Multivariate

Nonparametric Regression. Journal of Transportation

Engineering, 129(2), pp.161-168.

De Fabritiis, C., Ragona, R., Valenti, G., 2008. Traffic

estimation and prediction based on real time floating car

data. Proc., IEEE 11

th

International Conference on

Intelligent Transportation Systems, pp.197-203.

Dougherty M. S. and Cobbet M. R., 1997. Short term inter-

urban traffic fore-casts using neural networks.

International Journal of Forecasting, 13(1), March,

pp.21-31.

Gakis E., Kehagias D., Tzovaras D., 2014. Mining Traffic

for Road Incidents Detection. IEEE 17

th

International

Conference on Intelligent Transportation Systems,

Qingdao, China, pp. 930-935.

Gao, Y. and Er, M., J., 2005. Narmax time series model

prediction: feed-forward and recurrent fuzzy neural

network approaches. Fuzzy Sets and Systems, 150(2),

March, pp.331-350.

Ghosh, B., Basu, B., O’Mahony, M., 2009. Multivariate

short-term traffic flow forecasting using time-series

analysis. IEEE Transactions on Intelligent

Transportation Systems, 10(2), pp.246-254.

Guo, F., Krishnan, R., Polak, J. W., 2012. Short-Term

Traffic Prediction Under Normal and Abnormal Traffic

Conditions on Urban Roads. 91

st

Transportation

Research Board Annual Meeting, Washington D. C.,

USA.

Guo, F., Krishnan, R., Polak, J. W., 2014. A novel three-

stage framework for short-term travel time prediction

under normal and abnormal traffic conditions. 93

rd

Transportation Research Board Annual Meeting,

Washington D. C., USA.

Guo, F., Polak, J. W., Krishnan, R., 2010. Comparison of

Modelling Approaches for Short-Term Traffic

Prediction under Normal and Abnormal Conditions.

IEEE 13

th

International Conference on Intelligent

Transportation Systems, Madeira Island, Portugal.

Guo, J. and Williams, B. M., 2010. Real-time short-term

traffic speed level forecasting and uncertainty

quantification using layered Kalman filters.

Transportation Research Record: Journal of

Transportation Research Board, 2175, pp.28-37.

Hu, W., Yan, L., Liu, K., Wang, H., 2015. A Short-term

Traffic Flow Forecasting Method Based on the Hybrid

PSO-SVR. Neural Processing Letters, pp.1-18.

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems

88

Innamaa, S., 2000. Short term prediction of traffic situation

using MLP-neural networks. Proc., 7

th

World Congress

on Intelligent Systems, Turin, Italy, pp.1-8.

Kamarianakis, Y. and Prastacos, P., 2005. Space-time

modelling of traffic flow. Computers & Geosciences,

31(2), pp.119-133.

Kamarianakis, Y., Shen, W., Wynter, L., 2012. Real-time

road traffic forecasting using regime-switching space-

time models and adaptive LASSO. Applied Stochastic

Models in Business Industry, 28(4), pp.297-315.

Kindzerske, M. D. and Ni, D., 2007. Composite nearest

neighbour nonparametric regression to improve traffic

prediction. Transportation Research Record: Journal

of Transportation Research Board. 1993(1), pp.30-35.

Min, W. and Wynter, L., 2011. Real-time traffic prediction

with spatiotemporal correlations. Transportation

Research Part C: Emerging Technologies, 19(4),

August, pp.606-616.

Mu, T., Jiang, J., Wang, Y., 2012. Heterogeneous delay

embedding for travel time and energy cost prediction

via regression analysis. IEEE Transactions on

Intelligent Transportation Systems, 14(1), pp. 214-224.

Myung, J., Kim, D. K., Kho, S. Y., Park, C. H., 2012. Travel

Time Prediction Using k-Nearest Neighbour Method

with Combined Data from Vehicle Detector System and

Automatic Toll Collection System. Transportation

Research Record: Journal of Transportation Research

Board, 2256, pp.51-59.

Ni, M., He, Q., Gao, J., 2014. Using Social Media to Predict

Traffic Flow under Special Event Conditions. 93

rd

Transportation Research Board Annual Meeting,

Washington D. C., USA.

Pfeifer, P. E. and Deutsch, S. J., 1980. A three-stage

iterative procedure for space-time modelling.

Technometrics, 22(1), February, pp.35-47.

Quek, C., Pasqueir, M. and Lim, B. B. S., 2006. Pop-traffic:

a novel fuzzy neural approach to road traffic analysis

and prediction. IEEE Transactions on Intelligent

Transportation Systems, 7(2), June, pp.133-146.

Smith, B. L. and Demetsky, M. J., 1996. Multiple interval

freeway traffic flow prediction. Transportation

Research Record: Journal of Transportation Research

Board, 155(4), pp.136-141.

Stathopoulos, A. and Karlaftis, M. G., 2003. A multivariate

state-space approach for urban traffic flow modelling

and prediction. Transportation Research Part C:

Emerging Technologies, 11(2), April, pp.121-135.

Vlahogianni, E. I., Karlaftis, M. G., Golias, J. C., 2003. A

multivariate neural network predictor for short term

traffic prediction in urban signalized arterial. Proc. 10

th

IFAC Symposium on Control in Transportation

Systems, Tokyo, Japan, August.

Williams, B. M., Dursavula, P. K., Brown, D. E., 1998.

Urban freeway traffic flow prediction – Application of

seasonal autoregressive integrated moving average and

exponential smoothing models. Transportation

Research Record: Journal of Transportation Research

Board, 1644, pp.132-141.

Wu, C. H., Wei, C. C., Su, D. C., Chang, M. H., Ho, J. M.,

2003. Travel Time Prediction with Support Vector

Regression. IEEE 6

th

International Conference on

Intelligent Transportation Systems, Shanghai, China.

Wu, T., Xie, K., Xinpin, D., Song, G., 2012. A online

boosting approach for traffic flow forecasting under

abnormal conditions. Proc., 9

th

International

Conference on Fuzzy Systems and Knowledge

Discovery, Sichuan, China.

Zhang, G. P., 2003. Time series prediction using a hybrid

ARIMA and neural network model. Neurocomputing,

50, January, pp.159-175.

Zheng, Z., Su, D., 2014. Short-term traffic volume

forecasting: A k-nearest neighbour approach enhanced

by constrained linearly sewing principle component

algorithm. Transportation Research Part C: Emerging

Technologies, 43, pp.143-157.

Wu, Cheng-Ju, Schreiter, Thomas, & Horowitz, Roberto

2014. Multiple-clustering ARMAX-based predictor

and its application to freeway traffic flow prediction.

American Control Conference (ACC), 2014 IEEE

89