AN UNSUPERVISED LEARNING BASED APPROACH FOR

UNEXPECTED EVENT DETECTION

Bertrand Luvison, Thierry Chateau

LASMEA, Blaise Pascal University, Aubière, France

Patrick Sayd, Quoc-Cuong Pham

CEA, LIST, Embedded Vision Systems Laboratory, Boîte Courrier 94, F-91191 Gif-sur-Yvette, France

Jean-Thierry Lapresté

LASMEA, Blaise Pascal University, Aubière, France

Keywords:

Motion learning, Kernel machine, Video surveillance.

Abstract:

This paper presents a generic unsupervised learning based solution to unexpected event detection from a static

uncalibrated camera. The system can be cast in a probabilistic framework in which the detection is achieved by a likelihood based decision. We propose an original method to approximate the likelihood function

using a sparse vector machine based model. This model is then used to detect unexpected events efficiently online. Moreover, the features used are based on optical flow orientation within image blocks. The resulting application is able to automatically learn expected optical flow orientations from training video sequences and to detect unexpected orientations (corresponding to unexpected events) at a near real-time frame rate.

Experiments show that the algorithm can be used in various applications like crowd or trafﬁc event detection.

1 INTRODUCTION

Surveillance is still essentially performed by human supervisors visually inspecting video streams from CCTV (Closed Circuit Television) networks. However, the number of cameras never stops increasing: public areas are becoming more and more crowded, but also more monitored. In this context, real-time monitoring assistance systems which focus the operator's attention on salient cameras are more than necessary. Such systems have to deal with complex scenes and a large amount of information in order to extract scarce, subtle events, as in crowd or traffic monitoring.

Crowd analysis is a very complex problem due to the variety of events to be detected and the difficulty of modelling them. Examples of events to detect in crowded scenes are abnormal trajectories, person falls, panic movements, fighting or, in the case of traffic monitoring, dangerous trajectories and inappropriate speeds. Moreover, the solution must be a real-time system, since events have to be detected on the fly. In order to reduce the scope of possible events,

researchers have generally addressed speciﬁc applica-

tions. For example, some systems focus on detecting

intrusion in a forbidden zone, while others are dedi-

cated to wrong way movement or motionless entities

detections.

In this paper, we propose a generic unsupervised

learning based system for detecting unexpected mo-

tions by vision within a structured area (trafﬁc mon-

itoring for example). The system works on an efficient and compact representation of the learnt model, adapted to real-time execution. Moreover, the learning method is fully automatic: it needs neither camera calibration nor a labelled learning database.

The next section briefly sums up existing systems dealing with crowd monitoring, and Section 3 describes our learning method and its application to unexpected event detection. Finally, the performance of the system is evaluated in Section 4, both on the quality of the learnt model and on the results obtained in real situations over a long testing period.

506

Luvison B., Chateau T., Sayd P., Pham Q. and Lapresté J. (2009).

AN UNSUPERVISED LEARNING BASED APPROACH FOR UNEXPECTED EVENT DETECTION.

In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 506-513

DOI: 10.5220/0001796705060513

Copyright © SciTePress

2 RELATED WORK

Crowd monitoring for unexpected event detection by

vision can be seen as a classiﬁcation problem, where

each image of a video sequence must be classiﬁed

into one of the two classes: expected and unexpected

event. Related works can be divided into two cate-

gories: model based approaches and learning based

approaches.

To analyse crowds, some approaches require preliminary analyses and processing steps to extract more elaborate features, usually obtained with model based approaches. These features are then analysed to detect unexpected events. For example, the elaborate features could be the people of a crowd or their trajectories. In (Zhao and Nevatia,

2004), a geometric model (ellipse or omega shapes) is proposed to detect bodies and legs, or heads and shoulders, in crowd images. Fourier descriptors were

also used to characterize human shapes (Dong et al.,

2007). The segmented shapes are then tracked and

pedestrian trajectories are estimated in order to analyse people behaviour. Other researchers, mainly interested in the trajectories of entities in a crowd, postulate that the different points tracked on a given person roughly follow the same trajectory at the same speed. These points are gathered either within a Bayesian framework, as in (Brostow and Cipolla, 2006) and (Li and Ai, 2007), or with the RANSAC algorithm

(Rabaud and Belongie, 2006). Hybrid methods com-

bine both approaches to segment the crowd, in or-

der to make the system more robust as in (Sidla and

Lypetskyy, 2006). In the case of traffic monitoring, Hu et al. (Hu et al., 2006), with the additional hypothesis that an entity cannot partially hide another one, gather features (spatial ones or trajectories) with the K-means algorithm. Instead of segmenting the crowd into a set of pedestrians, some approaches propose a

global analysis of the motion of the crowd. In (Ve-

lastin et al., 2006), a model of expected motion is built

from priors on circulation way for detecting wrong

way moving persons or suspiciously motionless people. The image motion is then estimated with optical flow techniques, coupled with an additional foreground extraction step. However, model based approaches are

very speciﬁc to a given environment and a given ap-

plication. Moreover, such approaches need priors that

are not always available.

In learning based approaches, the model of un-

expected events is learnt from a training set. This

training set can be built from expected object trajec-

tories (Singh et al., 2007; Junejo and Foroosh, 2007;

Hu et al., 2006). Resulting applications detect unex-

pected trajectories using a classiﬁcation framework.

Trajectories can also infer interactions between indi-

viduals (people separating, coming closer, ﬁghting ...)

(Blunsden et al., 2007). In (Andrade et al., 2006),

the main components (extracted from a SVD decom-

position) of crowd motion are learnt using a Hidden

Markov Model (HMM). Other learning machines aim at classifying an input into one of two classes, which corresponds well to the desired system. Such two-class learning based classifiers, like Support Vector Machines (SVM) (Burges, 1998), Relevance Vector Machines (RVM) (Tipping, 2001) and Boosting (Freund and Schapire, 1996), need both positive and negative samples.

In this paper, we focus on the problem of motion

analysis, and especially detecting unexpected events

from a video sequence provided by a static camera.

Moreover, in order to be as generic as possible, we

choose a learning based approach. The output of the

algorithm is a binary decision associated to the candidate image (expected or unexpected). Nevertheless, this problem is more complex than a common classification process like SVM or RVM, since there is no clear definition of what an unexpected event is, and so no database of negative samples is available. Although one-class SVM could have been used, the features learnt do not spread over a high dimensional space. We therefore propose an unsupervised learning based solution that builds a model of expected events from a set of videos with an algorithm simpler than one-class SVM.

3 PROPOSED METHOD

3.1 Overview

Given an offline video stream from a fixed camera, our system learns a non-parametric likelihood representation from cumulated optical flow orientations computed within regular blocks of the image. Since the complexity of non-parametric models depends on the size of the training set, we propose an original approach that builds a sparse vector machine based model by selecting only the samples relevant for approximating the non-parametric likelihood. Finally, a classification rule is deduced from this likelihood to classify the main orientation of each block during online analysis from the same fixed camera (see the outline in fig.1).

This method has the asset of being completely automatic: no camera calibration is needed, and no labelling task has to be done on the learning database. Moreover, our sparse representation is particularly adapted to real-time classification during online analysis.


Figure 1: Outline of the system.

3.2 Likelihood based Decision

Let I denote a training video sequence. Each image is split into a set of N blocks, and optical flow orientations are associated to each block. $\mathbf{Z}^{(n)} \doteq (z_1^{(n)}, z_2^{(n)}, \ldots, z_K^{(n)})^T$ represents a vector composed of the history of all optical flow orientations.

We define a Bernoulli random variable $\omega_k^{(n)}$ with values in $\{\omega_1; \omega_2\}$: $\omega_k^{(n)} = \omega_1$ if the event associated to block n is an expected event, and $\omega_k^{(n)} = \omega_2$ if it is an unexpected event. Since nothing is known about the unexpected event likelihood function $P(z_k^{(n)}|\omega_2^{(n)})$, we choose to define as unexpected those observations which cannot be sufficiently reliably assigned to the expected likelihood function $P(z_k^{(n)}|\omega_1^{(n)})$. In other words, observations with too low a value under the expected likelihood distribution are considered unexpected:

$$\hat{\omega}^{(n)} = \omega_2^{(n)} \quad \text{if} \quad P(z_k^{(n)}|\omega_1^{(n)}) < C \qquad (1)$$

where C is a detection threshold determined experimentally.

The likelihood $P(z_k^{(n)}|\omega_1^{(n)})$ can be represented either by parametric or non-parametric models.

In the first case, GMMs (Gaussian Mixture Models) are widely used to approximate the likelihood. For example, Stauffer and Grimson (Stauffer and Grimson, 2000) used a GMM with a fixed number of Gaussians for background subtraction. EM (Expectation Maximization) algorithms are generally used to estimate GMM parameters, and recent versions lift the restriction on the number of Gaussians (Figueiredo and Jain, 2002). More recently, Han et al. (Han et al., 2008) proposed a mean-shift based approach to approximate a given distribution with GMMs.

Non-parametric models approximate the distribution from samples using kernel based estimation (Parzen windows model (Duda et al., 2001)):

$$\hat{P}(z_k^{(n)}|\omega_1^{(n)}) \approx K^{-1} \sum_{k'=1}^{K} \phi_{k'}^{(n)}(z_k^{(n)}) \qquad (2)$$

where $\phi_{k'}^{(n)}(z_k^{(n)})$ is a kernel function (not necessarily Gaussian). With such a model, no assumption needs to be made about the number of modes of the distribution. Nevertheless, one drawback of this model is that the cost of estimating the probability is proportional to the number of samples used. Reduced models can be designed by randomly selecting a subset of $\mathbf{Z}^{(n)}$. We propose another solution, wherein a sparse model is obtained by approximating eq. (2) by a weighted sum of basis functions. In order to be as generic as possible, we choose the non-parametric model over GMMs, so as to be able to adapt to more atypical distributions, for which specific kernel functions may be more appropriate than classical Gaussians.
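The kernel estimator of eq. (2) is straightforward to sketch. The code below is an illustrative one-dimensional version with a plain Gaussian kernel (the paper's angular kernel is only introduced in Section 4.1); it is not the authors' implementation.

```python
import numpy as np

def parzen_likelihood(z, samples, sigma=0.1):
    """Parzen window estimate of P(z | omega_1), eq. (2): the average of
    K Gaussian kernels centred on the K training samples."""
    samples = np.asarray(samples, dtype=float)
    kernels = np.exp(-(z - samples) ** 2 / (2.0 * sigma ** 2)) \
              / (sigma * np.sqrt(2.0 * np.pi))
    return kernels.mean()
```

Each evaluation costs O(K), which is exactly the drawback the sparse model of Section 3.3 addresses.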

3.3 A Sparse Vector Machine Model for

Density Estimation

Equation (2) can also be expressed in the compact form:

$$\hat{P}(z_k|\omega_1) \approx \mathbf{w}^T \boldsymbol{\phi}(z_k) \qquad (3)$$

where $\mathbf{w} = (1,\ldots,1)^T/K$ is a vector of size K and $\boldsymbol{\phi}(z_k)$ is a vector function defined by $\boldsymbol{\phi}(z_k) = (\phi_1(z_k), \phi_2(z_k), \ldots, \phi_K(z_k))^T$. For the sake of clarity, the block index has been left out. This model has the same form as the models used for Support Vector Machines (Burges, 1998) or Relevance Vector Machines (Tipping, 2001).


We propose to obtain a sparse model from equation (3) by fixing most of the parameters of $\mathbf{w}$ to zero, as is classically done. Let $\Phi$ denote a matrix of size $K \times K$, associated to block n, built such that the element at line i and column j is given by $\Phi_{i,j} = \phi_i(z_j)$. $\Phi$ is a square, symmetric matrix, from which an estimator of the likelihood associated to the sample $z_k$ of the training set is given by $\hat{P}(z_k|\omega_1) \approx (\mathbf{1}_{1,K})\,\Phi\,(\mathbf{0}_{1,k-1}, 1, \mathbf{0}_{1,K-k})^T / K$ (the sum of the elements of line or column k of $\Phi$, divided by K). A likelihood vector $\boldsymbol{\varphi}$ related to the training set is built:

$$\boldsymbol{\varphi} = \Phi\,(\mathbf{1}_{K,1})/K \qquad (4)$$

The sparse model is built in a two-step recursive process:

1. The sample of the training set which has the maximum likelihood is selected from $\boldsymbol{\varphi}$ and considered as a relevant vector:

$$v(m) = \arg\max_i \; \varphi_{m,i} \qquad (5)$$

where $\varphi_{m,i}$ is the $i$-th element of $\boldsymbol{\varphi}_m$, and $\boldsymbol{\varphi}_m = \boldsymbol{\varphi}$ for $m = 1$.

2. The likelihood estimator is updated:

$$\boldsymbol{\varphi}_{m+1} = \boldsymbol{\varphi}_m - \frac{\varphi_{m,v(m)}}{\Phi_{v(m),v(m)}}\, \boldsymbol{\phi}(z_{v(m)}) \qquad (6)$$

Steps one and two are repeated up to m = M, the last step for which $\varphi_{M,v(M)}/\varphi_{1,v(1)} > t_v$. The parameter $t_v$ represents the precision of the likelihood approximation. For a coarser approximation, $t_v$ can be increased; in that case the number of vectors used decreases. The effect of this parameter is illustrated in Section 4.1. Finally, the estimator is written as:

nally, the estimator is written by:

b

P(z

k

|ω

1

) ≈

˜

w

w

w

T

˜

φ

φ

φ(z

k

) (7)

with

˜

w

w

w

T

=

ϕ

1,v(1)

Φ

v(1),v(1)

,

ϕ

2,v(2)

Φ

v(2),v(2)

,...,

ϕ

M,v(M)

Φ

v(M),v(M)

(8)

and

˜

φ

φ

φ(z

k

) =

φ

v(1)

(z

k

),φ

v(2)

(z

k

),...,φ

v(M)

(z

k

)

T

(9)

This approach gives a good approximation of the likelihood with few vectors. Initially, the non-parametric model set $\mathbf{Z}$ contained K elements, whereas the sparse vector machine model $\tilde{\mathbf{Z}} = (v(1), v(2), \ldots, v(M))$ contains only M elements, with $M \ll K$. This reduction in size is essential, since it makes real-time classification possible.
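The two-step selection above can be sketched as follows. `Phi` is the kernel matrix with `Phi[i, j] = phi_i(z_j)`, and the function returns the indices v(1..M) and the weights of eq. (8). This is a sketch under the paper's definitions (function and variable names are ours), not the authors' code.

```python
import numpy as np

def build_sparse_model(Phi, t_v=0.08):
    """Greedy sparse approximation of the Parzen likelihood (Section 3.3).

    Phi: symmetric K x K kernel matrix, Phi[i, j] = phi_i(z_j).
    Returns the indices v of the relevant vectors and the weights w~
    of eq. (8), so that P(z | omega_1) ~= sum_m w[m] * phi_{v[m]}(z)  (eq. 7).
    """
    K = Phi.shape[0]
    phi_vec = Phi.sum(axis=0) / K          # likelihood vector, eq. (4)
    first_peak = None
    v, w = [], []
    while True:
        i = int(np.argmax(phi_vec))        # eq. (5): most likely sample
        peak = phi_vec[i]
        if first_peak is None:
            first_peak = peak
        elif peak / first_peak <= t_v:     # stop when the peak ratio drops
            break
        coef = peak / Phi[i, i]
        v.append(i)
        w.append(coef)
        phi_vec = phi_vec - coef * Phi[i, :]   # eq. (6): remove explained mass
        if len(v) == K:
            break
    return np.array(v), np.array(w)
```

On a toy set of orientations clustered around two values, the procedure keeps roughly one vector per mode, which is the behaviour observed on fig. 2.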

3.4 Online Classiﬁcation Step

Online classification is based on equation (1), which can be rewritten with our sparse model as:

$$\hat{\omega} = \omega_2 \quad \text{if} \quad \tilde{\mathbf{w}}^T \tilde{\boldsymbol{\phi}}(z) < C \qquad (10)$$

For area n, a set of optical flow orientations is computed. The most relevant vector z is then estimated using equation (5), replacing Z by the set of optical flow orientations associated to area n.
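With the model reduced to M relevant vectors, the decision of eq. (10) is a cheap dot product. A minimal sketch, where `relevant_z`, `weights` and `kernel` stand for the quantities produced by the training stage (names are ours):

```python
def classify_block(z, relevant_z, weights, kernel, C):
    """Online decision, eq. (10): the block is flagged as unexpected
    (omega_2) when the sparse likelihood estimate falls below C."""
    likelihood = sum(w * kernel(z, z_v)
                     for w, z_v in zip(weights, relevant_z))
    return likelihood < C  # True -> unexpected event
```

The per-block cost is O(M) instead of O(K), which is what makes near real-time analysis possible.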

4 EXPERIMENTS

This section presents the experiments done in order

to validate the proposed method. The ﬁrst part shows

the ability of the sparse model to approximate the dis-

tribution of optical ﬂow orientation. The second part

discusses the performances of the algorithm in two

cases of trafﬁc monitoring.

4.1 Non Parametric Representation

We make the common choice of Gaussian, data-centred basis functions,

$$\phi_{k'}^{(n)}(z_k^{(n)}) = \frac{1}{\sigma_l \sqrt{2\pi}} \exp\left\{ -\frac{d_\theta^2(z_k^{(n)}, z_{k'}^{(n)})}{2\sigma_l^2} \right\}$$

where $d_\theta(z_k^{(n)}, z_{k'}^{(n)})$ is a distance between two angles, defined by:

$$d_\theta(\theta_1, \theta_2) = \frac{\min\left( \theta_2 - \theta_1,\; |\theta_2 - (\theta_1 + 2\pi)| \right)}{\pi} \quad \text{with } \theta_1 < \theta_2 \qquad (11)$$
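Eq. (11) can be implemented directly; a sketch (we order the two angles ourselves rather than requiring theta1 < theta2 from the caller):

```python
import math

def angular_distance(theta1, theta2):
    """Normalized distance between two angles in radians, eq. (11)."""
    t1, t2 = min(theta1, theta2), max(theta1, theta2)  # enforce t1 <= t2
    return min(t2 - t1, abs(t2 - (t1 + 2.0 * math.pi))) / math.pi
```

The second term wraps around the circle, so for instance -3 rad and 3 rad are close although they differ by 6 rad.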

The sparse non-parametric approximation $\tilde{\mathbf{Z}}^{(n)}$ presented above has been compared to the expanded one, $\mathbf{Z}^{(n)}$. For this comparison, the theoretical probability, i.e. the non-parametric estimator of the likelihood of equation (2), is represented by the yellow area limited by the black dashed curve in fig.2. The sparse probability based on $\tilde{\mathbf{Z}}^{(n)}$ has been drawn for several values of $t_v$ with the blue, red and green curves in fig.2.

The sparse kernel based model curves in fig.2 show that our reduced representation $\tilde{\mathbf{Z}}^{(n)}$ matches the theoretical probability. The variance parameter $\sigma_l$ should be chosen large enough to obtain a sufficient decrease in the number of selected vectors while keeping a correct approximation, but not so big that different modes are merged. Experimentally, a value of $\sigma_l = 0.1$ has been chosen. The parameter $t_v$ determines how many selected vectors are kept, in other words the precision of the likelihood approximation. On the two-mode distribution of fig. 2, we can see that for high values of $t_v$ (blue and red curves), the bandwidth of the


Figure 2: Approximation of the kernel based non-parametric density estimation with a sparse kernel based model on two different kinds of distribution (normal event likelihood as a function of orientation angle, in radians). The yellow area limited by the black dashed curve shows the kernel based non-parametric density estimation. The blue curve ($t_v = 0.7$), the red one ($t_v = 0.4$) and the green one ($t_v = 0.08$) show the sparse kernel based density estimation.

sparse density approximation mode is truncated compared to the non-parametric density, and smaller modes can even be missed (blue curve). The numbers of selected vectors, that is to say the numbers of Gaussians needed for the approximation, are respectively 1, 4 and 8 for the blue, red and green curves. For a uniform random distribution, many selected vectors are needed for a correct approximation. The parameter $t_v$ therefore needs to be small enough to correctly approximate all the possible orientations, contrary to the blue curve, which underestimates some specific orientations. Here 21, 26 and 39 Gaussians were needed for the approximation for the blue, red and green curves respectively.

4.2 Classiﬁcation

The performance of the system has been evaluated on two different scenes: a traffic crossroad with clear circulation ways and a toll barrier. Sequences for both scenes were captured by real webcams under widely varying weather, from sunny to cloudy through rainy conditions, and under very different lighting conditions such as projected shadows, specular lights, reflections on the ground, etc. (e.g. fig.3). Each video of the dataset has a rather low resolution, 320x240 at about 12 fps, sometimes with a very poor objective lens. The videos, taken every 20 minutes, last about 30 seconds each. The learning has been done over one hundred videos. The blocks are 10x10 pixels, that is to say a total of 768 blocks for each image.
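For these settings the block partition works out as follows; a small sketch checking the block count (a 320x240 image with 10x10 blocks gives 32 x 24 = 768 blocks):

```python
import numpy as np

def block_index_map(height=240, width=320, block=10):
    """Assign each pixel its block index on the regular grid of Section 3.1;
    with the defaults, indices run from 0 to 767 (768 blocks)."""
    rows = np.arange(height) // block
    cols = np.arange(width) // block
    return rows[:, None] * (width // block) + cols[None, :]
```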

For motion estimation, different optical flow techniques have been tested, from Lucas & Kanade and its variants to block matching (Barron et al., 1992), through the Proesmans method (Proesmans et al., 1994) and others. Among them, the Black & Anandan method (Black and Anandan, 1996) has been kept for its robustness and the cleanliness of its results compared to the other methods, but also for its relatively fast computation. This method is based on a piecewise affine motion assumption, which is generally satisfied for our type of scene. Moreover, the use of this technique makes the system work in any lighting condition, since optical flow techniques are usually invariant to illumination changes.

Figure 3: Examples of weather conditions in the video database used for learning. Top row images are taken from the traffic crossroad sequence, bottom row from the toll.

The classification performance of the system has been evaluated in real conditions. The analysis has been done over 162 videos for the traffic crossroad and 256 videos for the toll. The anomalies searched for were of different types, from pedestrians not walking on the zebra crossing to scooters leaving the pavement, through wrong-way movement or cars pulling out just before the toll. The classification was done on a P4 3.4 GHz. The optical flow is computed in 0.11 s


Figure 4: Different unexpected event samples. The red square represents the unexpected block detected according to the most relevant vector of this block at time t, represented by the blue arrow. The green squares point out missed unexpected events. (a) Unexpected pedestrians crossing outside the zebra crossing, and a false alarm example on the bus at the bottom; (b) double missed detection of a car on the right and a truck on the left, both suddenly pulling out; (c) scooter leaving the pavement following an unusual trajectory; (d) firefighter truck going the wrong way to avoid held-up lanes; (e) car changing direction at the last minute on the left of the camera; (f) motorhome crossing several lanes to reach the heavyweight lane.

per frame. The computation time for analysing one block depends on the displacement vector density in the block. Not computed for motionless blocks, it takes about 5 ms per block, that is an average of 0.2 s per frame. For the sequences studied so far, these computation performances are close to real-time, even with non-optimized code. Moreover, the real-time classification lends itself very well to parallelization on graphics cards or multithreaded architectures in order to increase the frame rate.

An unexpected event is considered detected if the system has pointed out at least one block on the guilty object during the unexpected event (e.g. fig.4(d)). False alarms have been counted in the same way, when at least one block appears on an area without a true anomaly, again during the length of the manoeuvre (e.g. fig.6). This manual count has been done from the per-image warnings provided by the classifier. Even though the system gives a frame warning instead of an event warning (no temporal consistency over events is enforced), we choose to present an event count, as it is a measure of the performance of the system very close to what it is expected to do. Indeed, the system should alert a security officer when something goes wrong, no matter where the problem occurs, which is usually obvious for a human. The system parameters have been chosen in order to obtain the maximum of correct detections with the minimum of false alarms.

The parameter C is decisive in guaranteeing a low false alarm rate: the higher it is, the smaller the deviation from a learnt orientation that is considered abnormal. As a consequence, to avoid false alarms due to noise, imprecision in the orientation computation, etc., the parameter C should be chosen quite low, even if some correct detections are missed.

With the parameter C equal to 0.3, 17 warnings have been raised over the toll sequences' total length of 7680 seconds, that is to say an average of one warning every 451 seconds. Of these 17 warnings, 16 were real unexpected events, whereas the last one was a false alarm. The traffic crossroad sequences are a far harder situation to monitor, since the circulation ways are more intricate, the vehicles are smaller, etc. However, the system still warned 7 times over the 5160 s of total length, that is to say on average one warning every 737 seconds. Of these 7 detections, only 3 were real unexpected events. More events could have been detected for both sequences, but at the cost of a tremendous increase in false alarms, as shown in Table


Figure 5: Example of false alarm due to a wrongly learnt orientation. The three vectors kept in the sparse model are represented in green; all of them go downward.

Figure 6: Example of false alarm due to the aperture problem. The optical flow vectors are normal to the light beam edge, giving completely wrong results.

Figure 7: Example of false alarm due to a projected shadow. The shadow expands over areas where movement from the bottom-right corner to the top-left corner has never been seen.

1. Figures 4(c)-4(f) illustrate the events to detect and the main direction for each guilty block, but also the detection conditions for each video (especially the lighting conditions). Figures 4(a) and 4(b) show missed detections, surrounded with a green square.

Table 1: Detection results for different C values. CD stands for Correct Detection and FA for False Alarm.

          Toll sequences    Cross-road sequences
  C       CD      FA        CD      FA
  0.3     16       1         3       4
  0.43    28      10         4      38
  0.58    38     113        13     164

Due to a large number of spurious positive detections that appear punctually, smoothing has been applied to the classifier output. Assuming that an anomaly remains for several frames and spreads from one block to an adjacent one, a temporal smoothing has been applied by finding the most relevant orientation among the orientations of the last k frames. Experimentally, using k = 2 removes some detection noise due to errors in the displacement field computation. Furthermore, a first-order temporal filter with a spatial smoothing, which balances the responses of the neighbouring blocks, has been applied.
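One simple way to realize this smoothing is sketched below: keep the best likelihood over the last k frames, feed it through a first-order filter, and average each block with its four neighbours. The parameter values and the exact filter form are our assumptions; the paper does not detail them.

```python
import numpy as np
from collections import deque

class WarningSmoother:
    """Smooth per-block likelihood maps before thresholding (Section 4.2):
    best value over the last k frames, then a first-order temporal filter,
    then a spatial average with the 4-neighbour blocks."""

    def __init__(self, k=2, alpha=0.5):
        self.history = deque(maxlen=k)   # last k likelihood maps
        self.alpha = alpha               # first-order filter coefficient
        self.state = None

    def update(self, likelihood_map):
        self.history.append(np.asarray(likelihood_map, dtype=float))
        best = np.maximum.reduce(list(self.history))   # best over last k frames
        if self.state is None:
            self.state = best
        else:                                          # first-order temporal filter
            self.state = self.alpha * self.state + (1.0 - self.alpha) * best
        # spatial blurring: average each block with its 4-neighbours
        padded = np.pad(self.state, 1, mode="edge")
        smoothed = (padded[1:-1, 1:-1] + padded[:-2, 1:-1] + padded[2:, 1:-1]
                    + padded[1:-1, :-2] + padded[1:-1, 2:]) / 5.0
        return smoothed
```

A one-frame glitch in a single block is thus attenuated both over time and over its neighbourhood before the threshold C is applied.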

Most of the remaining false positive detections (wrong unexpected events detected) are due to three main reasons. The first one is a possible motion that has not been learnt (this motion has not been seen often enough in the training database). An illustration is shown in fig.5, where the learnt orientations are represented in green: the learning on this block misses the relevant orientation towards the left, leading to warnings when a car comes from the right. The second reason occurs when the low level (optical flow) provides noisy features. Here, the aperture problem during optical flow computation is the main cause of noisy features; this problem is intrinsic to the way optical flow is computed. It often appears on shadows or headlight beams, returning completely wrong displacement directions (e.g. fig.6). To attenuate these unwanted effects, the optical flow has been correlated with a Harris score (Harris and Stephens, 1988), so that the optical flow is only computed on corner features. The third reason is the projection of object shadows on the floor. In this case, the shadow generates artefacts, moving in the same direction as the real object, but which can spread over areas with completely different learnt directions (e.g. fig.7). This last problem is hard to deal with.

5 PERSPECTIVES AND

CONCLUSIONS

We have proposed a generic system for unexpected event detection by vision. It learns the orientations of motion in a scene and then checks those orientations during live real-time analysis. We have proposed a sparse representation of the distribution of local optical flow orientations by a linear kernel based model. The resulting algorithm has been tested on real sequences, and we have shown that the method is able to classify unexpected events like wrong-way movement, cars pulling out of a toll line-up or turning back, etc. Even though the tests have been done on traffic sequences, this algorithm can also be useful for monitoring pedestrians in indoor corridors, as shown in fig.8, where two people are detected going the wrong way, since pedestrians normally walk counter-clockwise, that is to say upward on the right side and downward on the left side.

Figure 8: Crowd monitoring example. The expected circulation way is represented with the green arrows.

Future work will aim at reducing the wrong detections, which are often due to noisy features provided by the optical flow algorithm. Several solutions are under consideration: replacing the optical flow with other approaches like detection/matching; adding more efficient temporal smoothing in order to raise event warnings instead of image warnings; studying the best type of kernel to use; and finally computing an adaptive detection threshold C in relation to the learnt distribution. Moreover, the system only detects anomalies linked to motion. An extension to motionless or micro-movement detection is also planned, in order to enlarge the types of anomalies detected (person on the ground, suspicious parking ...).

This method has been designed to be as generic as possible, with no supervision; as a consequence, it may also be used in fields completely different from video surveillance. One could imagine, for example, using this approach to detect perturbations in satellite imaging of solar streams.

REFERENCES

Andrade, E. L., Blunsden, S., and Fisher, R. B. (2006).

Hidden markov models for optical ﬂow analysis in

crowds. In International Conference on Pattern

Recognition (ICPR), volume 1, pages 460–463.

Barron, J., Fleet, D., Beauchemin, S., and Burkitt, T.

(1992). Performance of optical ﬂow techniques.

Computer Vision and Pattern Recognition (CVPR),

92:236–242.

Black, M. J. and Anandan, P. (1996). The robust estima-

tion of multiple motions: parametric and piecewise-

smooth ﬂow ﬁelds. Computer Vision and Image Un-

derstanding (CVIU), 63(1):75–104.

Blunsden, S., Andrade, E., and Fisher, R. (2007). Non para-

metric classiﬁcation of human interaction. In Pattern

Recognition and Image Analysis (PRIA).

Brostow, G. J. and Cipolla, R. (2006). Unsupervised Bayesian detection of independent motion in crowds. In Computer Vision and Pattern Recognition (CVPR).

Burges, C. J. C. (1998). A tutorial on support vector

machines for pattern recognition. Data Mining and

Knowledge Discovery, 2:121–167.

Dong, L., Parameswaran, V., Ramesh, V., and Zoghlami, I.

(2007). Fast crowd segmentation using shape index-

ing. In International Conference on Computer Vision

(ICCV).

Duda, R., Hart, P., and Stork, D. (2001). Pattern Classiﬁca-

tion. John Wiley & Sons Inc.

Figueiredo, M. and Jain, A. (2002). Unsupervised learning

of ﬁnite mixture models. Pattern Analysis and Ma-

chine Intelligence, 24(3):381–396.

Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML), pages 148-156.

Han, B., Comaniciu, D., Zhu, Y., and Davis, L. S. (2008).

Sequential kernel density approximation and its appli-

cation to real-time visual tracking. Pattern Analysis

and Machine Intelligence, 30:1186–1197.

Harris, C. and Stephens, M. (1988). A combined corner

and edge detector. In Alvey Vision Conference, pages

147–151.

Hu, W., Xiao, X., Fu, Z., Xie, D., Tan, T., and Maybank,

S. (2006). A system for learning statistical motion

patterns. IEEE Pattern Analysis And Machine Intelli-

gence, 28:1450–1464.

Junejo, I. N. and Foroosh, H. (2007). Trajectory rectiﬁca-

tion and path modeling for video surveillance. In In-

ternational Conference on Computer Vision (ICCV).

Li, Y. and Ai, H. (2007). Fast detection of independent mo-

tion in crowds guided by supervised learning. In In-

ternational Conference on Image Processing (ICIP).

Proesmans, M., Gool, L. V., Pauwels, E., and Oosterlinck,

A. (1994). Determination of optical ﬂow and its dis-

continuities using non-linear diffusion. In European

Conference on Computer Vision (ECCV).

Rabaud, V. and Belongie, S. (2006). Counting crowded

moving objects. In Computer Vision and Pattern

Recognition (CVPR).

Sidla, O. and Lypetskyy, Y. (2006). Pedestrian detection

and tracking for counting applications in crowded sit-

uations. In Advanced Video and Signal Based Surveil-

lance (AVSS).

Singh, S., Machiraju, R., and Parent, R. (2007). Visual anal-

ysis of trajectory clusters for video surveillance.

Stauffer, C. and Grimson, W. E. L. (2000). Learning pat-

terns of activity using real-time tracking. IEEE Pat-

tern Analysis And Machine Intelligence, 22:747–757.

Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244.

Velastin, S. A., Boghossian, B. A., and Vicencio-Silva, M. A. (2006). A motion-based image processing system for detecting potentially dangerous situations in underground railway stations. Transportation Research Part C: Emerging Technologies, 14.

Zhao, T. and Nevatia, R. (2004). Tracking multiple humans

in complex situations. IEEE Pattern Analysis And Ma-

chine Intelligence, 26:1208–1221.
