AUTOMATIC SUSPICIOUS BEHAVIOR DETECTION FROM A
SMALL BOOTSTRAP SET
Kan Ouivirach, Shashi Gharti and Matthew N. Dailey
School of Engineering and Technology, Asian Institute of Technology, Bangkok, Thailand
Keywords:
Hidden Markov Models, Behavior Clustering, Anomaly Detection, Bootstrapping.
Abstract:
We propose and evaluate a new method for automatic identification of suspicious behavior in video surveil-
lance data. It partitions the bootstrap set into clusters then assigns new observation sequences to clusters based
on statistical tests of HMM log likelihood scores. In an evaluation on a real-world testbed video surveillance
data set, the method achieves a false alarm rate of 7.4% at a 100% hit rate. It is thus a practical and effective
solution to the problem of inducing scene-specific statistical models useful for bringing suspicious behavior
to the attention of human security personnel.
1 INTRODUCTION
We focus on enhancing security by performing in-
telligent filtering of typical events and automatically
bringing suspicious events to the attention of human
security personnel.
Pre-trained hidden Markov models (HMMs) and
other dynamic Bayesian networks such as conditional
random fields (CRFs) have been widely used in this
area (Gao et al., 2004; Sminchisescu et al., 2005).
Some of the existing work relies on having a priori
known behavior classes (Nair and Clark, 2002; Wu
et al., 2005).
More recent work uses unsupervised analysis and
clustering of behaviors (Li et al., 2006; Swears et al.,
2008). Xiang and Gong (2005) model the distribution
of activity data in a scene using a Gaussian mixture
model (GMM) and employ the Bayesian information
criterion (BIC) to select the optimal number of behav-
ior classes prior to HMM training.
We propose to use HMM-based clustering on a
small bootstrap set of sequences labeled as normal or
suspicious. After bootstrapping is complete, we as-
sign new observation sequences to behavior clusters
using statistical tests on the log likelihood of the se-
quence according to the corresponding HMMs. The
cluster-specific likelihood threshold is learned rather
than set arbitrarily.
In this paper, we briefly describe our method. For
more details, see the full version of this paper, available at http://www.cs.ait.ac.th/techreports/AIT-CSIM-TR-2012-1.pdf.
Figure 1: Sample foreground extraction and shadow removal results. (a) Original image. (b) Foreground pixels according to background model. (c) Foreground pixels after shadow removal.
2 BEHAVIOR MODEL
BOOTSTRAPPING
To build the bootstrap set, we record video over a short period such as
one week and first discard frames that contain no motion. We then
extract foreground pixels using the background modeling technique proposed by
Poppe et al. (2007). We use normalized cross cor-
relation (NCC) to eliminate shadows cast by mov-
ing objects. Sample results from the foreground ex-
traction and shadow removal procedures are shown in
Figure 1. We apply morphological opening then clos-
ing operations to obtain the connected components,
then we filter out any components whose size is below a threshold. We finally represent each blob (connected foreground component) i at time t by a feature vector f_i^t containing the blob's centroid, size, and aspect ratio, a unit-normalized motion vector for the blob compared to the previous frame, and the blob's speed. Next we map each feature vector f_i^t to a discrete category (cluster ID) in the set V = {v_1, v_2, ..., v_U}, where
U is the number of categories, using k-means cluster-
ing. A behavior sequence is finally represented as a
sequence of cluster IDs.
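As an illustration, a minimal Python sketch of the per-blob feature vector and the k-means quantization step might look as follows; the function names and stand-in data are assumptions, with scikit-learn's KMeans used for the codebook and U = 7 matching the seven-token configuration selected in Section 4.1.

import numpy as np
from sklearn.cluster import KMeans

def blob_features(cx, cy, size, aspect, prev_cx, prev_cy, fps=25.0):
    # One 7-element feature vector per blob: centroid, size, aspect ratio,
    # unit-normalized motion vector relative to the previous frame, and speed.
    motion = np.array([cx - prev_cx, cy - prev_cy], dtype=float)
    norm = np.linalg.norm(motion)
    unit = motion / norm if norm > 0 else np.zeros(2)
    return np.array([cx, cy, size, aspect, unit[0], unit[1], norm * fps])

# Quantize all bootstrap feature vectors into U discrete categories with
# k-means; each tracked blob then yields a sequence of cluster IDs.
U = 7                                    # token count from Section 4.1
features = np.random.rand(5000, 7)       # stand-in for real blob features
codebook = KMeans(n_clusters=U, n_init=10, random_state=0).fit(features)
observation_sequence = codebook.predict(features[:40])  # one blob's track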
In blob tracking, we associate blobs in the current
frame with the tracks from the previous frame using
a bounding-box overlap rule. Blobs corresponding to
isolated moving objects are associated with unique
tracks. When no merge or split occurs, each blob
either matches one of the existing tracks or is clas-
sified as new, in which case a new track is created.
We destroy tracks that are inactive for some num-
ber of frames. We use the color coherence vector
(CCV) (Pass et al., 1996) as an appearance model to
handle cases of blob merges and splits. When tracks
merge, we group them, but keep their identities sepa-
rate, and when tracks split, we attempt to associate the
new blobs with the correct tracks or groups of tracks
by comparing their CCVs. This works well on typical
simple cases such as those shown in Figure 2, but
it can make mistakes with more complex cases.
Figure 2: Sample blob tracking results for a typical simple case (frames 72, 90, 94, 103, 110, and 116).
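A minimal sketch of the bounding-box association step, under the simplifying assumption that no merges or splits occur (so the CCV appearance matching is omitted), might look like this; the dictionary track layout, IoU threshold, and inactivity limit are assumptions.

import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, blobs, iou_thresh=0.3, max_inactive=30):
    # Age every track, greedily match each blob to the best-overlapping
    # track, start new tracks for unmatched blobs, drop stale tracks.
    for t in tracks:
        t['inactive'] += 1
    for blob in blobs:
        scores = [iou(t['box'], blob['box']) for t in tracks]
        if scores and max(scores) >= iou_thresh:
            best = tracks[int(np.argmax(scores))]
            best['box'], best['inactive'] = blob['box'], 0
        else:
            tracks.append({'box': blob['box'], 'inactive': 0})
    return [t for t in tracks if t['inactive'] <= max_inactive]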
After blob tracking, we obtain, from a given video,
a set of observation sequences describing the mo-
tion and appearance of every distinguishable moving
object in the scene. We next partition the observa-
tion sequences into clusters of similar behaviors then
model the sequences within each cluster using a sim-
ple linear HMM. We use the method from our pre-
vious work (Ouivirach and Dailey, 2010), which first
uses dynamic time warping (DTW) to obtain a dis-
tance matrix for the set of observation sequences then
performs agglomerative hierarchical clustering on the
distance matrix to obtain a dendrogram.
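A sketch of this clustering step is given below: a pairwise DTW distance matrix over the discrete observation sequences, followed by agglomerative clustering with SciPy. The 0/1 local cost for mismatched cluster IDs and the average-linkage criterion are assumptions, not details taken from the paper.

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def dtw(a, b):
    # Plain dynamic time warping between two symbol sequences with a 0/1
    # local cost for mismatched symbols (assumed local distance).
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def cluster_sequences(seqs):
    # Pairwise DTW distances, then agglomerative hierarchical clustering;
    # the returned linkage matrix encodes the dendrogram.
    n = len(seqs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw(seqs[i], seqs[j])
    return linkage(squareform(dist), method='average')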
To determine where to cut the dendrogram,
we traverse it in depth-first order
from the root and attempt to model the observation
sequences within the corresponding subtree using a
single linear HMM. If, after training, the HMM is un-
able to “explain” (in the sense described below) the
sequences associated with the current subtree, we dis-
card the HMM then recursively attempt to model each
of the current node’s children. Whenever the HMM is
able to explain the observation sequences associated
with the current node’s subtree, we retain the HMM
and prune the tree.
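One way to realize this traversal is sketched below using SciPy's dendrogram tree and hmmlearn; fit_hmm and cut_dendrogram are hypothetical names, explains() is the test defined in the next paragraph, and the sketch trains a plain CategoricalHMM rather than enforcing the linear (left-to-right) topology used in the paper.

import numpy as np
from scipy.cluster.hierarchy import to_tree
from hmmlearn import hmm  # hmmlearn >= 0.3; older versions use MultinomialHMM

def fit_hmm(seqs, n_states=5):
    # Train one discrete-output HMM on the symbol sequences of a subtree
    # (five states, as in the configuration selected in Section 4.1).
    X = np.concatenate([np.asarray(s).reshape(-1, 1) for s in seqs])
    lengths = [len(s) for s in seqs]
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=50, random_state=0)
    model.fit(X, lengths)
    return model

def cut_dendrogram(node, seqs, explains):
    # Depth-first traversal: keep an HMM for any subtree it can explain,
    # otherwise discard it and recurse into the two children.
    ids = node.pre_order()
    members = [seqs[i] for i in ids]
    model = fit_hmm(members)
    if node.is_leaf() or explains(model, members):
        return [(model, ids)]
    return (cut_dendrogram(node.get_left(), seqs, explains)
            + cut_dendrogram(node.get_right(), seqs, explains))

# root = to_tree(cluster_sequences(seqs))
# clusters = cut_dendrogram(root, seqs, explains)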
An HMM is said to explain a cluster c if there are no more than N_c sequences in cluster c whose per-observation log likelihood is less than a threshold p_c. To determine the optimal rejection threshold p_c for cluster c, we use an approach similar to that of Oates et al. (2001). We generate random sequences from the HMM and then calculate the mean µ_c and standard deviation σ_c of the per-observation log likelihood over the set of generated sequences. After obtaining these statistics, we let p_c = µ_c − z·σ_c, where z is an experimentally tuned parameter.
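For a trained hmmlearn model, the threshold and the explain test could be computed as in the sketch below; the function names are hypothetical, and the defaults (z = 2.0, 1000 sampled sequences of 150 observations, N_c = 10) follow the settings reported in Section 4.1.

import numpy as np

def rejection_threshold(model, z=2.0, n_samples=1000, length=150):
    # Sample sequences from the trained HMM, compute each one's
    # per-observation log likelihood, and set p_c = mu_c - z * sigma_c.
    scores = []
    for _ in range(n_samples):
        X, _ = model.sample(length)
        scores.append(model.score(X) / length)
    mu_c, sigma_c = np.mean(scores), np.std(scores)
    return mu_c - z * sigma_c

def explains(model, seqs, z=2.0, n_c=10):
    # The HMM explains a cluster if at most N_c member sequences fall
    # below the rejection threshold p_c.
    p_c = rejection_threshold(model, z)
    deviant = sum(1 for s in seqs
                  if model.score(np.asarray(s).reshape(-1, 1)) / len(s) < p_c)
    return deviant <= n_c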
3 ANOMALY DETECTION
For anomaly detection, we propose a semi-supervised
method that calibrates itself from the bootstrap
set. We apply the algorithm of Section 2 to both the
positive and negative sequences in the bootstrap set.
We identify each cluster as a “normal” cluster if all
of the sequences falling into it are labeled as nor-
mal, or identify it as an “abnormal” cluster if any of
the sequences falling into it are labeled as abnormal.
New sequences are classified as normal if the most
likely HMM for the input sequence is associated with
a cluster of normal sequences and the z-scaled per-
observation log likelihood of the sequence under that
most likely model is greater than a global, empirically determined threshold z.
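A sketch of this decision rule follows, assuming each bootstrap cluster is stored as a record holding its trained HMM, its normal/abnormal label, and the µ_c and σ_c estimated in Section 2; the record layout, field names, and default threshold are assumptions.

import numpy as np

def classify(seq, clusters, z_threshold=-2.0):
    # Score the sequence under every cluster HMM, take the most likely one,
    # and z-scale its per-observation log likelihood with that cluster's
    # mu_c and sigma_c.
    X = np.asarray(seq).reshape(-1, 1)
    scores = [c['model'].score(X) / len(seq) for c in clusters]
    best = clusters[int(np.argmax(scores))]
    z_score = (max(scores) - best['mu']) / best['sigma']
    # Normal only if the best cluster is normal and the z-scaled likelihood
    # clears the global threshold (the default mirrors p_c = mu_c - 2*sigma_c).
    if best['label'] == 'normal' and z_score > z_threshold:
        return 'normal'
    return 'abnormal'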
4 EXPERIMENTAL RESULTS
We recorded video from the scene in front of a build-
ing for one week. We labeled 625 occurrences of
walking into the building, walking out, riding bicy-
cles in, and riding bicycles out as normal and the re-
maining 35 occurrences as suspicious or abnormal.
4.1 Model Configuration Selection
To select the model configuration, we performed a series
of experiments with different bootstrap parameter set-
tings and selected the configuration with the highest
accuracy in separating the normal sequences from the
abnormal sequences on the bootstrap sequence set, as
measured by the false positive rate (normal sequences incorrectly flagged as abnormal). Every bootstrap cluster containing an ab-
normal sequence is considered abnormal, so we al-
Table 1: Anomaly detection results for the proposed
method, k-NN, PCA, and SVM in Experiments I, II, III,
and IV, respectively. For PCA, we include 11 abnormal se-
quences from the bootstrap set in the test set, so the total
number of positives is 35.
Method TP FP TN FN TPR FPR
Ours 24 36 450 0 1 0.074
1-NN 19 1 485 5 0.792 0.002
PCA 35 421 65 0 1 0.87
SVM 24 228 258 0 1 0.469
ways obtain 100% detection on the bootstrap set; the
only discriminating factor is the false positive rate. To
find the distribution (parameters µ_c and σ_c) of the per-observation log likelihood for a particular HMM, we always generated 1000 sequences of 150 observations then used a z-threshold of 2.0. We fixed the parameter N_c (the number of deviant patterns allowed in a cluster) to 10.
Based on the false positive rate criterion described
above, we selected the model configuration consisting
of HMMs with five states and seven tokens trained on
150 bootstrap sequences.
4.2 Anomaly Detection
Here we describe four experiments to evaluate our
anomaly detection method. For each method, we use
the 150-sequence bootstrap sequence set from Sec-
tion 4.1 as training data and test on the remaining 510
sequences. In every experiment, we calculate an ROC
curve and select the detection threshold yielding the
best false positive rate at a 100% hit rate. Our ex-
perimental hypothesis was that the proposed method
for modeling scene-specific behavior patterns should
obtain better false positive rates than the traditional
methods.
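As a concrete illustration of this threshold-selection rule, the short sketch below takes anomaly scores and ground-truth labels (1 = abnormal, larger score = more anomalous, both conventions assumed) and returns the lowest false positive rate among operating points with a 100% hit rate.

import numpy as np
from sklearn.metrics import roc_curve

def fpr_at_full_recall(y_true, scores):
    # Compute the ROC, keep only operating points that detect every
    # abnormal sequence, and report the smallest false positive rate.
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    full = tpr >= 1.0
    i = np.argmin(fpr[full])
    return fpr[full][i], thresholds[full][i]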
In Experiment I, the red line in Figure 3 repre-
sents the ROC curve for our method as we vary the
likelihood threshold at which a sequence is consid-
ered anomalous. Note that the ROC does not inter-
sect the point (0, 0) because any sequence that is most
likely under one of the HMMs modeling anomalous
sequences in the bootstrap set is automatically clas-
sified as anomalous regardless of the threshold. Ta-
ble 1 shows the detailed performance of this model,
and Figure 4 shows an example of a sequence classi-
fied as abnormal.
In Experiment II, we applied k-nearest neighbors
(k-NN) using the same division of sequences into
training and testing as in Experiment I. As the dis-
tance measure, we used the same DTW measure we
used for hierarchical clustering of the bootstrap pat-
terns in our method. We varied k from 1 to 5. The best
result for k-NN is shown in Table 1. While the false positive rates are much lower than those obtained with our method, the hit rates are unacceptable.
Figure 3: Anomaly detection ROC curves. Red, green, and blue lines represent ROCs for the proposed method in Experiment I, PCA-based anomaly detection in Experiment III, and SVM-based anomaly detection in Experiment IV, respectively.
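One plausible reading of the k-NN baseline in Experiment II is sketched below, reusing the dtw() function from the clustering sketch in Section 2; the paper does not spell out the scoring rule, so treating the fraction of abnormal neighbors as the anomaly score is an assumption.

import numpy as np

def knn_anomaly_score(test_seq, train_seqs, train_labels, k=1):
    # DTW distance from the test sequence to every bootstrap sequence,
    # then vote among the k nearest neighbors.
    d = np.array([dtw(test_seq, s) for s in train_seqs])
    nearest = np.argsort(d)[:k]
    return np.mean([train_labels[i] == 'abnormal' for i in nearest])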
In Experiment III, we classified sequences as nor-
mal or anomalous using a Gaussian density estimator
derived from principal components analysis (PCA).
We calculated, for each sequence in the testbed data
set, a summary vector consisting of the means and
standard deviations of each observation vector ele-
ment over the entire sequence. With seven features
in the observation vector, we obtained a 14-element
vector summarizing each sequence. After fea-
ture summarization, we normalized each component
of the summary vector by z-scaling. Then, since we
are performing probability density estimation for the
normal patterns, we applied PCA to the 139 normal
sequences in the bootstrap set. We chose the num-
ber of principal components accounting for 80% of
the variance in the bootstrap data. Finally, we classi-
fied the remaining 521 test sequences using the PCA
model to calculate the Mahalanobis distance of each
sequence’s summary vector to the mean of the normal
bootstrap patterns’ summary vectors. The green line
in Figure 3 is the ROC curve obtained by varying the
Mahalanobis distance threshold, and Table 1 shows
detailed results for anomaly detection at a 100% hit
rate. The high false positive rate at this threshold
and the overall poor performance in the ROC anal-
ysis show that PCA is clearly inferior to our proposed
method.
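The sketch below mirrors this PCA baseline with scikit-learn: 14-element summary vectors, z-scaling, principal components covering 80% of the variance, and a Mahalanobis-distance score in the retained subspace; the helper names are hypothetical.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def summarize(seq_features):
    # Per-dimension mean and standard deviation of the 7-element
    # observation vectors over one sequence -> a 14-element summary.
    f = np.asarray(seq_features, dtype=float)
    return np.concatenate([f.mean(axis=0), f.std(axis=0)])

def fit_pca_detector(normal_summaries, variance=0.80):
    # z-scale the summaries, keep components explaining 80% of the variance,
    # and score test vectors by Mahalanobis distance in the retained subspace.
    scaler = StandardScaler().fit(normal_summaries)
    pca = PCA(n_components=variance).fit(scaler.transform(normal_summaries))
    def score(summary):
        proj = pca.transform(scaler.transform(summary.reshape(1, -1)))[0]
        return float(np.sqrt(np.sum(proj ** 2 / pca.explained_variance_)))
    return score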
In Experiment IV, we used the same sum-
mary vector technique used in Experiment III but per-
formed supervised classification using support vector
machines. We used the radial basis function kernel
implementation in LIBSVM (Chang and Lin, 2001)
with grid search for the optimal hyperparameters us-
ing five-fold cross validation on the training set (150
sequences). The blue line in Figure 3 is the ROC
curve obtained by varying the threshold on the signed
Figure 4: Example anomaly detected by the proposed method in Experiment I (frames 187, 208, 225, 249, and 267). The sequence contains a person walking around looking for an unlocked bicycle.
distance to the separating hyperplane used for clas-
sification as normal or abnormal. Table 1 shows
the detailed anomaly detection results for SVMs at a
100% hit rate. Although the results are clearly better
than those obtained from k-NN or PCA, they are also
clearly inferior to those obtained in Experiment I.
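For reference, a comparable setup with scikit-learn (whose SVC wraps LIBSVM) is sketched below; the grid of C and gamma values is an assumption, and the signed distance used for the ROC comes from decision_function.

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_svm_detector(train_summaries, train_labels):
    # RBF-kernel SVM on the summary vectors, tuned with a five-fold grid
    # search over C and gamma on the 150-sequence bootstrap (training) set.
    pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    grid = GridSearchCV(pipe,
                        {'svc__C': [0.1, 1, 10, 100],
                         'svc__gamma': [1e-3, 1e-2, 1e-1, 1]},
                        cv=5)
    grid.fit(train_summaries, train_labels)
    return grid.best_estimator_

# scores = detector.decision_function(test_summaries)  # signed distance to
# the separating hyperplane, thresholded to trace the ROC curve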
5 DISCUSSION AND
CONCLUSIONS
We have proposed and evaluated a new method for
bootstrapping scene-specific anomalous human be-
havior detection systems. It requires minimal involve-
ment of a human operator; the only required action is
to label the patterns in a small bootstrap set as nor-
mal or anomalous. With a bootstrap set of 150 se-
quences, the method achieves a false positive rate of
merely 7.4% at a hit rate of 100%. The experiments
demonstrate that with a collection of simple HMMs,
it is possible to learn a complex set of varied behav-
iors occurring in a specific scene. Deploying our sys-
tem on a large video sensor network would potentially
lead to substantial increases in the productivity of hu-
man monitors.
The main limitation of our current method is that
the blob tracking process is not robust for complex
events involving multiple people. The method also
does not allow evolution of the learned bootstrap
model over time. In future work, we plan to address
these limitations.
ACKNOWLEDGEMENTS
This work was partly supported by a grant from the
Royal Thai Government to MND and by graduate fel-
lowships from the Royal Thai Government to KO. We
thank the AIT Computer Vision Group for valuable
discussions and comments on this work.
REFERENCES
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for
support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Gao, J., Hauptmann, A. G., Bharucha, A., and Wactlar,
H. D. (2004). Dining activity analysis using a hid-
den Markov model. In International Conference on
Pattern Recognition (ICPR), pages 915–918.
Li, H., Hu, Z., Wu, Y., and Wu, F. (2006). Behavior mod-
eling and recognition based on space-time image fea-
tures. In International Conference on Pattern Recog-
nition (ICPR), pages 243–246.
Nair, V. and Clark, J. (2002). Automated visual surveillance
using hidden Markov models. In Vision Interface Con-
ference, pages 88–92.
Oates, T., Firoiu, L., and Cohen, P. R. (2001). Using dy-
namic time warping to bootstrap HMM-based cluster-
ing of time series. In Sequence Learning: Paradigms,
Algorithms, and Applications, pages 35–52.
Ouivirach, K. and Dailey, M. N. (2010). Clustering hu-
man behaviors with dynamic time warping and hid-
den Markov models for a video surveillance system.
In ECTI-CON, pages 884–888.
Pass, G., Zabih, R., and Miller, J. (1996). Comparing im-
ages using color coherence vectors. In International
Conference on Multimedia, pages 65–73.
Poppe, C., Martens, G., Lambert, P., and de Walle, R. V.
(2007). Improved background mixture models for
video surveillance applications. In Asian Conference
on Computer Vision (ACCV), pages 251–260.
Sminchisescu, C., Kanaujia, A., Li, Z., and Metaxas, D.
(2005). Conditional random fields for contextual hu-
man motion recognition. In International Conference
on Computer Vision (ICCV), pages 1808–1815.
Swears, E., Hoogs, A., and Perera, A. (2008). Learning mo-
tion patterns in surveillance video using HMM clus-
tering. In IEEE Workshop on Motion and video Com-
puting (WMVC), pages 1–8.
Wu, X., Ou, Y., Qian, H., and Xu, Y. (2005). A detection
system for human abnormal behavior. In International
Conference on Intelligent Robots and Systems (IROS),
pages 1204–1208.
Xiang, T. and Gong, S. (2005). Video behaviour pro-
filing and abnormality detection without manual la-
belling. In International Conference on Computer Vi-
sion (ICCV), pages 1238–1245.