Optimized Cascade of Classifiers for People Detection using Covariance Features

Malik Souded 1,2 and Francois Bremond 1
1 STARS Team, INRIA Sophia Antipolis - Méditerranée, Sophia Antipolis, France
2 Digital Barriers France, Sophia Antipolis, France

Keywords: People Detection, Covariance Descriptor, LogitBoost.
Abstract: People detection in static images and video sequences is a critical task in many computer vision applications, such as image retrieval and video surveillance. It is also one of the most challenging tasks due to the large number of possible situations, including variations in people's appearance and poses. The proposed approach optimizes an existing approach based on classification on Riemannian manifolds using covariance matrices in a boosting scheme, making training and detection faster while maintaining equivalent performance. This optimisation is achieved by clustering negative samples before training, yielding fewer cascade levels and fewer weak classifiers in most levels in comparison with the original approach. Our work was evaluated and validated on the INRIA Person dataset.
1 INTRODUCTION
Person detection is one of the most challenging tasks in computer vision. The large variety of people's poses and appearances, added to external factors such as different points of view, scene content and partial occlusions, makes this problem complicated. The importance of this task for many applications, such as people tracking, especially in crowded scenes, has motivated much research, and many approaches have been proposed as a result.
The most frequent scheme consists of using descriptors to model a person. To build this model, an offline learning step is carried out using these descriptors. Once the learning is achieved and the classifier obtained, detection is performed by testing all possible image subwindows.
In (Papageorgiou and Poggio, 2000), Papageorgiou and Poggio used Haar-like features to train an SVM classifier. Viola et al. (Viola et al., 2006) also trained a cascade of AdaBoost classifiers using Haar-like features. In (Dalal and Triggs, 2005), a new descriptor called Histogram of Oriented Gradients (HOG) was introduced by Dalal and Triggs and used to train a linear SVM, providing a very good people detector. Dalal et al. later combined this descriptor with histograms of differential optical flow features in (Dalal et al., 2006), outperforming the previous approach.
Mu et al. proposed new variants of the well-known standard LBP descriptor in (Mu et al., 2008). They demonstrated the effectiveness of their Semantic-LBP and Fourier-LBP features in comparison to standard LBP for people detection. In (Schwartz et al., 2009), Schwartz et al. concatenated HOG, color frequency and co-occurrence matrices into one descriptor and employed Partial Least Squares analysis for dimensionality reduction.
All the previously mentioned approaches can be classified as dense representation methods due to their detection method (dense search over images). Some other approaches use a different detection method and can be categorized as sparse representation approaches. They consist of modeling the human body parts, detecting them, and achieving people detection using geometric constraints. In (Mikolajczyk et al., 2004), dedicated AdaBoost detectors were trained for several body parts. The final detection is obtained by optimizing the likelihood of part occurrence along with the geometric relations.
Recently, Tuzel et al. (Tuzel et al., 2007) proposed a high-performing approach for people detection. Their approach uses covariance descriptors to characterise people. This characterisation is achieved by training a cascade of classifiers on a dataset containing human and non-human images. The training is done using a modified version of the LogitBoost algorithm to deal with some specificities and constraints of covariance descriptors. One of the main covariance descriptor
issues is the computing time of all related operators and metrics, which makes covariance descriptors difficult to use for real-time processing.
Yao and Odobez (Yao and Odobez, 2008) proposed three main contributions to this approach, improving the efficiency of the training and detection stages and providing better performance.
Because our approach is mainly based on these last two approaches, a brief review of covariance descriptor computation and a summary of the two cited approaches are given below.
Both approaches present some issues. In the next section, the main contribution of this paper, addressing these issues, is presented. Finally, the experimental results demonstrate that the proposed approach provides performance equivalent to the original ones while improving the processing time of the training and detection stages.
2 PEOPLE DETECTION USING
COVARIANCE FEATURES
2.1 Region Covariance Descriptor
The region covariance descriptor is a powerful way to encode a large amount of information about a given image region. It allows the encapsulation of a wide range of different features in a single structure, representing the variance of each feature and the correlation between features.
Let $I$ be an image of dimension $W \times H$. At each pixel location $\mathbf{x} = (x, y)^T$ we can extract a set of $d$ features such as intensity, color, gradients, filter responses, etc.
For a given rectangular region $R$ of $I$, let $\{z_i\}_{i=1..S}$ be the $d$-dimensional feature points inside $R$. The region $R$ is represented by the $d \times d$ covariance matrix of the feature points:

$$C_R = \frac{1}{S-1} \sum_{i=1}^{S} (z_i - \mu)(z_i - \mu)^T \qquad (1)$$

where $\mu$ is the mean of the points $z_i$ and $S$ is the number of pixels within $R$.
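As an illustration, here is a minimal NumPy sketch of Eq. (1); the function name and the (S, d) array layout are our own choices, not the authors' implementation.

```python
# A minimal sketch of Eq. (1): the covariance descriptor of a region,
# assuming `z` is an (S, d) array of feature points extracted inside R.
import numpy as np

def region_covariance(z: np.ndarray) -> np.ndarray:
    """Return the d x d covariance matrix of S d-dimensional feature points."""
    mu = z.mean(axis=0)                      # mean feature vector
    zc = z - mu                              # centered points
    return zc.T @ zc / (z.shape[0] - 1)      # (1/(S-1)) * sum (z_i - mu)(z_i - mu)^T
```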
2.1.1 Used Features
In (Tuzel et al., 2007), Tuzel et al. use an 8-dimensional set of features:

$$\left[\, x \quad y \quad |I_x| \quad |I_y| \quad \sqrt{I_x^2 + I_y^2} \quad |I_{xx}| \quad |I_{yy}| \quad \arctan\frac{|I_x|}{|I_y|} \,\right]^T \qquad (2)$$

where $x$ and $y$ are the pixel location, $I_x, I_{xx}, \ldots$ are intensity derivatives, and the last term is the gradient orientation.
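This feature set can be sketched as follows; the Sobel-based derivative filters and the function names are our assumptions, one of several reasonable choices.

```python
# A sketch of the 8-dimensional feature set of Eq. (2), computed densely
# over a grayscale image I.
import numpy as np
from scipy.ndimage import sobel

def feature_image(I: np.ndarray) -> np.ndarray:
    """Return an (H, W, 8) array: [x, y, |Ix|, |Iy|, magnitude, |Ixx|, |Iyy|, orientation]."""
    H, W = I.shape
    y, x = np.mgrid[0:H, 0:W].astype(float)
    Ix = sobel(I, axis=1)                        # first derivatives
    Iy = sobel(I, axis=0)
    Ixx = sobel(Ix, axis=1)                      # second derivatives
    Iyy = sobel(Iy, axis=0)
    mag = np.sqrt(Ix**2 + Iy**2)
    orient = np.arctan2(np.abs(Ix), np.abs(Iy))  # arctan(|Ix|/|Iy|), safe when Iy = 0
    return np.stack([x, y, np.abs(Ix), np.abs(Iy), mag,
                     np.abs(Ixx), np.abs(Iyy), orient], axis=-1)
```

The descriptor of a rectangular region is then, e.g., `region_covariance(feature_image(I)[y0:y1, x0:x1].reshape(-1, 8))`.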
Yao and Odobez (Yao and Odobez, 2008) replace the two second-derivative features $|I_{xx}|$ and $|I_{yy}|$ with two foreground measures, $G$ and $\sqrt{G_x^2 + G_y^2}$, where $G$ denotes the foreground probability value (a real number between 0 and 1 indicating the probability that the pixel $\mathbf{x}$ belongs to the foreground), and $G_x$ and $G_y$ are the corresponding first-order derivatives. These foreground features are obtained using a background subtraction technique, which restricts the approach to moving people. These two features improve people detection performance and processing time on video sequences.
2.1.2 Fast Covariance Descriptor Computation
A large number of covariance descriptors are required to train the classifier cascade and for an effective detection process. Computing all the feature sums, means and variances for each region has a high cost in terms of processing time. To deal with this, integral images are ideally suited to minimize the number of numerical operations.
Integral images are intermediate image representations used for the fast calculation of region sums (Simard et al., 1998), (Viola and Jones, 2001). Each pixel of the integral image is the sum of all the pixels inside the rectangle bounded by the upper left corner of the image and the pixel of interest.
Due to the symmetric nature of covariance matrices, only the upper (or lower) triangle values are needed. In the case of an 8-feature set, the covariance matrix contains 36 distinct values, and 44 integral images are computed to speed up the process (8 integral images representing each feature independently and 36 representing the product of each pair of features).
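A sketch of the idea follows, assuming an (H, W, d) feature image; for clarity this version stores all d x d product integrals, whereas a real implementation would keep only the 36 upper-triangle ones (44 integral images in total for d = 8).

```python
# Integral-image trick: any region covariance costs O(d^2) lookups.
import numpy as np

def build_integrals(F: np.ndarray):
    """F: (H, W, d) feature image. Returns integral feature sums P and product sums Q."""
    P = F.cumsum(axis=0).cumsum(axis=1)               # (H, W, d)
    prod = F[..., :, None] * F[..., None, :]          # per-pixel outer products
    Q = prod.cumsum(axis=0).cumsum(axis=1)            # (H, W, d, d)
    return P, Q

def rect_sum(A, y0, x0, y1, x1):
    """Sum of A over rows [y0, y1) and cols [x0, x1) via 4 integral lookups."""
    s = A[y1 - 1, x1 - 1].copy()
    if y0 > 0: s -= A[y0 - 1, x1 - 1]
    if x0 > 0: s -= A[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0: s += A[y0 - 1, x0 - 1]
    return s

def covariance_from_integrals(P, Q, y0, x0, y1, x1):
    n = (y1 - y0) * (x1 - x0)                         # number of pixels in the region
    p = rect_sum(P, y0, x0, y1, x1)                   # per-feature sums
    q = rect_sum(Q, y0, x0, y1, x1)                   # per-pair product sums
    return (q - np.outer(p, p) / n) / (n - 1)         # same result as Eq. (1)
```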
2.1.3 Covariance Normalisation
In order to enhance the robustness of covariance descriptors to local illumination changes, a normalization step is performed on the covariance matrix. Let $r$ be a subregion contained in a larger region of interest $R$. First, both covariance matrices $C_r$ and $C_R$ are computed using the integral representation. Then, the values of the covariance matrix $C_r$ are normalized with respect to the standard deviations of their corresponding features inside the detection window $R$, as in (Tuzel et al., 2007). The normalized covariance descriptor is denoted $\hat{C}_r$.
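Our reading of this step (dividing each entry of $C_r$ by the standard deviations of the corresponding features over $R$) can be sketched as:

```python
# A sketch of the normalization: C_hat_r(i,j) = C_r(i,j) / (std_i * std_j),
# with the per-feature standard deviations taken from C_R over the window R
# (our reading of the step in (Tuzel et al., 2007)).
import numpy as np

def normalize_covariance(C_r: np.ndarray, C_R: np.ndarray) -> np.ndarray:
    std = np.sqrt(np.diag(C_R))              # per-feature std dev inside R
    return C_r / np.outer(std, std)
```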
OptimizedCascadeofClassifiersforPeopleDetectionusingCovarianceFeatures
821
2.1.4 LogitBoost Algorithm on Riemannian Manifolds
The classification process is performed using a
cascade of classifiers which is trained using a
LogitBoost algorithm on Riemannian Manifolds.
Standard LogitBoost Algorithm on Vector Spaces.
As described in (Friedman et al., 2000), let $\{(x_i, y_i)\}_{i=1...N}$ be the set of training samples, with $y_i \in \{0, 1\}$ and $x_i \in \mathbb{R}^n$. The goal is to find a decision function $F$ which divides the input space into the two classes. In LogitBoost, this function is defined as a sum of weak classifiers, and the probability of a sample $x$ being in class 1 (positive) is represented by

$$p(x) = \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}}, \qquad F(x) = \frac{1}{2} \sum_{l=1}^{N_L} f_l(x). \qquad (3)$$
The LogitBoost algorithm iteratively learns the set of weak classifiers $\{f_l\}_{l=1...N_L}$ by minimizing the negative binomial log-likelihood of the training data,

$$-\sum_{i=1}^{N} \left[ y_i \log(p(x_i)) + (1 - y_i) \log(1 - p(x_i)) \right], \qquad (4)$$

through Newton iterations. At each iteration $l$, this is achieved by solving a weighted least-squares regression problem,

$$\sum_{i=1}^{N} w_i \left\| f_l(x_i) - z_i \right\|^2,$$

where $z_i = \frac{y_i - p(x_i)}{p(x_i)(1 - p(x_i))}$ denotes the response values and the sample weights are given by $w_i = p(x_i)(1 - p(x_i))$.
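One Newton round of this scheme can be sketched as follows; `fit_wls` is a hypothetical weighted least-squares learner standing in for the weak-classifier fit.

```python
# One LogitBoost round on vector data, following Eqs. (3)-(4);
# a sketch, not the authors' implementation.
import numpy as np

def logitboost_round(X, y, F, fit_wls):
    """X: (N, n) samples, y: (N,) labels in {0, 1}, F: (N,) current scores."""
    p = 1.0 / (1.0 + np.exp(-2.0 * F))          # Eq. (3): e^F / (e^F + e^-F)
    w = np.clip(p * (1.0 - p), 1e-8, None)      # sample weights w_i
    z = (y - p) / w                             # working responses z_i
    f_l = fit_wls(X, z, w)                      # weighted least-squares fit of f_l
    return F + 0.5 * f_l(X), f_l                # update F(x) += f_l(x)/2
```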
LogitBoost Algorithm on Riemannian Manifolds.
To train classifiers using covariance descriptors, this algorithm is not usable as is. In fact, covariance descriptors do not belong to vector spaces but to the Riemannian manifold $M$ of $d \times d$ symmetric positive definite matrices $\mathrm{Sym}_d^+$.
Based on an invariant Riemannian metric on the tangent space defined in (Pennec et al., 2006), let $X$ and $Y$ be two matrices from $\mathrm{Sym}_d^+$; the following operators are defined and used to perform training with LogitBoost on the Riemannian manifold:

$$\exp_X(y) = X^{\frac{1}{2}} \exp\!\left(X^{-\frac{1}{2}} y X^{-\frac{1}{2}}\right) X^{\frac{1}{2}} \qquad (5)$$

$$\log_X(y) = X^{\frac{1}{2}} \log\!\left(X^{-\frac{1}{2}} y X^{-\frac{1}{2}}\right) X^{\frac{1}{2}} \qquad (6)$$

$$d^2(X, Y) = \mathrm{trace}\!\left(\log^2\!\left(X^{-\frac{1}{2}} Y X^{-\frac{1}{2}}\right)\right) \qquad (7)$$

which are respectively the exponential map, the logarithm map and the squared distance on $\mathrm{Sym}_d^+$ matrices. Here $\exp(y) = U \exp(D) U^T$ and $\log(y) = U \log(D) U^T$, where $y = U D U^T$ is the eigenvalue decomposition of the symmetric matrix $y$, and $\exp(D)$ and $\log(D)$ are obtained by applying the exponential and logarithm functions respectively to the diagonal entries of the diagonal matrix $D$.
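These operators reduce to eigenvalue decompositions, as the following NumPy sketch shows; numerical safeguards are omitted.

```python
# Sketch of the operators (5)-(7) on Sym_d^+ via eigendecomposition.
import numpy as np

def _sym_fun(Y, fun):
    """Apply fun to the eigenvalues of the symmetric matrix Y: U fun(D) U^T."""
    w, U = np.linalg.eigh(Y)
    return (U * fun(w)) @ U.T

def exp_map(X, y):                                   # Eq. (5)
    Xh = _sym_fun(X, np.sqrt)                        # X^{1/2}
    Xmh = _sym_fun(X, lambda v: 1.0 / np.sqrt(v))    # X^{-1/2}
    return Xh @ _sym_fun(Xmh @ y @ Xmh, np.exp) @ Xh

def log_map(X, y):                                   # Eq. (6)
    Xh = _sym_fun(X, np.sqrt)
    Xmh = _sym_fun(X, lambda v: 1.0 / np.sqrt(v))
    return Xh @ _sym_fun(Xmh @ y @ Xmh, np.log) @ Xh

def dist2(X, Y):                                     # Eq. (7)
    Xmh = _sym_fun(X, lambda v: 1.0 / np.sqrt(v))
    w, _ = np.linalg.eigh(Xmh @ Y @ Xmh)
    return float(np.sum(np.log(w) ** 2))             # trace(log^2(...))
```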
Tuzel et al. introduced modifications to the original LogitBoost algorithm to specifically account for the Riemannian geometry. This was done by introducing a mapping function $h$ which projects the input covariance descriptors into the Euclidean tangent space at a point $\mu_l$ of the manifold $M$:

$$h(X) = \mathrm{vec}_{\mu_l}\!\left(\log_{\mu_l}(X)\right) \qquad (8)$$

where $\mathrm{vec}_X(y) = \mathrm{vec}_I\!\left(X^{-\frac{1}{2}} y X^{-\frac{1}{2}}\right)$ and $\mathrm{vec}_I(y) = \left[\, y_{1,1} \;\; \sqrt{2}\,y_{1,2} \;\; \sqrt{2}\,y_{1,3} \;\ldots\; y_{2,2} \;\; \sqrt{2}\,y_{2,3} \;\ldots\; y_{d,d} \,\right]^T$.
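A sketch of the mapping (8), reusing `log_map` and `_sym_fun` from the previous sketch; the orthonormal vectorization scales off-diagonal entries by $\sqrt{2}$.

```python
# Project a covariance descriptor into the tangent space at mu and vectorize.
import numpy as np

def vec_I(y: np.ndarray) -> np.ndarray:
    """Vectorize a symmetric matrix: upper triangle, off-diagonals scaled by sqrt(2)."""
    iu = np.triu_indices(y.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return y[iu] * scale                              # length d(d+1)/2

def h(X, mu):
    mu_mh = _sym_fun(mu, lambda v: 1.0 / np.sqrt(v))  # mu^{-1/2}
    return vec_I(mu_mh @ log_map(mu, X) @ mu_mh)      # vec_mu(log_mu(X)), Eq. (8)
```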
The trained cascade consists of a list of ordered
strong classifiers. Each strong classifier contains a set
of weak classifiers. A weak classifier is defined by a
sub-region of interest, the corresponding mean value
of covariance descriptors of all positive samples and
a regression function.
To train level $k$ of the cascade, a given number of weak classifiers are added successively. To add a weak classifier $l$ to the classifier currently being trained, 200 candidate weak classifiers are evaluated: 200 subwindows are randomly selected. Let $r_i$ be one of these subwindows and $\hat{C}_{r_i}^j$ the corresponding normalized covariance descriptor on sample $j$. For each subregion $r_i$, the mean $\mu_i$ of the normalized covariance descriptors $\hat{C}_{r_i}^j$ of the positive samples is computed using the gradient descent procedure described in (Pennec et al., 2006). Using this mean $\mu_i$, the descriptors $\hat{C}_{r_i}^j$ of all the samples are projected onto the tangent space using (8), yielding vectors in Euclidean space. Using these vectors and the corresponding weights of all samples, a regression function $g_i$ is computed.
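The gradient descent of (Pennec et al., 2006) for this intrinsic (Karcher) mean can be sketched as follows, reusing `exp_map` and `log_map` from above; the iteration count and tolerance are our own choices.

```python
# Karcher mean on Sym_d^+: average log-projections in the tangent space
# at the current estimate, then map back along the geodesic.
import numpy as np

def karcher_mean(Cs, n_iter=20, tol=1e-6):
    """Cs: list of SPD matrices (e.g. positive-sample descriptors of one subregion)."""
    mu = sum(Cs) / len(Cs)                            # Euclidean mean as initialization
    for _ in range(n_iter):
        T = sum(log_map(mu, C) for C in Cs) / len(Cs) # mean tangent vector at mu
        mu = exp_map(mu, T)                           # step along the geodesic
        if np.linalg.norm(T) < tol:                   # converged when the step vanishes
            break
    return mu
```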
The best weak classifier, i.e. the one minimizing the negative binomial log-likelihood (4), is added to the classifier currently being trained. The weights and probabilities of all the samples are updated according to the newly added weak classifier. The positive and negative samples are sorted in decreasing order of their probabilities. The current strong classifier is considered fully trained when the difference between the probability of the positive sample at the 99.8th percentile and that of the negative sample at the 35th percentile is greater than 0.2.
In this case, the training of the current cascade level is achieved. The negative samples are tested with the new cascade and all correctly classified samples are removed from the training dataset. The next cascade level is trained using the remaining negatives.
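The level-completion test can be sketched as follows; the percentile indexing is our interpretation of the criterion above.

```python
# Level-completion test: compare the probability at the 99.8th percentile
# of the positives (sorted by decreasing probability) with that at the
# 35th percentile of the negatives.
import numpy as np

def level_is_trained(p_pos, p_neg, margin=0.2):
    """p_pos, p_neg: arrays of sample probabilities under the current strong classifier."""
    pos = np.sort(p_pos)[::-1]                        # decreasing order
    neg = np.sort(p_neg)[::-1]
    p1 = pos[int(0.998 * (len(pos) - 1))]             # (99.8%)-th positive sample
    p2 = neg[int(0.35 * (len(neg) - 1))]              # (35%)-th negative sample
    return (p1 - p2) > margin
```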
VISAPP2013-InternationalConferenceonComputerVisionTheoryandApplications
822
Figure 1: Comparison between the structures of the cascades of classifiers: (a) Tuzel et al. (Tuzel et al., 2007); (b) trained with a random selection of negative subsets; (c) our proposed approach: fewer cascade levels with fewer weak classifiers per level.
Note that Yao and Odobez introduced two important improvements. First, classifiers are trained in a lower dimension: they proposed an approach to select the best subset of $d' < d$ features from the $d$ original ones for each sub-region, and train classifiers on 4-feature covariance descriptors. Second, they concatenated the mean feature vector of each random sub-region to the mapped vector of each sample before computing the regression, improving performance.
2.2 Main Issues
The initial approach proposed by Tuzel et al. (Tuzel et al., 2007) outperforms existing state-of-the-art approaches by providing lower miss-detection and false positive rates, but it has the disadvantage of being highly time-consuming in the detection process and is not applicable to real-time processing. In (Tuzel et al., 2007), Tuzel et al. indicate that the detection time on a 320 x 240 image is approximately 3 seconds for a dense scan with 3-pixel jumps vertically and horizontally. Note that the training time is also relatively long (2 days in (Tuzel et al., 2007)).
The most computationally expensive operation during training and classification is eigenvalue decomposition, which is the basis of all the operators on $\mathrm{Sym}_d^+$. The eigenvalue decomposition of a symmetric $d \times d$ matrix requires $O(d^3)$ arithmetic operations, so the computing time increases quickly as more features are used.
The feature subset selection approach proposed by Yao and Odobez in (Yao and Odobez, 2008) allows working with lower-dimensional symmetric positive definite matrices, making eigenvalue decomposition faster and thereby speeding up the whole training and classification process.
In our work, we focus on another way to make the classification faster while maintaining high classification performance. In the end, the resulting approach also improves the training stage.
3 PROPOSED ALGORITHM
3.1 Ordering Negative Training Samples
Using a large number of samples makes the training process very slow. Of course, the larger the training dataset, the better the classifier cascade performs. But most of the time, a large number of negative samples contain very similar information. This problem is most frequent in the first cascade levels, where a new level is trained using the false positives of previous levels. These false positives generally result from successive small shifts of the testing window on the image, producing very similar content.
One could suppose that using a smaller subset of randomly selected negative samples to train a given cascade level would be a good way to speed up training, a randomly selected subset being statistically representative of all the remaining negatives.
We tested this approach and observed that it effectively speeds up training and yields fewer cascade levels than (Tuzel et al., 2007), but with longer classifiers (see Figure 1, b), slowing down detection in comparison with the previously mentioned approaches. In fact, a cascade level consists of a set of weak classifiers, and the response of a classifier is obtained only after computing the output values of all its weak classifiers. This means that a long classifier containing a large number of weak classifiers takes more time to return a decision, so a cascade of long classifiers is very slow for detection.
The number of weak classifiers per cascade level depends mainly on the diversity of the negative samples used for training. Characterizing positives and separating them from negatives requires as many subregions of interest as the negative samples are diverse.
To illustrate the relationship between negative sample diversity and the classifier cascade structure, let us use a simple example which can be generalized. Suppose that we have to separate a person image from three non-person images: a sky image, a vertical barrier image and a lamppost image.
OptimizedCascadeofClassifiersforPeopleDetectionusingCovarianceFeatures
823
Figure 2: Squared-distance computation between two negative samples for hierarchical clustering: the squared distance d²(X, Y) is computed for each of the 16 sub-regions and the 16 values are summed to give the distance between the two samples.
Due to the poverty of textures and gradients in the sky image, a single large covariance region is sufficient to separate the sky image from the person image, which has many gradients and a vertical shape. For vertical barriers, the previous region is not appropriate because of the vertical shape of the person; a smaller region around the person's head is more suitable, the circular shape of the head providing a good separation. For the lamppost, the two previous regions are not suitable either: it is necessary to take a region on the legs or around the shoulders to encode curvatures. For this example, there are two ways to train the classifier. The first cascade is trained with one negative image at a time, in the order mentioned. The second cascade is trained with the three negative images at the same time, using appropriate parameters. The first method provides a cascade with three levels, each level containing one weak classifier corresponding to one case (low texture level, vertical shapes only, circular shape at the top of a vertical shape). The second method provides a cascade with a single classifier containing at least three weak classifiers (the number can be larger due to the possible combinations).
Suppose now that we have to perform people detection on a large image which contains only sky, a low-textured road, some vertical barriers and some lampposts. Both cascades will provide equivalent detection performance, but the first one will be faster. This is because most of the tested windows (sky and road) are rejected after evaluating only one covariance descriptor (the one of the first cascade level), while the second cascade needs to evaluate three (or more) covariance descriptors for each tested window. The average number of covariance descriptors evaluated using the Tuzel et al. cascade (a) is 8.45, while the cascade in (b) needs more than 21 covariance descriptor evaluations.
We propose an approach using a smaller subset of negatives for each cascade level training to make it faster. Our approach provides a shorter cascade with smaller classifiers on average (Figure 1, c) in comparison with that of Tuzel et al. (Figure 1, a), making the detection process faster. At the same time, the experimental results show that our approach provides detection performance similar to the original one.
3.2 Clustering on Negative Data
The idea consists in regrouping the negative samples into groups with similar content in terms of covariance information, and training each cascade level with one group of similar samples. The previously described LogitBoost algorithm characterizes people against a group of negative samples faster when these negative samples are more similar; it also specializes each cascade level. This regrouping of negative samples is achieved using clustering methods. We tried two methods to perform this clustering: the first is applied directly on the Riemannian manifold, while the second is performed in Euclidean space.
3.2.1 Hierarchical Clustering on Covariance Descriptors using the Covariance Matrix Distance
A matrix containing the distances between all pairs of negative samples is computed. To compute the distance between two negative samples using covariance descriptors, each sample image is divided into a grid of 16 equal sub-regions. The final distance between the two samples is the sum of the 16 distances between each pair of covariance descriptors, using (7) (see Figure 2). Once the distance matrix is computed, hierarchical clustering is performed, providing a tree of negative samples (see Figure 3).
Figure 3: Hierarchical tree of clustered negative samples; the selected cluster contains 35% of the negative samples.
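A sketch of this step using SciPy, reusing `dist2` from Section 2.1.4; the `average` linkage and the fixed cluster count are our stand-ins for cutting the tree at the 35% cluster.

```python
# Hierarchical clustering of negatives from the pairwise covariance distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def sample_distance(desc_a, desc_b):
    """desc_a, desc_b: lists of 16 covariance matrices (one per sub-region)."""
    return sum(dist2(X, Y) for X, Y in zip(desc_a, desc_b))

def cluster_negatives(descriptors, n_clusters):
    n = len(descriptors)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):                      # symmetric distance matrix
            D[i, j] = D[j, i] = sample_distance(descriptors[i], descriptors[j])
    Z = linkage(squareform(D), method='average')       # hierarchical tree
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```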
3.2.2 Clustering in Projection Space
The second clustering approach consists in projecting all the negative samples onto a tangent space. In this method, we use the covariance descriptor of the whole image of each sample. The mean of all the negative samples is computed and used to project all the covariance descriptors into Euclidean space. Finally, in this Euclidean space, the clustering is performed using adaptive-bandwidth mean shift filtering (Comaniciu and Meer, 2002) (see Figure 4).
Figure 4: Clustering on the tangent space: negative samples are mapped through log_µ and the largest cluster is selected.
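A sketch of this second method, reusing `karcher_mean` and `h` from the sketches above; scikit-learn's MeanShift with an estimated global bandwidth stands in for the adaptive-bandwidth variant of (Comaniciu and Meer, 2002).

```python
# Project whole-image covariance descriptors to the tangent space at their
# Karcher mean, then cluster the resulting Euclidean vectors with mean shift.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def cluster_on_tangent_space(covs):
    """covs: list of whole-image SPD covariance descriptors of the negatives."""
    mu = karcher_mean(covs)                            # projection point on the manifold
    V = np.stack([h(C, mu) for C in covs])             # tangent-space vectors, Eq. (8)
    bw = estimate_bandwidth(V)
    return MeanShift(bandwidth=bw).fit_predict(V)      # cluster labels
```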
3.3 Training Iteratively with each Subset
After clustering, it is possible to select the most similar negative samples. In the case of hierarchical clustering, we select the cluster which contains n = 35% of the remaining negatives. We took this value from the parameters of Tuzel et al. (Tuzel et al., 2007), to have similar conditions and to be able to compare results. In the case of mean shift clustering on the tangent space, we select the cluster containing the largest number of samples. This is motivated by the desire to eliminate the largest percentage of negative samples as early as possible.
Training therefore proceeds by applying a clustering step to the negative samples, selecting the most similar negative subset and using it for training. After the current level training is achieved, the new cascade is applied to all the remaining negative samples, both those used for training and the others. We observed that 80% to 95% of the negatives from the used subset are correctly classified and removed, along with a small portion of the unused negatives. The clustering is then repeated on the remaining negatives to train the next levels.
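The overall loop can be summarized schematically; `select_cluster`, `train_level` and the `accepts` method are hypothetical placeholders for the steps described above.

```python
# Schematic of the proposed training loop: cluster negatives, train one
# level on the most similar subset, re-filter all remaining negatives.
def train_cascade(positives, negatives, max_levels=18):
    cascade = []
    while negatives and len(cascade) < max_levels:
        subset = select_cluster(negatives)             # most similar negatives
        level = train_level(positives, subset)         # LogitBoost on this subset
        cascade.append(level)
        # re-test ALL remaining negatives with the extended cascade and keep
        # only those still (wrongly) accepted as positives by every level
        negatives = [n for n in negatives
                     if all(lvl.accepts(n) for lvl in cascade)]
    return cascade
```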
4 EXPERIMENTAL RESULTS
We conducted experiments on the INRIA dataset to be able to compare our results with those of Tuzel et al. (Tuzel et al., 2007).
The INRIA person dataset (Dalal and Triggs, 2005) is divided into two subsets: a training set containing 2416 person annotations and 1218 person-free images, and a test set with 1132 persons and 453 person-free images. This dataset is quite challenging due to the variety of scenes, content, and person appearances and poses.
4.1 Detection Performance Comparison
The detection performance was evaluated on two criteria: the miss detection rate, given by $\frac{\mathrm{FalseNeg}}{\mathrm{FalseNeg} + \mathrm{TruePos}}$, and the false positives per window (FPPW), given by $\frac{\mathrm{FalsePos}}{\mathrm{TrueNeg} + \mathrm{FalsePos}}$. The rightmost curve points of our method correspond to the results of the first 8 levels of the cascade; the other points are added every 4 cascade levels. The curves in Figure 5 show that our approach provides performance very close to that of Tuzel et al., which outperforms the results of Dalal et al. (Dalal et al., 2006).
Figure 5: Comparison of miss rate versus false positives per window (FPPW) with the methods of Dalal et al. (Dalal et al., 2006) and Tuzel et al. (Tuzel et al., 2007) on the INRIA dataset.
4.2 Classifier Cascade Structure,
Training Time and Detection Time
Comparison
Our classifier cascade (see Figure 1 (c)) is shorter than that of Tuzel et al. It contains 18 levels, achieving rejection of more than 99% of the negatives during training. Most levels also contain fewer weak classifiers. The average number of covariance descriptors evaluated using our cascade is 6.85, versus 8.45 for the Tuzel et al. cascade.
The main consequence of this difference in structure is the detection time: our cascade performs detection faster than that of Tuzel et al. We implemented both approaches in C++ and performed training and detection on an Intel(R) Core(TM) i7-920 processor at 2.66 GHz with 4 GB of RAM. The average detection time on 320x240 images is approximately 2.3 seconds for Tuzel et al. versus approximately 0.5 seconds for our method.
OptimizedCascadeofClassifiersforPeopleDetectionusingCovarianceFeatures
825
Under the same conditions, the training time is also decreased by our approach: training takes 22 hours for the Tuzel et al. approach versus 9 hours for ours.
Note finally that clustering in the tangent space gives better results when training the first cascade levels, but after a few levels it becomes less precise. This can be explained by the fact that at the first levels the negative samples are densely grouped, so the mean computed for the tangent space projection is meaningful. After a few levels of training, once the correctly classified negatives have been removed, the remaining negatives become sparse; the mean computed on sparse samples is less meaningful, and the projection onto the tangent space is no longer suitable.
5 CONCLUSIONS
We have proposed an approach to optimize people detection using covariance descriptors. This approach consists in clustering negative data before training to obtain a better classifier structure. The resulting detector is faster than the original one and is trained in a shorter time. The experimental results on a challenging dataset validate our approach.
REFERENCES
Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Machine Intell., 24:603-619.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conf. Comp. Vision and Pattern Recognition (CVPR).
Dalal, N., Triggs, B., and Schmid, C. (2006). Human detection using oriented histograms of flow and appearance. In Europ. Conf. Comp. Vision (ECCV), volume II, pages 428-441.
Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Ann. Statist., 28(2):337-407.
Mikolajczyk, K., Schmid, C., and Zisserman, A. (2004). Human detection based on a probabilistic assembly of robust part detectors. In Europ. Conf. Comp. Vision (ECCV), volume I, pages 69-81.
Mu, Y., Yan, Y., Liu, Y., Huang, T., and Zhou, B. (2008). Discriminative local binary patterns for human detection in personal album. In CVPR 2008, pages 1-8.
Papageorgiou, P. and Poggio, T. (2000). A trainable system for object detection. Int. J. of Computer Vision, 38(1):15-33.
Pennec, X., Fillard, P., and Ayache, N. (2006). A Riemannian framework for tensor computing. Int. Journal of Comp. Vision, 66(1):41-66.
Schwartz, W., Kembhavi, A., Harwood, D., and Davis, L. (2009). Human detection using partial least squares analysis. In ICCV.
Simard, P., Bottou, L., Haffner, P., and LeCun, Y. (1998). Boxlets: A fast convolution algorithm for signal processing and neural networks. In Proc. Conf. Advances in Neural Information Processing Systems II, pages 571-577.
Tuzel, O., Porikli, F., and Meer, P. (2007). Human detection via classification on Riemannian manifolds. In IEEE Conf. Comp. Vision and Pattern Recognition (CVPR).
Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 01), volume 1, pages 511-518.
Viola, P., Jones, M., and Snow, D. (2006). Detecting pedestrians using patterns of motion and appearance. In Europ. Conf. Comp. Vision (ECCV), volume II, pages 589-600.
Yao, J. and Odobez, J. (2008). Fast human detection from videos using covariance features. In ECCV 2008 Visual Surveillance Workshop.
VISAPP2013-InternationalConferenceonComputerVisionTheoryandApplications
826