Spatio-temporal Center-symmetric Local Derivative Patterns for Objects
Detection in Video Surveillance
Marwa Jmal (1,2), Wided Souidene (1) and Rabah Attia (1)
(1) SERCOM, Ecole Polytechnique de Tunisie, Université de Carthage, B.P. 743, 2078 La Marsa, Tunisia
(2) Telnet Innovation Labs, Telnet Holding, Ariana, Tunisia
Keywords: Local Derivative Patterns, Spatio-temporal Features, Background Modeling, Background Subtraction.
Abstract:
Nowadays, more attention is being paid to background subtraction methods owing to their importance in many computer vision applications. Most of the proposed approaches are classified as pixel-based due to their low complexity and high processing speed. Other methods are considered spatio-temporal, as they take the surroundings of each analyzed pixel into account. In this context, we propose a new texture descriptor that is suitable for this task. We build on the advantages of local binary pattern variants to introduce a novel spatio-temporal center-symmetric local derivative pattern (STCS-LDP) descriptor. Several improvements and restrictions are introduced at the neighboring-pixel comparison level to make the descriptor less sensitive to noise while maintaining robustness to illumination changes. We also present a simple background subtraction algorithm based on our STCS-LDP descriptor. Experiments on multiple video sequences show that our method is efficient and produces results comparable to the state of the art.
1 INTRODUCTION
Ensuring humans' security, in both public and private spaces, is becoming a major priority for all nations. This issue raises the need for video surveillance systems, which basically consist of object detection, object tracking and behavior understanding (Jain and Favorskaya, 2015). The most important task in surveillance systems is moving object detection; a robust detection will greatly increase the effectiveness of the surveillance.
For many decades, a significant amount of research in the computer vision field has been devoted to the task of object detection. The most straightforward technique employed in this context is background subtraction. In its simplest form, it aims to extract the foreground, which represents the relevant objects that remain in motion. Even though it seems simple, this technique has to cope with various challenges arising from dynamic backgrounds (waving trees, water fountains), illumination variations and camera jitter, as well as other challenges that are well depicted in (Bouwmans et al., 2010). To deal with these situations, several works have been carried out (Sobral and Vacavant, 2014; Shahbaz et al., 2015; Benezeth et al., 2010). Although some of them focused on videos captured by freely moving cameras (Megrhi et al., 2015; Sheikh et al., 2009), the majority conceived background subtraction approaches for videos captured by static cameras.
In order to detect sudden events, real-time processing is a requirement in video surveillance systems. This explains why most methods employed in this field are based on independent pixel-level models which are then integrated into a global background model.
Color-based methods consist in dynamically comparing pixel colors at different positions against a threshold. These methods are very sensitive to illumination changes in the scene: some changes will not be detected because they involve groups of pixels in which individual pixels may preserve an appearance similar to the background. The remedy to this issue is to formulate the problem in the feature space: instead of employing pixel colors for comparison, features in the current frame are compared with features in the background model. Lately, the Local Binary Pattern (LBP) (Ojala et al., 2002) was adapted to the task of background subtraction. It describes a pixel by a series of bits based on the gray intensity levels of its surrounding neighbors. LBP was first employed in this context by (Heikkilä and Pietikäinen, 2006). It was proven that this descriptor is simple, invariant to illumination and computationally efficient. Moreover,
representing LBP with histograms makes it invariant to translations. Since then, many attempts to build robust descriptors have been proposed. However, most of them are either computationally expensive or result in long histograms. A comparison of these methods is provided in (Silva et al., 2015).
In this paper, we propose a new feature descriptor named STCS-LDP (Spatio-Temporal Center-Symmetric Local Derivative Patterns), an optimized and enhanced version of an LBP variant proposed in (Xue et al., 2011). Our main improvements lie at the neighboring-pixel comparison level. To validate STCS-LDP, we integrated it into a simple background subtraction process. Experimental results, carried out on a subset of the CDnet dataset (Wang et al., 2014), show that the proposed descriptor is robust to illumination changes while remaining short. The remainder of the paper is organized as follows: Section 2 presents works related to pixel- and LBP-based methods for background subtraction. Section 3 provides a description of the proposed descriptor. Experimental results are depicted in Section 4, while Section 5 draws some conclusions and perspectives.
2 LITERATURE REVIEW
In general, pixel-based background subtraction methods are simple and robust in several scenarios. Their major drawback is sensitivity to illumination changes. In order to handle dynamic backgrounds, more than one pixel value should be associated with the background pixel model. In this context, parametric and non-parametric background models like the Gaussian Mixture Model (GMM) (Stauffer and Grimson, 1999) and Kernel Density Estimation (KDE) (Elgammal et al., 2000) were proposed. They are well-known methods on which countless variations and improvements have been made, such as (Zivkovic, 2004). Other pixel-based methods, like ViBe (Barnich and Van Droogenbroeck, 2011), PBAS (Hofmann et al., 2012) and SuBSENSE (St-Charles et al., 2015), focus on randomly selected background samples and diffusion labelling instead of building a probability distribution of the background of a pixel.
Ordinary pixel-based methods rely only on the temporal correlation between pixel values while ignoring the spatial relationship between them, so an important amount of information may be lost. Subsequently, some methods attempted to formulate the problem in feature space. Heikkilä et al. (Heikkilä and Pietikäinen, 2006) were the first to adapt these features for dynamic background modelling. However, the produced LBP operator is long, since it considers the first-order gradient information between a pixel and its neighbors. The Center-Symmetric Local Binary Pattern (CS-LBP) (Heikkilä et al., 2009) is an extension of LBP where only the relation between center-symmetric neighbor pairs is considered. Although it produces a shorter feature descriptor, it does not carry enough information for background modelling, as it ignores the value of the center pixel. A Local Binary Similarity Patterns (LBSP) descriptor was proposed in (Bilodeau et al., 2013). Contrary to histogram-based patterns, this descriptor is based on absolute differences and is calculated both within one image and between two images. As a consequence, LBSP succeeds in capturing both texture and intensity changes. In (Xue et al., 2011), the authors apply high-order local derivative patterns to produce a center-symmetric local derivative pattern descriptor that captures more local information. This descriptor is then concatenated with CS-LBP to produce a short descriptor with low complexity and robust foreground detection. The disadvantage of this method, as with other texture-based methods, is that it detects only changes in texture while neglecting intensity values, which could bring useful information. Also, even though it is a concatenation of two short descriptors, it is time consuming. To overcome the drawbacks of both LBP and the descriptor presented in (Xue et al., 2011), we propose a new feature descriptor that is binary and captures changes in both texture and intensity.
3 METHODOLOGY
3.1 STCS-LDP
Binary feature descriptors are employed in background subtraction methods thanks to their speed, discriminative power, low complexity and invariance to illumination. However, since LBP produces long feature vectors and CS-LBP ignores the central pixel information, Xue et al. (Xue et al., 2011) proposed the use of local derivative patterns, which are able to capture more information in the center-symmetric directions without discarding the information brought by the central pixel.
Figure 1 presents the diagrams of the three descriptors (LBP, CS-LBP and CS-LDP) with eight neighbors around the center i_c. LBP encodes in all eight directions to produce an 8-bit binary sequence, while CS-LBP and CS-LDP encode in four directions and produce a 4-bit sequence. The CS-LDP descriptor at time t is computed as follows:
Figure 1: Example of LBP, CS-LBP and CS-LDP features
over eight neighbors.
CSLDP_{R,N}^{t}(x_{t,c}, y_{t,c}) = \sum_{p=0}^{(N/2)-1} f\big[(i_{t,p} - i_{t,c}) \times (i_{t,c} - i_{t,p+N/2})\big] \, 2^{p}    (1)
where i_c corresponds to the value of the central pixel (x_c, y_c); i_p and i_{p+N/2} are the values of the neighborhood pixels in center-symmetric directions, out of N equally spaced pixels on a circle of radius R. The threshold function f(·) is used to determine the type of local pattern transition and is defined as:
f(x_1 \times x_2) = \begin{cases} 1 & \text{if } (x_1 \times x_2) \geq 0 \\ 0 & \text{otherwise} \end{cases}    (2)
However, because CS-LDP is computed based on comparisons with the center pixel i_c, a change will not be detected if the intensity of i_c remains greater (or smaller) than that of all neighbors i_p after a change in the scene. The solution here is to employ a parameter T_d as a threshold when computing the descriptor. This parameter accounts for noise affecting i_c (Eq. 3).
f(x_1 \times x_2) = \begin{cases} 1 & \text{if } (x_1 \times x_2) \geq T_d \\ 0 & \text{otherwise} \end{cases}    (3)
Even though local features have proved to be very discriminative, it is not guaranteed that they perform well when applied to background subtraction, where features are computed at every position in the image. The solution to this problem is to compute features (i) within an image to account for spatial information (Spatial CS-LDP), by selecting the center pixel to be in the same region as the neighboring pixels, and (ii) between two images to account for temporal information (Temporal CS-LDP), by selecting the center pixel to be in another region of the same image or in another image. The region in which the descriptor is computed should be small in order to capture more discriminative information. Moreover, the authors in (Bilodeau et al., 2013) pointed out that features should not be based strictly on comparisons, in order to handle large intensity changes. Thus, using the absolute difference allows detecting large changes in intensity toward larger or smaller values. Finally, replacing the threshold T_d with a value that is relative to the center pixel improves the specificity of the descriptor under high illumination variation. In summary, the threshold function of STCS-LDP becomes:
STCS-LDP becomes:
f (x
1
× x
2
) =
1 i f |x
1
× x
2
| T
d
× i
c
0 otherwise
(4)
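As a sketch, the full STCS-LDP code with the relative threshold of Eq. (4) could be computed as below (Python, our naming: pass the same image twice for the spatial variant, or the current frame as center source and the background model as neighbor source for the temporal variant of Section 3.2):

    def stcs_ldp(center_img, neighbor_img, x, y, T_d=8, R=1):
        """STCS-LDP code using the threshold function of Eq. (4).
        Spatial CS-LDP: center_img is neighbor_img.
        Temporal CS-LDP: center from the current frame, neighbors
        from the background model."""
        ic = int(center_img[y, x])
        offsets = [(-R, 0), (-R, R), (0, R), (R, R),
                   (R, 0), (R, -R), (0, -R), (-R, -R)]
        code = 0
        for p in range(4):
            dy1, dx1 = offsets[p]
            dy2, dx2 = offsets[p + 4]
            i_p = int(neighbor_img[y + dy1, x + dx1])
            i_pn = int(neighbor_img[y + dy2, x + dx2])
            # Eq. (4): absolute product against a center-relative threshold
            if abs((i_p - ic) * (ic - i_pn)) >= T_d * ic:
                code |= 1 << p
        return code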
3.2 Background Subtraction with
STCS-LDP
The goal of this work is to prove that local features may produce better results in background subtraction than methods based only on intensity. To study the benefits of our STCS-LDP, we propose a simple background subtraction method that focuses mainly on the performance of the descriptor. Our method has no update process: each new frame is compared to a background model constructed from the first F frames. Within these F frames, we compute the feature descriptor for each pixel using Spatial CS-LDP (i_c and i_p are selected from the same image) and a histogram is produced. A pixel is labelled as background if its histogram is repeated at least B consecutive times. The repetition is measured by the degree of similarity between two consecutive histograms, where the similarity is measured in terms of the histogram intersection measure defined as:
H(x_1, x_2) = \sum_{i} \min(x_{1,i}, x_{2,i})    (5)
where x_1, x_2 are two normalized histograms and i is the bin index of the histogram. It is also possible to employ other distance measures, such as Chi-squared. This measure is chosen for its simplicity and robustness, as it explicitly discards features occurring only once in one of the histograms. A user-settable threshold T_desc is compared against the similarity value. The produced background model is an array of Spatial CS-LDP histograms that, once created, remains unchanged during the whole process.
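A minimal sketch of this model-construction step, assuming per-pixel histograms have already been extracted (helper names are ours; note that the scale of T_desc depends on how the histograms are normalized):

    import numpy as np

    def hist_intersection(h1, h2):
        # Histogram intersection measure of Eq. (5)
        return np.minimum(h1, h2).sum()

    def is_background_pixel(hists, B=30, T_desc=30):
        """Label a pixel as background if B consecutive pairs of its
        Spatial CS-LDP histograms are similar enough under Eq. (5)."""
        run = 0
        for prev, curr in zip(hists, hists[1:]):
            if hist_intersection(prev, curr) >= T_desc:
                run += 1
                if run >= B:
                    return True
            else:
                run = 0
        return False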
In the foreground detection phase, the new frame is represented with Temporal CS-LDP (the center is selected from the current frame and the neighbors from the background model). At this level, we propose another improvement to the local features. In fact, in some situations, Spatial CS-LDP may not perform well in noisy regions because the center pixel of the foreground has the same value as the center pixel in the background model. To correct this problem and reduce false negatives, a comparison of intensity values is also performed. In order to keep computations simple, we use the L1 distance measure for the intensity comparison; when dealing with color images, per-channel comparisons are performed. The whole method is depicted in Algorithm 1. Note that int_(x,y,ch) and desc_(x,y,ch) are respectively the color intensity of channel ch and the Spatial
CS-LDP descriptor of the background model at position (x,y), histDist refers to the intersection measure, and TCSLDP_(x,y,ch) is the Temporal CS-LDP descriptor at position (x,y).
Algorithm 1: Background Subtraction with STCS-LDP in videos.

Require: Image frame set
Ensure: Labelled frames
  Create background model;
  for x ← 0 : numCols do
    for y ← 0 : numRows do
      TotIntDist ← 0;
      TotDesDist ← 0;
      for ch ← 1 : numChannels do
        intDist ← L1(int_(x,y,ch), i_(x,y,ch));
        desDist ← histDist(desc_(x,y,ch), TCSLDP_(x,y,ch));
        TotIntDist ← TotIntDist + intDist;
        TotDesDist ← TotDesDist + desDist;
      end for
      if (TotIntDist ≥ T_int and TotDesDist ≥ T_desc) then
        p(x,y) is foreground;
      else
        p(x,y) is background;
      end if
    end for
  end for
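In Python, the per-pixel decision of Algorithm 1 might read as follows. This is a sketch, not the authors' code: the extracted listing lost its comparison operators and initialized the accumulators outside the pixel loops, so we read the operators as ≥ on the foreground branch and reset the totals per pixel; hist_dist stands for whatever dissimilarity the implementation derives from the intersection measure.

    def classify_pixel(int_cur, int_bg, desc_cur, desc_bg, hist_dist,
                       T_int=90, T_desc=30):
        """Per-pixel decision of Algorithm 1. int_* are per-channel
        intensities; desc_* are per-channel descriptor histograms."""
        tot_int = sum(abs(a - b) for a, b in zip(int_cur, int_bg))  # L1 distance
        tot_desc = sum(hist_dist(dc, db) for dc, db in zip(desc_cur, desc_bg))
        # Foreground only when both intensity and texture disagree enough
        if tot_int >= T_int and tot_desc >= T_desc:
            return "foreground"
        return "background"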
4 PERFORMANCE EVALUATION
We evaluate the use of STCS-LDP in background subtraction by means of the CDnet dataset (Wang et al., 2014). Since our method does not include any update process for the background model, we tested our background subtraction only on the baseline and thermal video subsets (9 videos, 27,149 frames). We used exactly the same metrics as provided in (Wang et al., 2014). Let TP be the number of true positives, TN the number of true negatives, FN the number of false negatives, and FP the number of false positives. The 7 metrics used are:
1. Recall (Re): TP/(TP + FN)
2. Specificity (Sp): TN/(TN + FP)
3. False Positive Rate (FPR): FP/(FP + TN)
4. False Negative Rate (FNR): FN/(TP + FN)
5. Percentage of Wrong Classifications (PWC): 100 × (FN + FP)/(TP + FN + FP + TN)
6. Precision (Pr): TP/(TP + FP)
7. F-measure: 2 × Pr × Re/(Pr + Re)
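These definitions translate directly into code; a small helper (ours, not part of the CDnet tools) that computes all seven from raw pixel counts:

    def cdnet_metrics(TP, FP, TN, FN):
        """The seven CDnet metrics listed above, from raw pixel counts."""
        Re = TP / (TP + FN)                    # Recall
        Sp = TN / (TN + FP)                    # Specificity
        FPR = FP / (FP + TN)                   # False Positive Rate
        FNR = FN / (TP + FN)                   # False Negative Rate
        PWC = 100.0 * (FN + FP) / (TP + FN + FP + TN)
        Pr = TP / (TP + FP)                    # Precision
        F = 2 * Pr * Re / (Pr + Re)            # F-measure
        return {"Re": Re, "Sp": Sp, "FPR": FPR, "FNR": FNR,
                "PWC": PWC, "Pr": Pr, "F": F}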
These metrics are provided with evaluation tools which are available online (http://www.changedetection.net). The parameters used for our method are:
- T_desc = 30: threshold used to determine whether an input pixel matches the background model, based on the intersection measure,
- T_int = 90: threshold used to determine whether an input pixel matches the background model, based on the L1 distance,
- T_d = 8: the STCS-LDP descriptor threshold,
- F = 100: number of frames considered to build the background model,
- B = 30: required number of similar histograms to label a pixel as background.
We first investigate the effect of T_d and T_desc on the subtraction results. Then, we compare the performance of our STCS-LDP background subtraction technique against some methods from the state of the art.
4.1 Parameters Analysis
We investigated the effect of the parameters on the performance of background subtraction. We ran the computation with T_desc ∈ [5, 45] and T_d ∈ [1, 20]. The obtained results revealed that T_d has more effect on STCS-LDP performance than T_desc. In fact, when T_d is low, the descriptor models textures resulting from small changes in intensity, and thus becomes more sensitive to noise. If T_d is moderate, textures of small details disappear: histograms will be robust to noise, but at the expense of detailed texture models. However, if T_d is high, the detailed texture of important changes disappears entirely. As for T_desc, if it is low, any change in texture or intensity will be detected, and the performance of the descriptor will then depend on the value of T_d. If T_desc is high, only the relevant textures of intensity changes will be detected. Therefore, the value of T_d is more critical than T_desc; T_d should be set above the noise level.
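Such an analysis amounts to a grid search over the two thresholds. A sketch of the harness is given below; run_subtraction and f_measure are hypothetical stand-ins for the full pipeline and the CDnet scoring, and the step sizes are illustrative:

    # Hypothetical sweep over T_d in [1, 20] and T_desc in [5, 45]
    results = {}
    for T_d in range(1, 21):
        for T_desc in range(5, 46, 5):
            masks = run_subtraction(video, T_d=T_d, T_desc=T_desc)
            results[(T_d, T_desc)] = f_measure(masks, ground_truth)
    best_T_d, best_T_desc = max(results, key=results.get)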
4.2 Comparison with the State of the
Art
To evaluate the use of our proposed descriptor in background subtraction, we compared it with some methods tested on the same dataset (Wang et al., 2014). We selected some of both the best and the classical methods (see Tables 1 and 2). Note that we did not apply any morphological operations as post-processing of the obtained results.
Table 1: Results on Baseline Dataset (Wang et al., 2014).
Method Re Sp FPR FNR PWC Pr F-measure
SuBSENSE (St-Charles et al., 2015) 0.952 0.9982 0.0018 0.0480 0.3574 0.9495 0.9503
STCS-LDP 0.843 0.9985 0.0014 0.156 0.6118 0.9455 0.8915
LBSP(Bilodeau et al., 2013) 0.806 0.9977 0.0023 0.0074 0.9168 0.9275 0.8623
Euclidean(Benezeth et al., 2010) 0.838 0.9955 0.0045 0.1615 1.026 0.872 0.9114
KDE(Elgammal et al., 2000) 0.747 0.9954 0.0046 0.2528 1.8058 0.7392 0.7998
GMM(Stauffer and Grimson, 1999) 0.586 0.9987 0.0013 0.4137 1.9381 0.7119 0.9532
Table 2: Results on Thermal Dataset (Wang et al., 2014).
Method Re Sp FPR FNR PWC Pr F-measure
SuBSENSE (St-Charles et al., 2015) 0.8161 0.9908 0.0092 0.1839 2.0125 0.8328 0.8171
STCS-LDP 0.8354 0.9877 0.0123 0.1646 2.3626 0.8463 0.8408
LBSP(Bilodeau et al., 2013) 0.6535 0.9916 0.0083 0.0142 2.0774 0.7794 0.6924
Euclidean(Benezeth et al., 2010) 0.5111 0.9907 0.0093 0.4889 3.8516 0.6313 0.8877
KDE(Elgammal et al., 2000) 0.4147 0.9981 0.0019 0.5853 5.4152 0.4989 0.9164
GMM(Stauffer and Grimson, 1999) 0.3395 0.9993 0.0007 0.6605 4.8419 0.4767 0.9709
Our method is characterized by a low FNR, high precision, average PWC, average recall, and average FPR. It performs better than the Euclidean measure, which is based only on pixel colors. This proves that texture-based features are less sensitive to illumination changes. Compared to LBSP, STCS-LDP reached slightly better results even though LBSP uses a larger descriptor support (5 × 5) than ours. This is due to the fact that local derivative patterns provide a robust descriptor even when extracted over a small region. Although it combines detections of both color and texture, the KDE method was not very successful, due to the failure of texture in uniform regions, where only individual pixel colors may detect changes. In our method, we boost the texture comparison between histograms with an intensity comparison between pixel values. Finally, we believe that STCS-LDP performed really well on both datasets, even when compared to SuBSENSE, one of the best-ranked methods. Unlike our simple method, SuBSENSE is based on pixel-level feedback loops that dynamically adjust internal parameters without user intervention.
5 CONCLUSION
In this work, we proposed a new texture descriptor for the purpose of background subtraction. The proposed STCS-LDP may be adapted to slightly dynamic situations, as it can be computed spatially (within the same frame) or temporally (between two frames). We proposed some improvements at the neighboring-pixel comparison level to make the descriptor less sensitive to noise while maintaining robustness to illumination changes. Moreover, we compared our descriptor against some state-of-the-art algorithms and showed that it achieves comparable results. Future work will bring more enhancements to the descriptor and integrate it into a more sophisticated background subtractor based on an update process, to deal with more complex situations.
ACKNOWLEDGEMENTS
This research and innovation work is carried out
within a MOBIDOC thesis funded by the European
Union under the PASRI project and administered by
the ANPR.
REFERENCES
Barnich, O. and Van Droogenbroeck, M. (2011). Vibe: A
universal background subtraction algorithm for video
sequences. Image Processing, IEEE Transactions on,
20(6):1709–1724.
Benezeth, Y., Jodoin, P.-M., Emile, B., Laurent, H., and
Rosenberger, C. (2010). Comparative study of back-
ground subtraction algorithms. Journal of Electronic
Imaging, 19(3):033003–033003.
Bilodeau, G.-A., Jodoin, J.-P., and Saunier, N. (2013).
Change detection in feature space using local binary
similarity patterns. In Computer and Robot Vision
(CRV), 2013 International Conference on, pages 106–
112. IEEE.
Bouwmans, T., El Baf, F., Vachon, B., et al. (2010). Statisti-
cal background modeling for foreground detection: A
survey. Handbook of Pattern Recognition and Com-
puter Vision, 4(2):181–189.
Elgammal, A., Harwood, D., and Davis, L. (2000). Non-parametric model for background subtraction. In Computer Vision – ECCV 2000, pages 751–767. Springer.
Heikkilä, M. and Pietikäinen, M. (2006). A texture-based method for modeling the background and detecting moving objects. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(4):657–662.
Heikkilä, M., Pietikäinen, M., and Schmid, C. (2009). Description of interest regions with local binary patterns. Pattern Recognition, 42(3):425–436.
Hofmann, M., Tiefenbacher, P., and Rigoll, G. (2012).
Background segmentation with feedback: The pixel-
based adaptive segmenter. In Computer Vision and
Pattern Recognition Workshops (CVPRW), 2012 IEEE
Computer Society Conference on, pages 38–43. IEEE.
Jain, L. C. and Favorskaya, M. N. (2015). Practical matters
in computer vision. In Computer Vision in Control
Systems-2, pages 1–10. Springer.
Megrhi, S., Jmal, M., Beghdadi, A., and Mseddi, W. (2015).
Spatio-temporal action localization for human action
recognition in large dataset. In IS&T/SPIE Electronic
Imaging, pages 94070O–94070O. International Soci-
ety for Optics and Photonics.
Ojala, T., Pietikäinen, M., and Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7):971–987.
Shahbaz, A., Hariyono, J., and Jo, K.-H. (2015). Evalu-
ation of background subtraction algorithms for video
surveillance. In Frontiers of Computer Vision (FCV),
2015 21st Korea-Japan Joint Workshop on, pages 1–4.
IEEE.
Sheikh, Y., Javed, O., and Kanade, T. (2009). Background
subtraction for freely moving cameras. In Computer
Vision, 2009 IEEE 12th International Conference on,
pages 1219–1225. IEEE.
Silva, C., Bouwmans, T., and Frélicot, C. (2015). An extended center-symmetric local binary pattern for background modeling and subtraction in videos. In International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP).
Sobral, A. and Vacavant, A. (2014). A comprehensive re-
view of background subtraction algorithms evaluated
with synthetic and real videos. Computer Vision and
Image Understanding, 122:4–21.
St-Charles, P.-L., Bilodeau, G.-A., and Bergevin, R. (2015).
Subsense: A universal change detection method with
local adaptive sensitivity. Image Processing, IEEE
Transactions on, 24(1):359–373.
Stauffer, C. and Grimson, W. E. L. (1999). Adaptive
background mixture models for real-time tracking.
In Computer Vision and Pattern Recognition, 1999.
IEEE Computer Society Conference on., volume 2.
IEEE.
Wang, Y., Jodoin, P.-M., Porikli, F., Konrad, J., Benezeth,
Y., and Ishwar, P. (2014). Cdnet 2014: An expanded
change detection benchmark dataset. In Computer Vi-
sion and Pattern Recognition Workshops (CVPRW),
2014 IEEE Conference on, pages 393–400. IEEE.
Xue, G., Song, L., Sun, J., and Wu, M. (2011). Hy-
brid center-symmetric local pattern for dynamic back-
ground subtraction. In Multimedia and Expo (ICME),
2011 IEEE International Conference on, pages 1–6.
IEEE.
Zivkovic, Z. (2004). Improved adaptive gaussian mixture
model for background subtraction. In Pattern Recog-
nition, 2004. ICPR 2004. Proceedings of the 17th In-
ternational Conference on, volume 2, pages 28–31.
IEEE.