BACKGROUND SUBTRACTION USING BELIEF
PROPAGATION
Hee-il Hahn
Dept. Information and Communications Eng., Hankuk University of Foreign Studies, Yongin, Korea
Keywords: Background subtraction, Pixel-based background modelling, Visual surveillance, Markov random fields,
Belief propagation.
Abstract: Detecting foreground objects is challenging when the background exhibits illumination variation,
shadows, or structural variation due to motion. Pixel-based background models inherently suffer from the
statistical randomness of each pixel. This paper proposes an algorithm that incorporates a Markov random
field (MRF) model into pixel-based background modelling to achieve more accurate foreground detection.
Under the assumption that the distance between a pixel of the input image and the corresponding background
model, and the difference between the scene estimates of spatio-temporally neighboring pixels, are
exponentially distributed, a recursive approach for estimating the MRF regularizing parameters is proposed.
The parameters are first computed from the detection results of pixel-based background subtraction; the
method then alternates between estimating the parameters from the intermediate foreground detection results
and detecting the foreground with the estimated parameters. Experiments are conducted on several videos
recorded both indoors and outdoors to compare the proposed method with the codebook-based algorithm.
1 INTRODUCTION
Computer vision systems such as visual surveillance and object tracking need to separate
moving objects from the scene background. Background subtraction in the field of view of a
stationary video camera is a common approach for detecting foreground objects against a
dynamic background. Background subtraction usually employs a pixel-based background model.
The simplest model assumes that a pixel can be modelled with statistics such as the mean and
variance estimated at the corresponding pixel location over a sequence of video frames. This
method detects the foreground by thresholding the intensity or color difference between the
current frame and the background model. However, it is very sensitive to the choice of
threshold and copes poorly with background dynamics such as illumination variations or the
local motion of background objects, e.g. waving trees. These dynamics cause the pixel
intensity values to vary significantly with time. Several promising schemes have been
proposed to model such variations, among them the mixture of Gaussians (Stauffer and
Grimson, 1999), the nonparametric kernel model (Elgammal, et al., 2002), and the codebook
model (Kim, et al., 2005). Stauffer and Grimson model the pixel intensity with a mixture of
3 to 5 Gaussian distributions and use the EM algorithm to adapt the mixture model.
Elgammal, et al. estimate the density function of each pixel nonparametrically using a
kernel function. When a Gaussian kernel is adopted, this can be viewed as a generalization
of the Gaussian mixture model. Kim, et al. adopt a codebook quantization scheme to construct
a background model from long observation sequences. Each background pixel has a codebook
composed of a group of codewords. Although a single codeword may be enough to model a static
background pixel, a mixed background pixel can be modelled by multiple codewords, whose
number depends on the dynamics of the pixel. All the above approaches are similar in that
they handle complex backgrounds by modelling a pixel with a multi-modal distribution.
However, pixel-based algorithms like the above assume that the statistics of each pixel are
independent, although they are in fact highly correlated with those of the neighboring
pixels. Some researchers employ block-based models or Markov random field techniques to
improve the pixel-based algorithms. MRF-based methods usually
Hahn H..
BACKGROUND SUBTRACTION USING BELIEF PROPAGATION.
DOI: 10.5220/0003444102810286
In Proceedings of the 8th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2011), pages 281-286
ISBN: 978-989-8425-75-1
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
exploit the spatial and temporal dependencies of the pixels by developing MRF models for
background subtraction. An MRF assumes that the variate at each pixel location is connected
to its four or eight nearest neighbours. An MRF needs cost functions, which are related to
the compatibility functions between the scene variable and the corresponding pixel value.
In principle, any background model can be used to define the cost functions; this paper
chooses a codebook-based background model. Almost all MRF-based background models use fixed
values for all MRF parameters. For example, (Migdal, et al., 2005) assigns constant energy
potentials to all the spatial, posterior and temporal cliques, and (Wu, et al., 2010)
assumes that all compatibility functions are exponentially distributed with constant
parameters. (McHugh, et al., 2009) models background subtraction as a binary hypothesis
test and determines the detection threshold by means of an Ising model. (Xu, et al., 2008)
recovers the background image from a sequence of images containing moving foreground
objects, employing loopy belief propagation for background estimation.
Loopy belief propagation is also adopted in this paper. However, its role is quite
different: here it decides whether an image pixel belongs to the background or the
foreground, while Xu, et al. use it to indicate from which frame the pixel should be
selected.
The major contributions of this paper are that it exploits both the spatial and temporal
dependencies by developing MRF models for background subtraction, and that it proposes a
recursive approach for estimating the MRF regularizing parameters.
2 MRF-BASED FOREGROUND
DETECTION
Let $X = \{x_i\}$ denote a set of binary random variables, where $i$ represents a pixel
location. A state space is assumed, say $\Lambda = \{0, 1\}$, so that $x_i \in \Lambda$ for
all $i$. Let $\Omega$ be the set of all possible configurations:
$$\Omega = \left\{ \omega = (x_1, x_2, \cdots, x_N) : x_i \in \Lambda,\ 1 \le i \le N \right\} \tag{1}$$
The set of random variables $X$ is assumed to be an MRF. Then the probability
$P(X = \omega)$ is a Gibbs distribution, given by:
$$P(X = \omega) = \frac{1}{Z} e^{-U(\omega)/T} \tag{2}$$
where $Z$ is a normalizing constant called the partition function, $T$ is a constant called
the temperature and $U(\omega)$ is the energy function. The energy is a sum of clique
potentials $V_c(\omega)$ over all possible cliques $c \in \mathcal{C}$, which is defined as
$$U(\omega) = \sum_{c \in \mathcal{C}} V_c(\omega) = \sum_{i} V_i(x_i) + \sum_{i,j} V_{i,j}(x_i, x_j) \tag{3}$$
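As a concrete illustration of (1)-(3), the following minimal sketch enumerates every configuration of a tiny 2x2 binary grid and evaluates the Gibbs distribution by brute force. The potentials (`unary`, `pair_weight`), the 4-neighbour edge list `EDGES` and the temperature are illustrative assumptions, not values from this paper; brute-force enumeration is only feasible for such toy sizes.

```python
import itertools
import math

# Hypothetical 2x2 grid, nodes numbered 0..3 row by row, with 4-neighbour edges.
EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]

def energy(omega, unary, pair_weight):
    """U(omega) = sum_i V_i(x_i) + sum_{i,j} V_ij(x_i, x_j), as in (3)."""
    u = sum(unary[i][x] for i, x in enumerate(omega))
    u += sum(pair_weight * abs(omega[i] - omega[j]) for i, j in EDGES)
    return u

def gibbs(unary, pair_weight, temperature=1.0):
    """Return {omega: P(X = omega)} over all 2^4 configurations, as in (2)."""
    configs = list(itertools.product((0, 1), repeat=4))
    weights = [math.exp(-energy(w, unary, pair_weight) / temperature) for w in configs]
    z = sum(weights)  # partition function Z
    return {w: wt / z for w, wt in zip(configs, weights)}

# Assumed unary potentials (cost of label 0, cost of label 1):
# node 0 prefers label 1, the other nodes are neutral.
unary = [(1.0, 0.0), (0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]
probs = gibbs(unary, pair_weight=0.8)
best = max(probs, key=probs.get)  # most probable configuration
```

Because the pairwise term penalizes disagreeing neighbours, the preference of a single node is enough to pull the neutral nodes to the same label in the most probable configuration.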
For the MRF-based background model, a superscript is added to the random variable $x_i$ so
that $x_i$ is replaced with $x_i^t$, where $t$ represents a time index. The energy function
$U(\omega)$ is extended in the following way, to include the time dependency as well as the
spatial dependency:
$$U(\omega) = \sum_{c \in \mathcal{C}} V_c(\omega) = \sum_{i} V_i(x_i^t) + \sum_{i,j} V_{i,j}(x_i^t, x_j^t) + \sum_{i,j} V_{i,j}(x_i^t, x_j^{t-1}) \tag{4}$$
The scene variable $x_i^t$ is associated with the pixel value $y_i^t$ at time $t$ and pixel
location $i$. That is, $x_i^t$ has a value of 0 when its corresponding pixel value $y_i^t$
comes from the background model, and $x_i^t = 1$ in the case of foreground.
There is a statistical dependency between the pixel value $y_i^t$ at time $t$ and its
corresponding decision result, or scene variable, $x_i^t$ at each pixel location $i$. A
background pixel must be generated by the background model, so the potential $V_i(x_i^t)$
in (4) measures how far the pixel deviates from the background model; a foreground pixel is
assigned a separate potential. Thus, $V_i(x_i^t)$ can be defined as:
$$V_i(x_i^t) = \begin{cases} \mu\, d(y_i^t), & y_i^t \in \text{Background} \\ \Gamma, & y_i^t \in \text{Foreground} \end{cases} \tag{5}$$
where $\mu$ is the proportional constant and $\Gamma$ is the potential associated with the
foreground pixel, which is optimally adjusted using the EM algorithm, as explained later in
Section 2.2. The distance $d(y_i^t)$ can be obtained using any pixel-based background model.
Since this paper employs the codebook model (Kim, et al., 2005), $d(y_i^t)$ is defined as
the minimum distance between an input pixel $y_i^t$ and the centroids of the codewords
$c_k$ belonging to the codebook $C_i$.
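The distance to the codebook can be sketched as a nearest-centroid search. This is a minimal illustration only: the Euclidean metric, the function name `codebook_distance` and the sample RGB values are assumptions, and the codebook model of Kim, et al. uses its own color and brightness criteria that are not reproduced here.

```python
import numpy as np

def codebook_distance(pixel, centroids):
    """d(y) = min_k ||y - c_k||, using Euclidean distance as an assumed metric."""
    pixel = np.asarray(pixel, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return float(np.min(np.linalg.norm(centroids - pixel, axis=1)))

# Hypothetical codebook C_i for one pixel location: two RGB centroids.
codebook_i = [[30.0, 120.0, 40.0], [180.0, 200.0, 230.0]]
d_bg = codebook_distance([32.0, 118.0, 41.0], codebook_i)  # close to a codeword
d_fg = codebook_distance([200.0, 30.0, 30.0], codebook_i)  # far from both codewords
```

A small distance suggests that the pixel is explained by the background model, while a large one makes the foreground hypothesis more plausible.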
The nodes $i$ are arranged in a two-dimensional grid, and so the scene variable $x_i^t$
should be compatible with the nearby scene variables $x_j^t$. Let $\lambda$ be a
probability that $y_i^t$ will come out from the background model and $E_i(x_i^t)$ be the
energy term corresponding to $V_i(x_i^t)$. Then $E_i(x_i^t)$ can be reduced to
$$E_i(x_i^t) = \lambda e^{-\mu d(y_i^t)} + \frac{1 - \lambda}{M} \tag{6}$$
where $M = e^{\Gamma}$. Then (5) can be written as:
$$V_i(x_i^t) = -\log\left( \lambda e^{-\mu d(y_i^t)} + \frac{1 - \lambda}{M} \right) \tag{7}$$
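The behaviour of the potential in (7) can be checked numerically with a minimal sketch. The function name `unary_potential` and the values of `lam`, `mu` and `gamma` are illustrative assumptions, not parameters reported in this paper.

```python
import numpy as np

def unary_potential(d, lam, mu, gamma):
    """V = -log(lam * exp(-mu * d) + (1 - lam) / M) with M = exp(gamma), as in (6)-(7)."""
    m = np.exp(gamma)
    d = np.asarray(d, dtype=float)
    return -np.log(lam * np.exp(-mu * d) + (1.0 - lam) / m)

# Illustrative values only.
d = np.array([0.0, 5.0, 50.0])
v = unary_potential(d, lam=0.9, mu=0.5, gamma=4.0)
```

The potential grows with the distance and saturates at $\Gamma - \log(1 - \lambda)$ for large distances, which is the cost of explaining the pixel by the foreground term alone.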
The potential $V_{i,j}(x_i^t, x_j^t)$ between $x_i^t$ and $x_j^t$ is defined so that it has
a larger value when the variable $x_i^t$ is different from $x_j^t$, as follows:
$$V_{i,j}(x_i^t, x_j^t) = \nu \left| x_i^t - x_j^t \right| \tag{8}$$
Likewise, $V_{i,j}(x_i^t, x_j^{t-1})$ is defined as
$$V_{i,j}(x_i^t, x_j^{t-1}) = \sigma \left| x_i^t - x_j^{t-1} \right| \tag{9}$$
where $\nu$ and $\sigma$ are the proportional constants. Then, (4) can be represented as:
$$U = -\sum_{i} \log\left\{ \lambda e^{-\mu d(y_i^t)} + \frac{1 - \lambda}{M} \right\} + \sum_{i,j} \left( \nu \left| x_i^t - x_j^t \right| + \sigma \left| x_i^t - x_j^{t-1} \right| \right) \tag{10}$$
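The above equation can be further simplified by noting that a function of the form
$-\log\left(a e^{-bx} + c\right)$ is tightly upper bounded by $\min(\beta x, \gamma) + \alpha$,
where $\alpha = -\log(a + c)$, $\beta = \dfrac{ab}{a + c}$ and
$\gamma = \log\left(\dfrac{a + c}{c}\right)$. Thus, minimizing (10) is equivalent to minimizing
$$U = \sum_{i} \min\left( \kappa\, d(y_i^t), \theta \right) + \sum_{i,j} \left( \nu \left| x_i^t - x_j^t \right| + \sigma \left| x_i^t - x_j^{t-1} \right| \right) \tag{11}$$
where $\kappa = \dfrac{\lambda \mu}{\lambda + \dfrac{1 - \lambda}{M}}$ and
$\theta = \log\left( 1 + \dfrac{\lambda M}{1 - \lambda} \right)$.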
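The truncated-linear bound used to pass from (10) to (11) can be verified numerically. The sketch below takes $a = \lambda$, $b = \mu$ and $c = (1 - \lambda)/M$; the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def exact_cost(d, lam, mu, m):
    """-log(lam * exp(-mu * d) + (1 - lam) / M), the per-pixel term of (10)."""
    return -np.log(lam * np.exp(-mu * d) + (1.0 - lam) / m)

def truncated_linear(d, lam, mu, m):
    """alpha + min(kappa * d, theta); alpha is the constant offset dropped in (11)."""
    a, c = lam, (1.0 - lam) / m
    alpha = -np.log(a + c)
    kappa = a * mu / (a + c)      # slope at d = 0
    theta = np.log(1.0 + a / c)   # saturation level above alpha
    return alpha + np.minimum(kappa * d, theta)

d = np.linspace(0.0, 60.0, 601)
m = np.exp(4.0)  # assumed M = exp(Gamma)
gap = truncated_linear(d, 0.9, 0.5, m) - exact_cost(d, 0.9, 0.5, m)
```

The gap is non-negative everywhere and vanishes at $d = 0$ and for large $d$, since the exact cost is concave and increasing: the bound is its tangent at the origin capped by its asymptote.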
Belief propagation is adopted to minimize the above energy (Yedidia, et al., 2002). Let
$m_{ij}^t$ be the message that node $i$ sends to a neighboring node $j$ at time $t$. It is
determined by the message update rule:
$$m_{ij}^t(x_j^t) = \min_{x_i^t} \left\{ \min\left( \kappa\, d(y_i^t), \theta \right) + \nu \left| x_i^t - x_j^t \right| + \sigma \left| x_i^t - x_j^{t-1} \right| + \sum_{k \in N(i) \setminus j} m_{ki}^t(x_i^t) \right\} \tag{12}$$
The belief $b_i(x_i^t)$ at a node $i$ is computed as
$$b_i(x_i^t) = \min\left( \kappa\, d(y_i^t), \theta \right) + \sum_{k \in N(i)} m_{ki}^t(x_i^t) \tag{13}$$
where $N(i)$ denotes the nodes neighboring $i$. The scene variable $x_i^t$ is selected so
that $b_i(x_i^t)$ is minimized, namely
$$x_i^t = \begin{cases} 1, & b_i(0) > b_i(1) \\ 0, & b_i(0) \le b_i(1) \end{cases} \tag{14}$$
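The message passing and the decision rule in (12)-(14) can be sketched as loopy min-sum belief propagation on a 4-connected grid. This is a simplified illustration under stated assumptions, not the author's implementation: the unary cost is taken to be label-dependent following (5) ($\mu\, d$ for background, $\Gamma$ for foreground), the temporal term is omitted (with fixed previous-frame labels it could be folded into `unary`), and the function name, update schedule and all numeric values are hypothetical.

```python
import numpy as np

def min_sum_bp(unary, nu, n_iters=10):
    """Loopy min-sum BP for binary labels on a 4-connected grid.

    unary: array (H, W, 2) holding the cost of label 0 and label 1 per pixel.
    nu:    pairwise weight, V_ij = nu * |x_i - x_j|.
    Returns the label map that minimises each node's belief.
    """
    h, w, _ = unary.shape
    # msgs[k][r, c] = message arriving at pixel (r, c) from its neighbour in
    # direction k: 0 = from above, 1 = from below, 2 = from left, 3 = from right.
    msgs = np.zeros((4, h, w, 2))
    pair = nu * np.array([[0.0, 1.0], [1.0, 0.0]])  # pair[x_i, x_j]
    for _ in range(n_iters):
        total = unary + msgs.sum(axis=0)
        new = np.zeros_like(msgs)
        for k, opposite in ((0, 1), (1, 0), (2, 3), (3, 2)):
            # The sender leaves out the message it received from the receiver.
            send = total - msgs[opposite]
            # out[r, c, x_j] = min over x_i of send[r, c, x_i] + pair[x_i, x_j]
            out = (send[..., :, None] + pair).min(axis=2)
            out -= out.min(axis=-1, keepdims=True)  # keep messages bounded
            if k == 0:
                new[0, 1:, :] = out[:-1, :]
            elif k == 1:
                new[1, :-1, :] = out[1:, :]
            elif k == 2:
                new[2, :, 1:] = out[:, :-1]
            else:
                new[3, :, :-1] = out[:, 1:]
        msgs = new
    belief = unary + msgs.sum(axis=0)
    return (belief[..., 0] > belief[..., 1]).astype(np.uint8)

# Toy distance map: a 3x3 object with a low-distance hole, plus one noisy pixel.
d = np.zeros((6, 6))
d[1:4, 1:4] = 20.0
d[2, 2] = 1.0   # hole inside the object
d[5, 5] = 9.0   # isolated noise
mu, gamma, nu = 0.5, 4.0, 1.5  # illustrative values, not from the paper
unary = np.stack([mu * d, np.full_like(d, gamma)], axis=-1)
labels = min_sum_bp(unary, nu)
```

In this toy case a per-pixel decision would label the hole as background and the noisy pixel as foreground, whereas the messages from the neighbours reverse both decisions.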
2.1 Estimating $\nu$ and $\sigma$
The parameters $\nu$ and $\sigma$ are initialized using the detection results of the
pixel-based background subtraction method. The energy term associated with the potential
$V_{i,j}(x_i^t, x_j^t)$ corresponds to the joint probability, called the compatibility
function, given as:
$$E_{i,j}(x_i^t, x_j^t) = e^{-\nu \left| x_i^t - x_j^t \right|} \tag{15}$$
So the probabilities corresponding to $x_i^t = x_j^t$ and $x_i^t \ne x_j^t$ are computed
from the histogram of the detection results at time $t$, where $j$ is a neighbour of $i$.
The parameter $\nu$ can be estimated as
$$\nu = -\log\left\{ \frac{h\left( x_i^t \ne x_j^t \right)}{h\left( x_i^t = x_j^t \right)} \right\} \tag{16}$$
where $h(\cdot)$ is the histogram computed from the segmented image $X^t = \{x_i^t\}$.
Likewise, $\sigma$ can be obtained by
$$\sigma = -\log\left\{ \frac{P\left( x_i^t \ne x_j^{t-1} \right)}{P\left( x_i^t = x_j^{t-1} \right)} \right\} \tag{17}$$
using $X^t$ and $X^{t-1}$.
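The estimates of $\nu$ and $\sigma$ reduce to counting agreeing and disagreeing label pairs, as the following minimal sketch illustrates. The function names, the smoothing constant `eps`, and the choice of pairing each pixel with the same location in the previous frame for $\sigma$ are assumptions made for illustration.

```python
import numpy as np

def pairwise_weight(a, b, eps=1e-6):
    """-log(#(a != b) / #(a == b)) over paired label arrays; eps avoids log(0)."""
    differ = np.count_nonzero(a != b)
    same = a.size - differ
    return -np.log((differ + eps) / (same + eps))

def estimate_nu(labels):
    """Spatial weight from horizontally and vertically adjacent label pairs."""
    a = np.concatenate([labels[:, :-1].ravel(), labels[:-1, :].ravel()])
    b = np.concatenate([labels[:, 1:].ravel(), labels[1:, :].ravel()])
    return pairwise_weight(a, b)

def estimate_sigma(labels_t, labels_prev):
    """Temporal weight, pairing each pixel with the same location at t-1 (assumption)."""
    return pairwise_weight(labels_t.ravel(), labels_prev.ravel())

# Toy detection result: left half background, right half foreground.
labels_t = np.zeros((4, 4), dtype=np.uint8)
labels_t[:, 2:] = 1
labels_prev = labels_t.copy()
labels_prev[0, 0] = 1  # one pixel changed between frames
nu = estimate_nu(labels_t)
sigma = estimate_sigma(labels_t, labels_prev)
```

In this toy case 4 of the 24 neighbouring pairs disagree, so $\nu \approx \log 5$, and 1 of the 16 temporal pairs disagrees, so $\sigma \approx \log 15$; smoother segmentations yield larger weights.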
2.2 Estimating $\mu$ and $\lambda$
The parameters $\mu$ and $\lambda$ are estimated using the expectation maximization
algorithm. Let
$L = \max\left\{ d(y_i^t) \right\} + 1$ be the number of possible distance values of the
pixels which come out from the background model. A random variable $\xi_i$ is assigned to
each pixel $y_i^t$, indicating whether the pixel comes out from the background model. In
other words, $\xi_i$ has a value of 0 when $y_i^t$ belongs to the background model;
otherwise $\xi_i$ equals 1. Then the conditional probability of $\xi_i$ can be computed as
$$\rho_i = P\left( \xi_i = 0 \mid d(y_i^t), \lambda, \mu \right) = \frac{\lambda e^{-\mu d(y_i^t)}}{\lambda e^{-\mu d(y_i^t)} + \dfrac{1 - \lambda}{M}} \tag{18}$$
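Using the method proposed by (Zhang and Seitz, 2007), the parameters $\mu$ and $\lambda$ are
estimated by maximizing the expected log-probability
$E_{\xi}\left[ \log P\left( d(y_i^t), \xi_i \mid \lambda, \mu \right) \right]$, where
$P\left( d(y_i^t), \xi_i \mid \lambda, \mu \right)$ is given as
$$\begin{aligned} P\left( d(y_i^t), \xi_i = 0 \mid \lambda, \mu \right) &= \lambda e^{-\mu d(y_i^t)} \\ P\left( d(y_i^t), \xi_i = 1 \mid \lambda, \mu \right) &= \frac{1 - \lambda}{M} \end{aligned} \tag{19}$$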
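Using the above equations,
$E_{\xi}\left[ \log P\left( d(y_i^t), \xi_i \mid \lambda, \mu \right) \right]$ can be
expressed as follows.
$$E_{\xi}\left[ \log P\left( d(y_i^t), \xi_i \mid \lambda, \mu \right) \right] = \sum_{y_i^t \in B} \rho_i \log P\left( d(y_i^t), \xi_i = 0 \mid \lambda, \mu \right) + \sum_{y_i^t \in F} (1 - \rho_i) \log P\left( d(y_i^t), \xi_i = 1 \mid \lambda, \mu \right) \tag{20}$$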
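This equation can be reduced to
$$E_{\xi}\left[ \log P\left( d(y_i^t), \xi_i \mid \lambda, \mu \right) \right] = \sum_{y_i^t \in B} \rho_i \left( \log \lambda - \mu\, d(y_i^t) \right) + \sum_{y_i^t \in F} (1 - \rho_i) \log \frac{1 - \lambda}{M} \tag{21}$$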
By setting the partial derivatives of the above equation with respect to $\lambda$ and
$\mu$ to zero, $\lambda$ is estimated as
$$\lambda = \frac{\sum_{y_i^t \in B} \rho_i}{\sum_{y_i^t \in B} \rho_i + \sum_{y_i^t \in F} (1 - \rho_i)} \tag{22}$$
where $B$ and $F$ represent background and foreground, respectively. $\lambda$ can in fact
be approximated as the ratio of the number of pixels decided as background to the total
number of pixels. The parameter $\mu$ is the solution of the equation
$$\frac{1}{e^{\mu} - 1} - \frac{L}{e^{\mu L} - 1} = \frac{\sum_{y_i^t \in B} \rho_i\, d(y_i^t)}{\sum_{y_i^t \in B} \rho_i} \tag{23}$$
According to our experimental results, $L$ is over 30, so that the second term on the
left-hand side of (23) is negligible. Thus, the above equation can be solved explicitly as
$$\mu = \log\left( 1 + \frac{1}{\chi} \right) \tag{24}$$
where $\chi$ is the right-hand side of (23).
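One round of the parameter update in (18) and (22)-(24) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `em_step`, the guard constant `eps`, the fixed value of $M$ and the toy data are hypothetical, and the current detection result is used to split the pixels into $B$ and $F$.

```python
import numpy as np

def em_step(d, labels, lam, mu, m, eps=1e-12):
    """One update of (lam, mu) following (18) and (22)-(24).

    d:      distances d(y_i^t) to the background model, one per pixel.
    labels: current detection, 0 = background (B), 1 = foreground (F).
    m:      M = exp(Gamma), treated here as a fixed, assumed constant.
    """
    d = np.asarray(d, dtype=float)
    is_bg = np.asarray(labels) == 0
    bg_term = lam * np.exp(-mu * d)
    rho = bg_term / (bg_term + (1.0 - lam) / m)            # (18)
    rho_b = rho[is_bg].sum()
    lam_new = rho_b / (rho_b + (1.0 - rho[~is_bg]).sum())  # (22)
    chi = (rho[is_bg] * d[is_bg]).sum() / max(rho_b, eps)  # right-hand side of (23)
    mu_new = np.log1p(1.0 / max(chi, eps))                 # (24)
    return lam_new, mu_new, rho

# Illustrative data: four pixels near the model, two far from it.
d = np.array([0.0, 1.0, 2.0, 1.0, 30.0, 40.0])
labels = np.array([0, 0, 0, 0, 1, 1])
lam, mu = 0.5, 1.0
for _ in range(5):
    lam, mu, rho = em_step(d, labels, lam, mu, m=np.exp(4.0))
```

On this toy input $\lambda$ settles near the fraction of pixels labelled as background, in line with the approximation noted in the text.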
The parameters are first computed from the detection results of the codebook-based
background subtraction. The proposed method then alternates between estimating the
parameters from the intermediate foreground detection results and detecting the foreground
with the estimated parameters.
3 EXPERIMENTAL RESULTS
The proposed method is tested on real videos recorded indoors and outdoors, whose ground
truths are manually segmented. The codebook algorithm (Kim, et al., 2005) is selected as the
pixel-based background model. No postprocessing operations, such as morphological
operations or connected component labelling, are used, in order to demonstrate the
effectiveness of the proposed method itself.
In the first sequence, the illumination varies with the distance between the camera and the
foreground object. Fig. 1 depicts the comparative detection results on the video recorded
indoors. The third and fourth columns show the results of the codebook algorithm and the
proposed method, respectively. In the input frames, the lower limbs can hardly be
distinguished from the background regions because their color differs only slightly from
the dark background, while the upper body of the object is clearly distinguishable from the
background. The proposed method detects the lower limbs more clearly than the codebook
algorithm.
Fig. 2 shows the results on the video recorded outdoors. As can be seen from the input
frame in the first column, the color of the lawn near the center region is very similar to
that of the subject's jacket. The codebook algorithm cannot distinguish between them clearly
and yields streaks of false negatives
Figure 1: The comparative experimental results on the
video recorded indoors. Column 1: original images.
Column 2: ground-truths. Column 3: detection results of
codebook-based algorithm. Column 4: detection results of
our method.
near the middle of the detected foreground. Pixel-based algorithms can hardly distinguish
foreground objects from the background in such a situation. The MRF, however, can resolve
it by letting each pixel communicate with its adjacent pixels through the compatibility
functions mentioned above. On the other hand, the proposed method misclassifies some
background regions as foreground, which appear as small scattered blobs.
Fig. 3 shows the similarity test results, which evaluate the performance of the proposed
method quantitatively. The similarity (Chen, et al., 2007) is defined as follows:
$$S(L^t, G^t) = \frac{\left| L^t \cap G^t \right|}{\left| L^t \cup G^t \right|} \tag{25}$$
Figure 2: The comparative experimental results on the
video recorded outdoors. Column 1: original images.
Column 2: ground-truths. Column 3: detection results of
codebook-based algorithm. Column 4: detection results of
our method.
where $L^t$ and $G^t$ represent the detection result and the corresponding ground truth,
respectively. The ground truths are manually segmented every 4 frames. The similarity value
approaches 1 as the overlap between $L^t$ and $G^t$ increases. The proposed method shows a
higher similarity value than the codebook algorithm at almost every frame. However, at some
frames of the video recorded outdoors, the segmentation performance degrades slightly due
to an increase in false positives.
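The similarity measure can be computed directly from two binary masks. The sketch below assumes that (25) is the ratio of the intersection to the union of the detected and ground-truth foreground regions; the function name and the convention of returning 1.0 when both masks are empty are illustrative choices.

```python
import numpy as np

def similarity(detected, ground_truth):
    """|L ∩ G| / |L ∪ G| for two binary foreground masks."""
    l = np.asarray(detected, dtype=bool)
    g = np.asarray(ground_truth, dtype=bool)
    union = np.count_nonzero(l | g)
    # Assumed convention: two empty masks are treated as a perfect match.
    return np.count_nonzero(l & g) / union if union else 1.0

detected = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
s = similarity(detected, truth)  # 2 shared pixels out of 4 in the union
```

Both false positives and false negatives enlarge the union without enlarging the intersection, so either kind of error lowers the score.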
4 CONCLUSIONS
An algorithm that incorporates an MRF model into the pixel-based background model is
proposed. Almost all MRF-based background models use fixed values for all MRF parameters.
The proposed method improves foreground detection by estimating all the parameters
adaptively instead of using fixed parameters.
Experiments conducted on videos recorded indoors and outdoors demonstrate that the proposed
MRF model effectively reduces the false negatives in detecting foreground objects against
complex backgrounds. However, the proposed method misclassifies background regions as
foreground slightly more often than the pixel-based segmentation algorithm. Further work is
needed to reduce the number of such misclassifications without an appreciable degradation
in classification speed.
Figure 3: The similarity curves for the codebook-based
algorithm and the proposed method on the video recorded
(a) indoors and (b) outdoors.
REFERENCES
Stauffer, C., Grimson, W. E. L., 1999. Adaptive
background mixture models for real-time tracking.
IEEE International Conference on Computer Vision
and Pattern Recognition, Vol. 2, pp. 246-252.
Elgammal, A., Duraiswami, R., Harwood, D., Davis, L. S.,
2002. Background and foreground modeling using
nonparametric kernel density estimation for visual
surveillance. Proc. IEEE, vol. 90, no. 7, pp. 1151-
1163.
Kim, K., Chalidabhongse, T. H., Harwood, D., 2005.
Real-time foreground-background segmentation using
codebook model. Elsevier Real-Time Imaging, vol. 11,
pp. 172-185.
Migdal, J., Grimson, W. E., 2005. Background subtraction
using Markov thresholds. Proceedings of the IEEE
Workshop on Motion and Video Computing
(WACV/MOTION'05).
Wu, M., Peng, X., 2010. Spatio-temporal context for
codebook-based dynamic background subtraction.
International Journal of Electronics and
Communications, pp. 739-747.
Chen, Y., Chen, C., Huang, C., Hung, Y., 2007. Efficient
hierarchical method for background subtraction.
Pattern Recognition, pp. 2706-2715.
Yedidia, J. S., Freeman, W. T., Weiss, Y., 2002.
Understanding belief propagation and its generalizations.
TR-2001-22.
Zhang, L., Seitz, S. M., 2007. Estimating optimal
parameters for MRF stereo from a single image pair.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 29, no.2, pp. 331-342.
McHugh, J. M., Konrad, J., Saligrama, V., Jodoin, P.,
2009. Foreground-adaptive background subtraction.
IEEE Signal Processing Letters, vol.16, issue 5,
pp.390-393.
Xu, X., Huang, T. S., 2008. A loopy belief propagation
approach for robust background estimation. CVPR
2008, pp. 23-28.