AUTOMATIC KERNEL WIDTH SELECTION FOR NEURAL
NETWORK BASED VIDEO OBJECT SEGMENTATION
Dubravko Culibrk
University of Novi Sad
Novi Sad, Serbia
Daniel Socek, Oge Marques, Borko Furht
Department of Computer Science and Engineering, Florida Atlantic University
Boca Raton FL 33431, USA
Keywords:
Video processing, Object segmentation, Background modeling, BNN, Neural Networks.
Abstract:
Background Modeling Neural Networks (BNNs) represent an approach to motion-based object segmentation
in video sequences. BNNs are probabilistic classifiers with nonparametric, kernel-based estimation of the
underlying probability density functions. The paper presents an enhancement of the methodology, introducing
automatic estimation and adaptation of the kernel width.
The proposed enhancement eliminates the need to determine the kernel width empirically. The selection of a
kernel width appropriate for the features used for segmentation is critical to achieving good segmentation
results. The improvement makes the methodology easier to use and more adaptive, and facilitates the evaluation
of the approach.
1 INTRODUCTION
Object segmentation is a basic task in the domain of
digital video processing. Diverse applications, such
as scene understanding, object-based video encoding,
surveillance applications and 2D-to-pseudo-3D video
conversion, rely on the ability to extract objects from
video sequences.
The research into object segmentation for video
sequences grabbed from a stationary camera has
yielded a number of approaches based on the detec-
tion of the motion of objects. The approaches of
this class scrutinize the changes observed between
the consecutive frames of the sequence to detect pix-
els which correspond to moving objects. The task is
particularly difficult when the segmentation is to be
done for natural scenes where the background con-
tains shadows, moving objects, and undergoes illumi-
nation changes.
For purposes of automated surveillance and scene
understanding it is often of interest not only to
detect the objects moving in the scene, but also to
distinguish between two classes of objects:
Background objects, corresponding to all objects
that are present in the scene throughout the whole
sequence or for longer than a predefined period of time.
Foreground objects, representing all other objects
appearing in the scene.
The goal of foreground segmentation is to separate
pixels corresponding to the foreground from those
corresponding to the background.
Background Modeling Neural Network (BNN)
represents a probabilistic approach to foreground seg-
mentation (Culibrk et al., 2006). The network is a
Bayes rule based unsupervised classifier designed to
enable the classification of a single pixel as pertinent
to foreground or background. A set of networks is
used to classify all the pixels within the frame of the
sequence.
The networks incorporate kernel-based estimators
(Mood and Graybill, 1962) for probability density
functions used to model the background. The net-
works are general in terms of the features of a pixel
used to perform the classification, such as RGB val-
ues or intensity values. However, the accuracy of the
process depends on the ability to determine the ap-
propriate width for the estimator kernels. The pro-
cess relies on empirical data and involves tedious ex-
perimentation. In addition, the BNNs are unable to
adapt to conditions occurring in specific sequences.
Rather, a single value is typically used whenever the
same features are used as basis for segmentation. Fur-
thermore, the complexity of the process of evaluation
of a segmentation algorithm depends directly on the
number of parameters that need to be specified (Wirth
et al., 2006). Thus, reducing the number of param-
eters not only makes the methodology easier to use
and more adaptive, it facilitates the evaluation of the
approach.
In this paper, an extension of the BNN methodol-
ogy is proposed to incorporate automatic selection of
the appropriate kernel width. The proposed enhanced
BNNs do not suffer from the above mentioned short-
comings of the fixed-kernel-width BNNs and achieve
comparable segmentation performance.
The rest of the paper is organized as follows: Sec-
tion 2 provides a survey of related published work.
Section 3 contains a short description of BNNs. The
proposed enhancement of the BNNs is described in
Section 4. Section 5 is dedicated to the presentation
of experimental evaluation results. Section 6 contains
the conclusions.
2 RELATED WORK
Early foreground segmentation methods dealing with
non-stationary background are based on a model of
background created by applying some kind of low-
pass filter on the background frames. The high-
frequency changes in intensity or color of a pixel are
filtered out using different filtering techniques such
as Kalman filters (Karmann and von Brandt, 1990) to
create an approximation of background in the form of
an image (reference frame). The reference frame is
updated with each new frame in the input sequence,
and used to segment the foreground objects by sub-
tracting the reference frame from the observed frame
(Rosin, 1998). These methods rest on the restrictive
assumption that observed pixel changes due
to the background are much slower than those due
to the objects to be segmented. Therefore they are
not particularly effective for sequences with high-
frequency background changes, such as natural scene
and outdoor sequences.
Probabilistic techniques achieve superior results
in case of such complex-background sequences.
These methods rely on an explicit probabilistic model
of the background, and a decision framework allow-
ing for foreground segmentation. A Gaussian-based
statistical model whose parameters are recursively up-
dated in order to follow gradual background changes
within the video sequence is proposed in (Boult et al.,
1999). More recently, Gaussian-based modelling was
significantly improved by employing a Mixture of
Gaussians (MoG) as a model for the probability den-
sity functions (PDFs) related to the distribution of
pixel values. Multiple Gaussian distributions, usu-
ally 3-5, are used to approximate the PDFs (Ellis and
Xu, 2001)(Stauffer and Grimson, 2000). The param-
eters of each Gaussian curve are updated with each
observed pixel value. If an observed pixel value is
within 2.5 standard deviations (σ) of the mean
(µ) of a Gaussian, the pixel value matches the Gaus-
sian (Stauffer and Grimson, 2000). The parameters
are updated only for Gaussians matching the observed
pixel value, based on the following equations:

$$\mu_t = (1-\rho)\,\mu_{t-1} + \rho\, X_t \quad (1)$$

$$\sigma^2_t = (1-\rho)\,\sigma^2_{t-1} + \rho\,(X_t - \mu_t)^T (X_t - \mu_t) \quad (2)$$

where

$$\rho = \eta(X_t, \mu_{t-1}, \sigma_{t-1}) \quad (3)$$

$\eta$ is a Gaussian function and $X_t$ is the pixel value
observed at time $t$. Equations 1 - 3 express a causal
low-pass filter applied to the mean and variance of the
Gaussian.
Using a small number of Gaussians leads to
a rough approximation of the PDFs involved.
Due to this fact, MoG achieves weaker results
for video sequences containing non-periodical back-
ground changes (e.g. due to waves and water sur-
face illumination, cloud shadows, and similar phe-
nomena), as was reported in (Li et al., 2004). The
Gaussian-based models are parametric in the sense
that they incorporate underlying assumptions about
the probability density functions (PDFs) they are try-
ing to estimate.
In 2003, Li et al. proposed a method for fore-
ground object detection employing a Bayes decision
framework (Li et al., 2004). The method has shown
promising experimental object segmentation results
even for the sequences containing complex variations
and non-periodical movements in the background.
The primary model of the background used by Li
et al. is a background image obtained through low
pass filtering. However, the authors use a probabilis-
tic model for the pixel values detected as foreground
through frame-differencing between the current frame
and the reference background image. The probabilis-
tic model is used to enhance the results of primary
foreground detection. The probabilistic model is non-
parametric since it does not impose any specific shape
to the PDFs learned. However, for reasons of efficiency
and improved results, the authors applied binning
of the features and assigned a single probability to
each bin, leading to a discrete representation of PDFs.
The representation is equivalent to a kernel-based es-
timate with quadratic kernel. The width of the kernel
used was determined empirically and remained fixed
in all the reported experiments (Li et al., 2004). The
system achieved performance better than that of the
mixture of 5 Gaussians in the results presented in the
same publication. However, when a larger number of
Gaussians is used, MoG achieved better performance
(Culibrk, 2006). A nonparametric kernel density es-
timation framework for foreground segmentation and
object tracking for visual surveillance has been pro-
posed in (ElGammal et al., 2002). The authors present
good qualitative results of the proposed system, but do
not evaluate segmentation quantitatively nor do they
compare their system with other methods. The frame-
work is computationally intensive as the number of
kernels corresponds to the number of observed pixel
values. The width of the kernels is adaptive and they
use the observed median of absolute differences be-
tween consecutive pixel values. The rationale for the
use of the median is the fact that its estimate is robust
to a small number of outliers. They assume a Gaussian
(normal) distribution for the differences and establish
a relation between the estimated median and the width
of the kernel:
$$\sigma = \frac{m}{0.68\sqrt{2}} \quad (4)$$
where m is the estimated median.
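As a concrete reading of Equation 4, the sketch below (illustrative names, not the authors' code) derives the width from a single pixel's history of values:

```python
import numpy as np

def kernel_width_from_median(pixel_history):
    """pixel_history: 1-D array of one pixel's values over consecutive frames."""
    diffs = np.abs(np.diff(pixel_history))  # |x_t - x_{t-1}| between frames
    m = np.median(diffs)                    # robust to a few foreground outliers
    return m / (0.68 * np.sqrt(2.0))        # Eq. 4
```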
The approach based on background modelling
neural networks was proposed in (Culibrk et al.,
2006). The networks represent a biologically
plausible implementation of Bayesian classifiers and
nonparametric kernel-based density estimators. The
weights of the network store a model of background,
which is continuously updated. The PDF estimates
consist of a fixed number of kernels, all of a fixed
width. The appropriate width of the kernels is deter-
mined empirically. The kernel width depends on the
features used to achieve segmentation. Results supe-
rior to those of Li et al. and MoG with 30 Gaussians
are reported in (Culibrk, 2006). The BNNs address
the problem of computational complexity of the ker-
nel based background models by exploiting the paral-
lelism of neural networks.
3 BACKGROUND MODELING
NEURAL NETWORK (BNN)
Background Modeling Neural Network (BNN) is a
neural network classifier designed specifically for
foreground segmentation in video sequences. The
network is an unsupervised learner. It collects statis-
tics related to the dynamic processes of pixel-feature-
value changes. The learnt statistics are used to clas-
sify a pixel as pertinent to a foreground or background
object in each frame of the sequence.
Note that a video sequence can be viewed as
a set of pixel feature values changing in time, so-
called pixel processes (Stauffer and Grimson, 2000).
Pixel feature values are, in general, vectors in multi-
dimensional space, such as RGB space. Probabilis-
tic motion-based foreground segmentation methods,
including the BNN approach, rely on a supposition
derived from the definitions of foreground and back-
ground stated in Section 1:
Pixel (feature) values corresponding to back-
ground objects will occur most of the time, i.e. more
often than those pertinent to the foreground.
Thus, if a classifier is able to effectively distin-
guish between the values occurring more frequently
than others, it should be able to achieve accurate seg-
mentation. The BNN classifier strives to estimate the
probability of a pixel value X occurring at the same
time as the event of a background or foreground ob-
ject being located at that particular pixel. In terms of
probability theory, the BNN tries to estimate the joint
probability p(b, X) of background b and pixel value
X occurring at the pixel the BNN is trying to classify
and the analogous joint probability p(f, X) for fore-
ground.
By virtue of the Bayes rule, the classification cri-
terion used in BNN is the following:
$$p(b|X)\,p(X) - p(f|X)\,p(X) > 0 \quad (5)$$
where p(f|X) and p(b|X) represent estimated con-
ditional PDFs of an observed pixel value X indicat-
ing a foreground and background object, respectively.
p(X) is the estimated PDF of a feature value occur-
ring.
In the structure of BNN, shown in Figure 1, three
distinct subnets can be identified. The classification
subnet is a central part of BNN concerned with ap-
proximating the PDF of pixel feature values belong-
ing to background/foreground. It is a neural network
implementation of a Parzen (kernel based) estimator
(Parzen, 1962). The estimation is discussed in more
detail in Section 4.
The classification subnet contains three layers of
neurons. Input neurons of this network simply map
the inputs of the network, which are the values of the
feature vector for a specific pixel, to the pattern neu-
rons. The output of the pattern neurons is a nonlin-
ear function of Euclidean distance between the input
of the network and the stored pattern for that specific
neuron:
$$p_i = \exp\left[-\frac{(w_i - x_t)^T (w_i - x_t)}{2\sigma^2}\right] \quad (6)$$
where $w_i$ is the vector of weights between the input
neurons and the i-th pattern neuron, $x_t$ is the pixel
feature value vector and $p_i$ is the output of the i-th
pattern neuron. The only parameter of this subnet is
the kernel width ($\sigma$), sometimes dubbed the smoothing
parameter, which is used to control the shape of
the nonlinear function.

Figure 1: Structure of Background Modeling Neural Network.
The output of the summation units of the classi-
fication subnet is the sum of their inputs. The sub-
net has two summation neurons: one to calculate the
probability of the observed pixel value corresponding
to background and the other to calculate the probabil-
ity of the value pertaining to foreground, correspond-
ing to products in criterion 5.
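Sequentially, the classification subnet amounts to a few vectorized operations. The sketch below is a minimal illustration under assumed names (`patterns` for the input-to-pattern weight vectors, `w_b`/`w_f` for the pattern-to-summation weights), not the authors' implementation; it evaluates Equation 6 for every pattern neuron and applies criterion 5:

```python
import numpy as np

def classify(x_t, patterns, w_b, w_f, sigma):
    """x_t: (D,) feature vector; patterns: (P, D) stored pattern weights;
    w_b, w_f: (P,) pattern-to-summation weights; sigma: kernel width."""
    d2 = np.sum((patterns - x_t) ** 2, axis=1)  # squared Euclidean distances
    p = np.exp(-d2 / (2.0 * sigma ** 2))        # Eq. 6: pattern-neuron outputs
    score_b = np.sum(w_b * p)                   # background summation neuron
    score_f = np.sum(w_f * p)                   # foreground summation neuron
    return score_b - score_f > 0, p             # criterion 5; p reused later
```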
Weights between the pattern and summation neu-
rons are used to store the confidence with which a
pattern belongs to the background/foreground. The
weights of these connections are updated with each
new pixel value received (i.e. with each frame), ac-
cording to the following recursive equations:
$$w^{t+1}_{ib} = f_c\!\left(\left(1 - \frac{\beta}{P}\right) w^t_{ib} + MA_t\,\beta\right) \quad (7)$$

$$w^{t+1}_{if} = 1 - w^{t+1}_{ib} \quad (8)$$
where $w^t_{ib}$ is the value of the weight between the i-th
pattern neuron and the background summation neuron
at time t, $w^t_{if}$ is the value of the weight between
the i-th pattern neuron and the foreground summation
neuron at time t, $\beta$ is the learning rate, P is the number
of the pattern neurons of BNN, $f_c$ is a clipping function
defined by (9) and $MA_t$ indicates the neuron with
the maximum response (activation potential) at frame
t, according to (10).
$$f_c(x) = \begin{cases} 1, & x > 1 \\ x, & x \le 1 \end{cases} \quad (9)$$

$$MA_t = \begin{cases} 1, & \text{for the neuron with maximum response} \\ 0, & \text{otherwise} \end{cases} \quad (10)$$
Equations 7 - 10 express the notion that whenever
an instance pertinent to a pattern neuron is encoun-
tered, the probability that that pattern neuron is acti-
vated by a feature value vector belonging to the back-
ground is increased. Naturally, if that is the case, the
probability that the pattern neuron is excited by a pat-
tern belonging to the foreground is decreased. Conversely,
the more seldom a feature vector value corresponding
to a pattern neuron is encountered, the more likely it
is that the patterns it represents belong to foreground
objects. By adjusting the learning rates, it is
possible to control the speed of the learning process.
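A sketch of the corresponding update step, reusing the pattern-neuron outputs `p` from the classification sketch above (again an illustration under assumed names, not the reference implementation):

```python
import numpy as np

def update_weights(p, w_b, beta):
    """p: (P,) pattern-neuron outputs; w_b: (P,) background summation
    weights; beta: learning rate. Returns updated (w_b, w_f)."""
    P = len(p)
    ma = np.zeros(P)
    ma[np.argmax(p)] = 1.0                      # Eq. 10: winner indicator
    w_b = (1.0 - beta / P) * w_b + ma * beta    # Eq. 7, before clipping
    w_b = np.minimum(w_b, 1.0)                  # Eq. 9: clipping function f_c
    w_f = 1.0 - w_b                             # Eq. 8
    return w_b, w_f
```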
The output of the classification subnet indicates
whether the output of the background summation neu-
ron is higher than that of the foreground summation
neuron, i.e. that it is more probable that the input fea-
ture value is due to a background object rather than a
foreground object based on criterion 5. If the criterion
is satisfied then the pixel is classified as background,
otherwise it is classified as foreground.
The activation and replacement subnets are
Winner-Take-All (WTA) neural networks. The acti-
vation subnet performs a dual function: it determines
which of the neurons of the network has the maxi-
mum activation (output) and whether that value ex-
ceeds a threshold (θ) provided as a parameter to the
algorithm. If it does not, the BNN is considered in-
active and replacement of a pattern neuron’s weights
with the values of the current input vector is required.
If this is the case, the feature is considered to belong
to a foreground object.
The first layer of this network has the structure of
a 1LF-MAXNET (Kwan, 1992) network and a single
neuron is used to indicate whether the network is ac-
tive. The output of the neurons of the first layer of the
network can be expressed in the form of the following
equation:
$$Y_j = X_j \times \prod_{i=1}^{P} F(X_j - X_i \mid i \neq j) \quad (11)$$

where:

$$F(z) = \begin{cases} 1, & \text{if } z \ge 0 \\ 0, & \text{if } z < 0 \end{cases} \quad (12)$$
The output of the first layer of the activation subnet
will differ from 0 only for the neurons with maxi-
mum activation and will be equal to the maximum ac-
tivation. In Figure 1 these outputs are indicated with
$Z_1, \ldots, Z_P$. Figure 2 shows the inner structure of a
neuron in the first layer of the subnet.

Figure 2: Structure of processing neurons of the activation subnet.

A single neuron in the second layer of the activation
subnet is concerned with detecting whether the BNN
is active or not; its function can be expressed in the
form of the following equation:
$$NA = F\!\left(\sum_{i=1}^{P} Z_i - \theta\right) \quad (13)$$
where F is given by Equation 12 and θ is the acti-
vation threshold. Finally, the replacement subnet in
Figure 1 can be viewed as a separate neural net with
a unit input. Each of the first-layer neurons in the
replacement subnet is connected with the input via
synapses that have the same weight as the two out-
put synapses between the pattern and summation neu-
rons of the classification subnet. Each pattern neuron
has a corresponding neuron in the replacement net.
The function of the replacement net is to determine
the pattern neuron that minimizes the criterion for re-
placement, expressed by the following equation:
$$\text{replacement criterion} = w^t_{if} + \left| w^t_{ib} - w^t_{if} \right| \quad (14)$$
The criterion is a mathematical expression of the idea
that those patterns that are least likely to belong to the
background and those that provide least confidence
to make the decision should be eliminated from the
model.
The neurons of the first layer calculate the negated
value of the replacement criterion for the pattern neu-
ron they correspond to. The second layer is a 1LF-
MAXNET that yields non-zero output corresponding
to the pattern neuron to be replaced.
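In a sequential implementation the two WTA subnets reduce to max and argmax operations. The sketch below (assumed names, consistent with the earlier sketches) tests Equations 11 - 13 and, when the network is inactive, applies the replacement criterion of Equation 14:

```python
import numpy as np

def activation_and_replacement(p, w_b, x_t, patterns, theta):
    """Returns True if the BNN is active for input x_t; otherwise replaces
    the pattern neuron minimizing the criterion of Eq. 14 with x_t."""
    # Eqs. 11-12 leave only the winner non-zero, so the sum in Eq. 13
    # equals the maximum pattern-neuron output.
    if np.max(p) - theta >= 0:                  # Eq. 13: network is active
        return True
    w_f = 1.0 - w_b
    criterion = w_f + np.abs(w_b - w_f)         # Eq. 14, per pattern neuron
    patterns[np.argmin(criterion)] = x_t        # overwrite least useful pattern
    return False                                # input treated as foreground
```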
To form a complete background-subtraction solu-
tion, a single instance of a BNN is used to model the
features at each pixel of the image.
4 AUTOMATIC KERNEL WIDTH
ESTIMATION
BNNs employ a Parzen-estimator-based (Parzen,
1962) representation of the PDFs needed to achieve
classification. This class of estimators is nowadays
also known as kernel-based density estimators and
was used in the approach presented in (ElGammal
et al., 2002), as discussed in Section 2. A Parzen esti-
mator of a PDF based on a set of measurements used
within BNNs has the following analytical form:
$$p(X) = \frac{1}{(2\pi)^{P/2}\, T_o\, \sigma^P}\; \sum_{t=0}^{T_o} \exp\left[-\frac{(X - X_t)^T (X - X_t)}{2\sigma^2}\right] \quad (15)$$
where P is the dimension of the feature vector, $T_o$ is
the number of patterns used to estimate the PDF (observed
pixel values), $X_t$ are the pixel values observed
up to the frame $T_o$, and $\sigma$ is the kernel width.
The Parzen estimator defined by (15) is a sum of
multivariate Gaussian distributions centered at the ob-
served pixel values. As the number of observed values
approaches infinity, the Parzen estimator converges
to its underlying parent density, provided that it is
smooth and continuous. To reduce the memory and
computational requirements of the segmentation, the
BNNs employ a relatively small number of kernels
(up to 30 kernels showed good results in our exper-
iments), but the kernels are used to represent more
than one observation and are assigned weights in a man-
ner similar to that discussed in (Specht, 1991). In
addition, due to the format of the classification crite-
rion 5, the normalizing expression preceding the sum
in Equation 15 can be omitted for BNN purposes.
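For reference, a direct transcription of Equation 15 (with the normalizing factor retained, although the BNN may drop it, as noted above); the symbol P of Equation 15 appears as D here to avoid clashing with the number of pattern neurons used elsewhere:

```python
import numpy as np

def parzen_pdf(x, samples, sigma):
    """x: (D,) query point; samples: (T, D) observed values; sigma: width."""
    T, D = samples.shape
    d2 = np.sum((samples - x) ** 2, axis=1)
    norm = (2.0 * np.pi) ** (D / 2.0) * T * sigma ** D
    return np.sum(np.exp(-d2 / (2.0 * sigma ** 2))) / norm  # Eq. 15
```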
The smoothing parameter (σ) controls the width
of the Gaussians. Fig. 3 shows the plot of a Parzen
estimator for three stored points with values in a sin-
gle dimension (e.g. if only the intensity value for a
pixel is considered).
Figure 3: Plots of Parzen estimators for different values of the "smoothing parameter".

The horizontal plane in Fig. 3 represents the threshold
value used to decide which feature values are
covered by a single Gaussian. The threshold was set
to 0.5 in the plot. All features within the circle defined
by the cross-section of a single Parzen kernel and the
threshold plane are deemed close enough to the cen-
ter of the peak to be within the cluster pertinent to
the Gaussian. The selection of the smoothing parameter
value and the threshold controls the size of the cluster.
Larger values of σ lead to less pronounced peaks in
the estimation, i.e. make the estimation ”smoother”.
The value of the smoothing parameter has a profound
impact on the quality of segmentation and requires
tedious experimentation to determine for a particular
pixel feature used for segmentation. To alleviate this
deficiency of the BNN approach, an automatic pro-
cedure is proposed for learning and adaptation of the
smoothing parameter based on the properties of the
segmented sequence.
The number of kernels in a BNN is fixed, and
determined by the available computational resources.
The smoothing parameter should be selected so that
the predetermined number of kernels is able to cover
all the pixel values occurring due to background. In
addition, a more accurate approximation of PDFs can,
in general, be achieved by kernels of smaller width.
Thus, the goal of smoothing parameter estimation is
to determine the minimum width of kernels needed to
account for all the background pixel values, based on
a predefined number of kernels and the BNN activa-
tion threshold. To achieve this goal, the kernel width
is updated with each new pixel value observed.
Let $\sigma_i$ and $\mu_i$ be estimates of the standard deviation
and mean of the background pixel values in the
observed part of the video sequence, along the i-th dimension
of the pixel feature vector. In order not to
increase the computational and memory requirements
of the BNN, it is desirable to estimate $\sigma_i$ and $\mu_i$ based
on the information already contained in the BNN.
For an estimate of the mean $\mu_i$ we use the average
value of the i-th coordinate of the patterns stored in the
network, which correspond to the weights between
the i-th input neuron and each pattern neuron of the
classification subnet of the BNN:
$$\mu_i = \frac{\sum_{j=1}^{P} w_{ij}}{P} \quad (16)$$
To estimate the standard deviation $\sigma_i$, the average
of the distance of each center from $\mu_i$ is calculated,
weighted with the weights of the connections between
each pattern neuron and the summation neuron corresponding
to the background. This way the contribution
of patterns likely to correspond to the background is
amplified, while the influence of the patterns due to
the foreground is diminished. The formula for $\sigma_i$ is given
by Equation 17.
$$\sigma_i = \sqrt{\frac{\sum_{j=1}^{P} w_{jb}\,(w_{ij} - \mu_i)^2}{P - 1}} \quad (17)$$
Since the width of the BNN kernels is the same
along each dimension of the feature vector, the maximum
standard deviation over all the dimensions is used to
estimate the smoothing parameter:
$$\sigma = 0.5\,\sqrt{-\frac{2\,\max_{i \in 1..N_{dim}} \sigma_i}{\log\theta}} \quad (18)$$
where θ corresponds to the BNN activation threshold.
Equation 18 corresponds to the kernel that will be ac-
tive for all patterns within two estimated maximum
standard deviations. The factor of two is introduced,
since the estimator based on Equation 17 tends to underestimate
the real deviation. Equation 17 would give a precise
estimate based on the stored patterns if all of them corresponded
to the background and the confidence of all of them
pertaining to the background was one. It is unlikely that
all the stored patterns in the BNN will correspond to
the background. In addition, the confidence that these
patterns correspond to the background will usually
be less than one.
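Putting Equations 16 - 18 together, a minimal sketch of the proposed estimation, under the assumption (as in the earlier sketches) that `patterns[j, i]` stores the weight $w_{ij}$ between the i-th input neuron and the j-th pattern neuron, and `w_b[j]` the background confidence of the j-th pattern:

```python
import numpy as np

def estimate_kernel_width(patterns, w_b, theta):
    """patterns: (P, D) stored patterns; w_b: (P,) background confidences;
    theta: BNN activation threshold. Returns the smoothing parameter."""
    P = patterns.shape[0]
    mu = np.mean(patterns, axis=0)                       # Eq. 16, per dimension
    var = np.sum(w_b[:, None] * (patterns - mu) ** 2, axis=0) / (P - 1)
    sigma_i = np.sqrt(var)                               # Eq. 17, per dimension
    return 0.5 * np.sqrt(-2.0 * np.max(sigma_i) / np.log(theta))  # Eq. 18
```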
5 EXPERIMENTS AND RESULTS
To evaluate the approach, a PC-based foreground seg-
mentation application based on the BNN methodol-
ogy and employing adaptive kernels has been devel-
oped. BNNs containing 30 processing neurons in the
classification subnet were used. Pixel intensity was
used as a feature to guide the segmentation. The ac-
tivation threshold (θ) of the BNNs was set to 0.5, but
the methodology showed itself to be robust to a wide
range of threshold values. The learning rates used
were 0.01, 0.005 and 0.003 depending on the dynam-
ics of the sequence.
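For completeness, a hypothetical per-pixel driver showing how the sketches above could be wired together with the parameters reported here (30 pattern neurons, θ = 0.5, β in {0.01, 0.005, 0.003}); the initialization and the exact ordering of steps are our assumptions, not the authors' application:

```python
import numpy as np

def segment_pixel(values, P=30, theta=0.5, beta=0.01):
    """values: (T, D) feature vectors of one pixel over time.
    Returns per-frame labels (True = background)."""
    rng = np.random.default_rng(0)
    patterns = values[rng.choice(len(values), P)].astype(float)  # crude init
    w_b = np.full(P, 0.5)                    # no initial confidence either way
    labels = []
    for x_t in values:
        sigma = max(estimate_kernel_width(patterns, w_b, theta), 1e-6)
        is_bg, p = classify(x_t, patterns, w_b, 1.0 - w_b, sigma)
        active = activation_and_replacement(p, w_b, x_t, patterns, theta)
        if active:
            w_b, _ = update_weights(p, w_b, beta)
        labels.append(bool(is_bg and active))  # inactive frames -> foreground
    return labels
```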
Figure 4: Results obtained for campus sequence: (a) frame from the original sequence, (b) segmentation results obtained for the frame shown, (c) ground truth frame.
Figure 5: Results obtained for room sequence: (a) frame from the original sequence, (b) segmentation results obtained for the frame shown, (c) ground truth frame.
A set of diverse sequences containing complex
background conditions, provided by Li et al. (Li
et al., 2004) and publicly available at
http://perception.i2r.a-star.edu.sg, was used. The
results of the segmentation were evaluated both quali-
tatively and quantitatively, using a set of ground truth
frames provided by the same authors for the differ-
ent sequences. The ten testing sequences were ob-
tained in several different environments. They can
be classified based on the sources of complexity in
background variation, pertinent to each environment.
Three classes of sequences (environments) can be
identified: outdoor environments, small indoor envi-
ronments and large (public) indoor environments.
The sources of complexity in the sequences ob-
tained in outdoor environments are usually due to ob-
jects moved by wind (e.g. trees or waves) and illu-
mination changes due to changes in cloud cover. For
small indoor environments, such as offices, the source
of complexity relates mostly to objects such as cur-
tains or fans moving in the background or screens
flickering. The illumination changes are mostly due
to switching lights on and off. Large public indoor en-
vironments (e.g. subway stations, airport halls, shop-
ping centers etc.) are characterized by lighting dis-
tributed from the ceiling and presence of specular sur-
faces, inducing complex shadow and glare effects. In
addition, these spaces can contain large moving ob-
jects such as escalators and elevators.
For reasons of space, frames illustrating qualita-
tive results for a single sequence representative of
each class are presented in this paper. More complete
results as well as Matlab scripts that can be used to
evaluate the approach in a manner similar to that de-
scribed in (Culibrk, 2006) and (Li et al., 2004) can be
found at http://mlab.fau.edu.
The frames shown in Figure 4 are pertinent to
a video of a campus driveway. The complexity of
the background in this sequence is due to the trees
in the background moving violently in the wind and
due to the changing illumination. The figure shows
the frame of the original sequence, segmentation re-
sult enhanced through morphological processing and
human-generated ground truth. Figure 5 shows a seg-
mentation result achieved for a small indoor environ-
ment sequence. The background is complex due to
the moving curtain. Figure 6 shows a segmentation
result achieved for a large indoor environment of a
shopping mall. The changing glare of the floor makes
the background complex in this case.
Figure 6: Results obtained for shopping mall sequence: (a) a frame of the original sequence, (b) segmentation result and (c) ground truth frame.
The results obtained for all three sequences and
indeed all other testing sequences are good, although
the intensity-only segmentation is prone to
shadow segmentation effects, as noted in (ElGammal
et al., 2002).
6 CONCLUSION
Object segmentation is a fundamental task in several
important domains of video processing. We proposed
an enhancement of the Background Modeling Neural
Network approach to foreground segmentation. Auto-
matic selection of the kernel width for the estimators
used within the methodology has been introduced.
The proposed kernel width estimation principle en-
ables the BNN to adapt to the conditions of a specific
video, makes the methodology easier to use and al-
lows for easier evaluation of the BNN approach.
The methodology has been tested using a publicly
available and diverse set of sequences and achieves
good segmentation results. Further testing of the
methodology and application of the approach to prob-
lems of segmentation results enhancement and object
tracking represent some of the possible directions of
future research.
REFERENCES
Boult, T., Micheals, R., Gao, X., Lewis, P., Power, C., Yin,
W., and Erkan, A. (1999). Frame-rate omnidirec-
tional surveillance and tracking of camouflaged and
occluded targets. In Proc. of IEEE Workshop on Vi-
sual Surveillance, pp. 48-55.
Culibrk, D. (2006). Neural Network Approach to Bayesian
Background Modeling for Video Object Segmentation.
Ph.D. Dissertation, Florida Atlantic University, USA.
Culibrk, D., Marques, O., Socek, D., Kalva, H., and Furht,
B. (2006). A neural network approach to bayesian
background modeling for video object segmentation.
In Proc. of the International Conference on Computer
Vision Theory and Applications (VISAPP’06).
ElGammal, A., Duraiswami, R., Harwood, D., and Davis,
L. (2002). Background and foreground modeling us-
ing nonparametric kernel density estimation for visual
surveillance. In Proc. of the IEEE, vol. 90, No. 7, pp.
1151-1163.
Ellis, T. and Xu, M. (2001). Object detection and tracking
in an open and dynamic world. In Proc. of the Second
IEEE International Workshop on Performance Evalu-
ation on Tracking and Surveillance (PETS’01).
Karmann, K. P. and von Brandt, A. (1990). Moving ob-
ject recognition using an adaptive background mem-
ory. In Timevarying Image Processing and Moving
Object Recognition, 2, pp. 297-307. Elsevier Publish-
ers B.V.
Kwan, H. K. (1992). One-layer feedforward neural network
fast maximum/minimum determination. In Electron-
ics Letters, pp. 1583-1585.
Li, L., Huang, W., Gu, I., and Tian, Q. (2004). Statistical
modeling of complex backgrounds for foreground ob-
ject detection. In IEEE Trans. Image Processing, vol.
13, pp. 1459-1472.
Mood, A. M. and Graybill, F. A. (1962). Introduction to the
Theory of Statistics. Macmillan.
Parzen, E. (1962). On estimation of a probability density
function and mode. In Ann. Math. Stat., Vol. 33, pp.
1065-1076.
Rosin, P. L. (1998). Thresholding for change detection. In
Proc. of the Sixth International Conference on Com-
puter Vision (ICCV’98).
Specht, D. F. (1991). A general regression neural network.
In IEEE Trans. Neural Networks, pp. 568-576.
Stauffer, C. and Grimson, W. (2000). Learning patterns of
activity using real-time tracking. In IEEE Trans. Pat-
tern Analysis and Machine Intelligence, vol. 22, pp.
747-757.
Wirth, M., Fraschini, M., Masek, M., and Bruynooghe, M.
(2006). Performance evaluation in image processing.
In EURASIP Journal on Applied Signal Processing,
Vol. 2006, pp. 13.