Knowledge-Based Silhouette Detection

Antonio Fernández-Caballero

Escuela Politécnica Superior de Albacete, Departamento de Informática

Universidad de Castilla-La Mancha, 02071 – Albacete, Spain

Abstract. This paper presents a general-purpose neural model that tackles image understanding. The model incorporates accumulative computation, lateral interaction and a double time scale, and can be considered biologically plausible. At the global time scale t, and in the form of accumulative computation, the model uses all the mechanisms needed to detect movement from the grey level change at each pixel of the image. The detected motion is useful because part of an object's shape can be obtained from it. On a second time scale base T<<t, and by means of lateral interaction of each element with its neighbours, other parts of the moving object are also considered, even when no variation in grey level is detected on these parts. After introducing the general concepts of the model, called Lateral Interaction in Accumulative Computation, the model is applied to the problem of detecting the silhouettes of all moving elements in an indefinite sequence of images. Lastly, the model is compared with the most important current knowledge on motion analysis, showing its suitability for the best-known problems in silhouette detection.

1 Introduction

The visual system is able to quickly calculate the movement of the objects in an envi-

ronment from the light intensity variations that reach the eye. One of the solutions the

visual system has taken for the calculation of the movement is the spatial extraction of

local motion signals and their spatial combination in higher processing layers to calcu-

late more complex movement types. Current available knowledge suggests the exis-

tence of several stages in motion analysis in the visual system [1]. In the first level the

measures of local movement extract those motion components perpendicular to the

edges of the elements present in the image. The second level combines the measures

of local movement of portions of the image with the purpose of calculating a smaller

number of local estimates of translation of the object. Finally, a third level integrates

the local estimates of translation movement to calculate more complex non-local

movements (for example, global rotations).

The problem addressed is the discrimination of a set of non-rigid objects capable of holding our attention in a scene [2],[3]. These objects are detected from the motion of any of their parts. Motion detected over an indefinite sequence of images makes it possible to obtain the silhouettes of the moving elements. The method introduced is compared with the state of the art in silhouette detection.

Fernández-Caballero A. (2005).

Knowledge-Based Silhouette Detection.

In Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems, pages 114-123

DOI: 10.5220/0002570001140123

Copyright © SciTePress

2 General-Purpose LIAC Model

A generic model based on neural architecture, with recurrent and parallel computation

at each specialised layer, and sequential computation between consecutive layers, is

presented. The model is based on an accumulative computation function, followed by

a set of co-operating lateral interaction processes performed on a functional receptive

field organised as centre-periphery over linear expansions of their input spaces [4].

The double time scale is a third important ingredient. Each of these terms is introduced below through the model that incorporates it.

2.1 Recurrent Lateral Interaction Model

In lateral interaction models [4], there is a layer of modules of the same type with local connectivity, such that the response of a given element depends not only on its own inputs, but also on the inputs and outputs of the element's neighbours. From a computational point of view, lateral interaction nets partition the input space into three regions: centre (C), periphery (P) and excluded (E). Processing follows four steps: (a) processing on the central region, (b) processing on the feedback data from the periphery zone, (c) comparison of the results of these operations and generation of a local decision, and (d) distribution over the output space. There are two general expressions to formulate lateral interaction: non-recurrent interaction and recurrent interaction.

Let I(α,β) be the input signal to an element situated on co-ordinates (α,β), Φ(x,y) be the output signal of an element on position (x,y), and K(x,y;α,β) be the weight factor that translates the effect of element (α,β) into element (x,y). The output of element (x,y) for the common case of recurrent interaction is:

$$
\Phi(x,y) = I(x,y) + \iint_{R} K(x,y;\alpha,\beta)\,\Phi(\alpha,\beta)\,d\alpha\,d\beta \qquad (1)
$$
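As an illustration, equation (1) can be discretised on the pixel grid and solved by fixed-point iteration. The sketch below is not part of the original model description: it assumes a uniform weight k for the four direct neighbours, zero weight elsewhere, and a value of k small enough (4k < 1) for the iteration to converge. The function name and its parameters are hypothetical.

```python
def recurrent_lateral_interaction(inputs, k=0.1, iterations=50):
    """Iterate Phi = I + sum over neighbours of k * Phi (discretised equation (1)).

    Assumptions for illustration only: uniform weight k on the four direct
    neighbours, and 4 * k < 1 so that the fixed-point iteration converges.
    """
    rows, cols = len(inputs), len(inputs[0])
    phi = [row[:] for row in inputs]  # start from Phi = I
    for _ in range(iterations):
        new_phi = [[0.0] * cols for _ in range(rows)]
        for y in range(rows):
            for x in range(cols):
                feedback = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        feedback += k * phi[ny][nx]
                new_phi[y][x] = inputs[y][x] + feedback
        phi = new_phi
    return phi
```

With k = 0 the output equals the input; with a positive k each element's response also reflects the outputs of its neighbours, which is the recurrent character of the interaction.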

2.2 Accumulative Computation Model

Next the accumulative computation model [5] is introduced. This model basically

responds to a sequential module represented by its state value. The accumulative

computation process works on an input parameter and responds with an output called

the module’s discharge value. The state value is also called the permanence value and

is generally stored in a permanence memory. The output value of element (x,y),

Φ(x,y,t), is a function of the charge value P(x,y,t), as shown in equation (2):

$$
\Phi(x,y,t) = \begin{cases}
0, & \text{if } P(x,y,t) < \theta \\
P(x,y,t), & \text{otherwise}
\end{cases} \qquad (2)
$$

The module calculates at each time instant t the charge value P(x,y,t) as a function of the proper permanence value at t-∆t, that is to say P(x,y,t-∆t), and as a function of the input signal I(x,y,t). The permanence function is called F_p. Note that in this accumulative computation model the previous result has to fall between two established values v_dis and v_sat, corresponding to the discharged and saturated states of element (x,y) at instant t.

$$
P(x,y,t) = \begin{cases}
v_{sat}, & \text{if } F_p[\ldots] \ge v_{sat} \\
F_p[P(x,y,t-\Delta t),\, I(x,y,t)], & \text{if } v_{dis} < F_p[\ldots] < v_{sat} \\
v_{dis}, & \text{if } v_{dis} \ge F_p[\ldots]
\end{cases} \qquad (3)
$$
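Equations (2) and (3) can be sketched for a single element as follows. The paper leaves the permanence function F_p unspecified, so the charge and discharge increments used here are purely hypothetical, as are the function names and default values.

```python
def permanence_step(p_prev, signal, v_dis=0, v_sat=255, charge=64, discharge=32):
    """One accumulative computation step for a single element (equation (3)).

    The body of F_p is an assumption for illustration: add `charge` when the
    input signal is present, subtract `discharge` otherwise.
    """
    raw = p_prev + charge if signal else p_prev - discharge  # hypothetical F_p
    return max(v_dis, min(v_sat, raw))  # keep the result in [v_dis, v_sat]


def discharge_output(p, theta):
    """Output of the element (equation (2)): zero below the threshold theta."""
    return 0 if p < theta else p
```

Calling `permanence_step` repeatedly with the previous result as `p_prev` accumulates the history of the input signal, and `discharge_output` turns the stored permanence value into the module's response.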

The input and output spaces are structured as FIFO memories so that time is included as a calculus variable. The output space of a layer holds a stack of the responses of the units of that layer. It thus works as a local and transitory memory that saves the outputs of all the neurones of the layer during some time interval. This FIFO output memory specifies the co-operating capacity of the net (its local connectivity), since the units in recurrent neurone layers look at their neighbours' responses at the current moment as well as at some previous moments.

2.3 Double Time Scale Model

The model also incorporates the notion of a double time scale at the accumulative computation level, as found in sub-cellular micro-computation [5]. The following properties are applicable to the model: (a) a local convergent process around each element, (b) semiautonomous functioning, with each element capable of spatio-temporal accumulation of local inputs in time scale T, and conditional discharge, and (c) an attenuated transmission of these accumulations of persistent coincidences towards the periphery, which integrates at the global time scale t. Therefore there are two different time scales: (a) the local time T = n ∆T, and (b) the global time t = k ∆t (T<<t).

2.4 Lateral Interaction in Accumulative Computation Model

Lastly, the proposed model incorporates all the general notions seen so far, grouping terms common in biology such as accumulative computation, lateral interaction and double time scale. All of them have been extensively studied over time. The contribution consists in restricting the model to the following characteristics:

1. Application of accumulative computation to a single central element starting from the state value of the element itself and the input coming from the previous layer.

2. Application of lateral interaction mechanisms (lateral inhibition) from close periphery formed by four elements with co-ordinates (x-1,y), (x+1,y), (x,y-1) and (x,y+1) on the centre (x,y). (a) The interaction is of recurrent type. (b) All neighbours have the same weight in lateral interaction. (c) The total effect on an element subject to lateral interaction is linear, that is to say the different particular effects on an element are summed up. (d) Concerning the dimensionality of the model, it may be stated that lateral interaction takes place in spatio-temporal co-ordinates.

3. Global time scale t is used for (a) reading the input from the previous layer, (b) data processing by accumulative computation, and (c) writing the output to the following layer.

4. Local time scale T<<t is used in all lateral co-operative interaction mechanisms.

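This set of specific characteristics allows equations (1) and (3) to be rewritten for the particular model as follows:

1. At global time scale t:

$$
C(x,y,t) = \begin{cases}
v_{sat}, & \text{if } F_g[\ldots] \ge v_{sat} \\
F_g[C(x,y,t-\Delta T),\, I(x,y,t)], & \text{if } v_{dis} < F_g[\ldots] < v_{sat} \\
v_{dis}, & \text{if } v_{dis} \ge F_g[\ldots]
\end{cases}
$$

2. At local time scale T:

$$
C(x,y,T) = \begin{cases}
v_{sat}, & \text{if } F_l[\ldots] \ge v_{sat} \\
F_l\left[C(x,y,T-\Delta T),\, \sum_{\alpha=x-1}^{x+1} \sum_{\beta=y-1}^{y+1} C(\alpha,\beta,T)\right], & \text{if } v_{dis} < F_l[\ldots] < v_{sat} \\
v_{dis}, & \text{if } v_{dis} \ge F_l[\ldots]
\end{cases}
\qquad \forall (\alpha,\beta) \ne (x,y),\ (x,y) \in R,\ (\alpha,\beta) \in R
$$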

Now, a description of the model structure as well as the generic tasks applicable to

the model may be provided. Each module’s charge value C(x,y,t) is a function of the

history of the input signals coming from at most two preceding levels and the charge

values of that particular module’s four direct neighbours.
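The interplay of the two time scales can be sketched as one global step per input frame followed by several local steps. The functions F_g and F_l are left open in the text, so both bodies below are hypothetical: the assumed F_g saturates a pixel where motion is detected and lets it decay otherwise, and the assumed F_l averages a charged pixel with its charged direct neighbours. All names, the decay value and the number of local steps are assumptions.

```python
V_DIS, V_SAT = 0, 255
# Offsets (dy, dx) of the four direct neighbours of a pixel.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def clamp(value):
    """Keep a charge between the discharged and saturated values."""
    return max(V_DIS, min(V_SAT, value))


def global_step(charge, motion, decay=32):
    """Hypothetical F_g: saturate where motion is detected, decay elsewhere."""
    return [[clamp(V_SAT if m else c - decay) for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(charge, motion)]


def local_step(charge):
    """Hypothetical F_l: average each charged pixel with its charged neighbours."""
    rows, cols = len(charge), len(charge[0])
    result = [row[:] for row in charge]
    for y in range(rows):
        for x in range(cols):
            if charge[y][x] <= V_DIS:
                continue  # discharged elements do not take part
            values = [charge[y][x]]
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and charge[ny][nx] > V_DIS:
                    values.append(charge[ny][nx])
            result[y][x] = clamp(sum(values) / len(values))
    return result


def liac_frame(charge, motion, n_local=8):
    """One global time step (scale t) followed by n_local local steps (scale T)."""
    charge = global_step(charge, motion)
    for _ in range(n_local):
        charge = local_step(charge)
    return charge
```

Under these assumptions, neighbouring charged pixels converge towards a common charge value during the local steps, while discharged pixels stay out of the exchange.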

3 Knowledge-Based Silhouette Detection

This section explains in detail how the lateral interaction in accumulative computation model approaches the motion knowledge domain, together with the reasons behind each decision. To a greater or lesser extent, all models related to motion analysis encounter a significant number of problems when looking for the silhouettes of the objects present in a scene. This is due to the very nature of the projection of motion. The most relevant drawbacks in this context are: (1) the aperture problem, (2) the limits of movement, (3) the occlusions, (4) the false movement due to image background, (5) the motion speed, (6) the colour in motion, and (7) the discrimination of more than one moving object.

3.1 The Aperture Problem

A consequence of the non-equivalence between the projected movement and the optic flow is that motion information is intrinsic to the image structure. In other words, it depends on the variation of grey levels in the image. The aperture problem appears when there is not enough variation in grey level in a studied region of the image to resolve the motion uniquely: more than one motion candidate is equally consistent with the observed image data. Specifically, this means that the velocity component can only be estimated with some degree of certainty along the direction of a significant image gradient, that is, perpendicular to the local edge.

The aperture problem is dealt with by assigning the true velocity of the whole model to those moving image elements with possible ambiguities. It can thus be solved if at least two local component measurements exist from which to estimate the velocity of a pattern at one point. For a simple translation movement in a plane, the problem can be resolved. This is not the case, however, for general 3D or rotational movement, where the real 2D velocity varies from point to point. The problem is aggravated further when dealing with non-rigid objects.

The application of lateral interaction in accumulative computation is able to confront the aperture problem by smoothing the true 2D velocity at each point of the silhouette in order to obtain an object. This occurs through the lateral interaction exchange mechanisms among neighbouring points in the image. Lateral interaction does not try to find the speed of each image pixel; rather, it obtains a unique velocity for each moving object present in the scene. The aperture problem is solved using an adequate charge calculus function at the local time scale T.

3.2 The Limits of Movement

The inherent ambiguity of the 3D-to-2D projection causes an additional complication.

Where a discontinuity exists in the depth of a scene, for example, in case the objects in

movement are superimposed, some points of the three-dimensional space with differ-

ent movements may be projected on the same 2D point of the image plane. This

causes a discontinuity in the spatial motion field. Since the region really contains a

series of different movements, a model that assumes the very common supposition that

every region bears one single movement will fail when modelling these inner regional

discontinuities and will estimate a unique movement for this region. This unique

movement will correspond to the dominant or the mean movement. Moreover, if the discontinuities in the movement were solely temporary, this would greatly affect the algorithms for obtaining the silhouettes and their subsequent tracking.

A very relevant problem therefore appears when trying to segment an image into all its moving objects. It is imperative to consider the reach of the lateral interaction mechanisms. In other words, it is fundamental to delimit precisely the receptive field R in the formula of lateral interaction at the local time scale T. Notice that the demarcation of the receptive field is a fundamental learning task of the lateral interaction in accumulative computation model. Lastly, let us emphasise that the receptive field denotes the reach of lateral interaction and, therefore, the movement limits.

3.3 The Occlusions

An additional consequence of the projection ambiguity is that an object in movement

will expose a previously covered area and vice versa. This effect is known as occlusion.

In these regions no correspondence exists between frames and the motion field is not defined. Similar situations occur when there is a change in a given scene or when new objects enter or leave the scene. Motion analysis thus becomes more complex, since motion has to be obtained by other means in these areas.

It is necessary to emphasise that the occlusion problem is not resolved by the lateral

interaction in accumulative computation model. The model is pixel oriented (point,

element) and not region oriented (area, object); hence, it inherits all problems related

to occlusions in pixel-oriented segmentation.

3.4 The False Movement Due to the Image Background

It is important to pay attention to the problem of false movement due to the image background. Indeed, whenever motion is detected due to a moving object, motion is also detected due to the image background. The movement of an object constitutes an invasion of an area previously belonging to the image background, while the area that the object has just left gives rise to false background motion. Fig. 1 shows an example of this false motion detection.

This example shows the motion of a bus through two images of a sequence, as well as the difference between the two images. The region labelled (A) is the movement detected due to the object, while the region labelled (B) is the false movement due to the image background.

Fig. 1. Object and background motion detection

When facing background motion, the lateral interaction mechanisms have to consider the influence of the surroundings of the motion detection areas, so that background motion may be diluted. A simple way to approach this task is to adjust the threshold values so as to prevent accumulative computation or lateral interaction. In this way the charge at each pixel can be separated into valid and non-valid silhouette information.

3.5 The Motion Speed

When an object moves across an image, there are two possible cases (see Fig. 2). (1) The element moves fast enough, or the sampling rate is low enough, that no pixel of the object at the current instant coincides with any pixel of the object at the previous instant. (2) This intersection is not empty.

Fig. 2. Image difference (a) without and (b) with intersection. Each panel shows, from left to right, the image at t-∆t, the image at t, and the image difference.

The case where the intersection is empty does not represent any problem, since the entire silhouette of the element is displayed (both in the previous instant t-∆t and in the current instant t). This is not the case when the intersection is not empty.

To address this problem, the possibility of working with intermediate charge values (v_dis < C(x,y,t) < v_sat) has been incorporated. These intermediate charge values belong to those parts of moving objects whose grey levels have not varied, but which are known to be part of the object.

By using an appropriate global computing function F_g, the pattern (or part of the pattern) can be forced to start taking shape while the moving element changes its position. Therefore, the solution to the problem of motion speed lies in incorporating an activation/deactivation mechanism that centres the attention on where the element is moving to and ignores where it comes from. This is carried out by choosing a convenient global computing function F_g.

A mechanism of lateral co-operation is implemented that satisfies the following premises: (a) Maximally charged pixels (C(x,y,t) = v_sat) help to increase the charge of pixels with intermediate charge (v_dis < C(x,y,t) < v_sat) that are directly or indirectly connected to them. (b) Pixels charged to intermediate values (v_dis < C(x,y,t) < v_sat) with no connection to any maximally charged pixel tend to discharge, and therefore to total deactivation.
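Premises (a) and (b) can be sketched as a breadth-first search that starts at the saturated pixels and spreads through charged neighbours. This is only one possible reading of "directly or indirectly connected": the four-neighbour connectivity, the `gain` and `loss` increments and the function name are assumptions, not values given in the paper.

```python
from collections import deque


def lateral_cooperation(charge, v_dis=0, v_sat=255, gain=32, loss=32):
    """Charge intermediate pixels linked to a saturated pixel, discharge the rest."""
    rows, cols = len(charge), len(charge[0])
    seeds = [(y, x) for y in range(rows) for x in range(cols)
             if charge[y][x] >= v_sat]
    connected = [[False] * cols for _ in range(rows)]
    for y, x in seeds:
        connected[y][x] = True
    queue = deque(seeds)
    while queue:  # spread connectivity through charged neighbours
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < rows and 0 <= nx < cols
                    and not connected[ny][nx] and charge[ny][nx] > v_dis):
                connected[ny][nx] = True
                queue.append((ny, nx))
    result = [row[:] for row in charge]
    for y in range(rows):
        for x in range(cols):
            c = charge[y][x]
            if v_dis < c < v_sat:  # only intermediate charges are modified
                if connected[y][x]:
                    result[y][x] = min(v_sat, c + gain)  # premise (a)
                else:
                    result[y][x] = max(v_dis, c - loss)  # premise (b)
    return result
```

For example, in the row `[255, 100, 100, 0, 20]` the two pixels charged to 100 are linked to the saturated pixel and gain charge, whereas the isolated pixel charged to 20 is discharged.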

3.6 The Colour in Motion

This section presents a simple and classic method of motion detection that identifies which image pixels of a sequence have experienced a variation in colour or grey level (the latter being the most usual). The algorithm used in numerous cases is:

$$
MOV(x,y,t) = \begin{cases}
1, & \text{if } GL(x,y,t) \ne GL(x,y,t-\Delta t) \\
0, & \text{otherwise}
\end{cases}
$$

Motion MOV(x,y,t) is detected at image pixel (x,y) at time instant t if the grey level GL has varied between instants t-∆t and t. The method described is robust when the elements that move in the image sequence are composed of one single grey level. Binarisation then provides a trivial way to differentiate between points that have and have not changed their grey level. In practice, however, the changes in one grey level interfere with the changes in another one when the element silhouettes are calculated.
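The grey level change test translates directly into a per-pixel comparison of two consecutive frames. The sketch below assumes frames stored as nested lists of grey levels; the function name is hypothetical.

```python
def motion_mask(prev_frame, curr_frame):
    """MOV(x,y,t): 1 where the grey level differs between t-dt and t, else 0."""
    return [[1 if curr != prev else 0
             for prev, curr in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]
```

Any change of grey level, however small, marks the pixel as moving, which is why the method behaves best on objects made of a single grey level.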

It is easy to foresee that motion detection based on grey level change is valid when the object has a single grey level. Each image is therefore segmented into a number n (typically 8 or 16) of grey level bands, and the previous algorithms are applied to each band, as if there were n images formed by moving objects of one single grey level. Notice that using all possible grey levels is equivalent to using 256 grey level bands, so the argument loses no generality. Thus, there is a charge value at each grey level band for each image pixel. In the same way, a part of the silhouette exists at each grey level band and pixel. At the end, the partial silhouettes obtained at each grey level band are gathered to obtain the complete silhouette of the moving object.
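The band segmentation can be sketched as follows. Two points are assumptions made for illustration: the bands are taken to be of equal width over 256 grey levels, and motion in band k is taken to mean that a pixel belongs to band k at t but did not at t-∆t. The function names are hypothetical.

```python
def band_index(grey_level, n_bands=8, levels=256):
    """Map a grey level in [0, levels) to one of n_bands equal-width bands."""
    return grey_level * n_bands // levels


def band_motion(prev_frame, curr_frame, n_bands=8):
    """One binary motion map per band: 1 where a pixel has just entered the band."""
    maps = []
    for k in range(n_bands):
        maps.append([[1 if band_index(curr, n_bands) == k
                      and band_index(prev, n_bands) != k else 0
                      for prev, curr in zip(prev_row, curr_row)]
                     for prev_row, curr_row in zip(prev_frame, curr_frame)])
    return maps


def gather(maps):
    """Combine the per-band maps into a single map (logical OR over the bands)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[1 if any(m[y][x] for m in maps) else 0 for x in range(cols)]
            for y in range(rows)]
```

Each per-band map can then be processed independently, and `gather` corresponds to the final step in which the per-band results are joined into one silhouette.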

3.7 The Discrimination of More than One Moving Object

Another important problem related to motion in real image sequences is the high complexity of calculating all moving elements (objects) in the scene. Added to this already serious problem is the discrimination or classification of the objects obtained. This is another non-trivial problem that is currently addressed by silhouette detection methods.

It is also proposed to use another lateral interaction mechanism to clearly differentiate among the silhouettes of all the moving objects in the scene. An object may be taken to be a closed group of pixels with a charge value above a required minimum. Calculating the mean value of the charge of all the pixels that form the silhouette gives a value that represents the object's charge at each instant t. This common charge value is the identity sign of the moving object. Obtaining this value is the task of the spatio-temporal accumulative function F_l at local time scale T.
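The idea of a common charge value per object can be illustrated with a direct computation: group the charged pixels into connected regions and take the mean charge of each region. The paper obtains this value through lateral interaction at the local time scale, so the flood fill below is a swapped-in technique used only to show the intended result; the function name, the four-neighbour connectivity and the `min_charge` default are assumptions.

```python
def label_objects(charge, min_charge=1):
    """Return a list of (pixels, mean_charge), one entry per connected region."""
    rows, cols = len(charge), len(charge[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for y in range(rows):
        for x in range(cols):
            if seen[y][x] or charge[y][x] < min_charge:
                continue
            seen[y][x] = True
            stack, pixels = [(y, x)], []
            while stack:  # depth-first flood fill of one region
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and charge[ny][nx] >= min_charge):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            mean = sum(charge[py][px] for py, px in pixels) / len(pixels)
            objects.append((sorted(pixels), mean))
    return objects
```

Two regions that do not touch receive separate entries, and their mean charges can serve as the identity values that distinguish one moving object from another.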

4 Data and Results

In order to show the functionality of the knowledge-based silhouette detection model described, a significant example is offered next. The image sequence TwoWalkNew, downloaded from the University of Maryland Institute for Advanced Computer Studies (copyright © 1998 University of Maryland, College Park), is used. This sequence was originally created to test the real-time visual surveillance system W4 [6]. It shows two people (a man and a woman) walking through a scene. Fig. 3 shows the result of applying the proposed knowledge-based model to some images of the sequence. The parameter values for this experiment were ∆t=0.35 seconds, ∆T=0.005 seconds, n=8, v_dis=0, v_sat=255. Note that the model detects the silhouettes of the non-rigid objects in a quite simple manner.

Fig. 3 (a) offers the two silhouettes at a time instant close to the beginning of the sequence. The young man is walking from left to right, while the young woman is walking from left to right. Note that the man's silhouette is easier to appreciate. This is due to the greater contrast of the man's clothing with the background.

Fig. 3. Some result images, panels (a) to (e)

There is also some noise present in the resulting images, especially in the upper part of the images. This is naturally due to the motion of the trees observed from the last two computed images.

Fig. 3 (b) presents the result before both people interact in the 2D projection of their 3D shapes. Fig. 3 (c) shows what happens during the occlusion phase: both people appear as a single silhouette. Remember, once again, that this is due to lateral interaction.

The situation in Fig. 3 (d) has changed. The two people are no longer on the same 2D projection, so there are two silhouettes again. This remains the situation through the rest of the sequence.

5 Conclusions

In this paper a model for lateral interaction in accumulative computation, and its application to motion detection after introducing knowledge on motion detection, has been proposed. The model incorporates accumulative computation, lateral interaction and a double time scale, and can be considered biologically plausible.

The model may be compared to background subtraction or frame difference algorithms in the way motion is detected. A region growing technique is then performed to define moving objects. In contrast to similar approaches, no complex image pre-processing must be performed and no reference image must be supplied to the model.

A background subtraction technique [7] could have been chosen as the general outline, but this technique is not useful in real outdoor scenes. Methods based on the correlation of characteristics [8] have not been taken into account because they are excessively dependent on the type of application faced. Gradient-based methods [9], in turn, suffer from too many problems to meet the needs of this work. The biggest drawback of these methods is that they are excessively directed towards obtaining one single pattern in the scene, and that they offer very poor results in the presence of diverse objects. Their computational cost is also enormous. Nor do the methods based on regions [10] fulfil the general requirements of this work; they are excessively oriented to searching in very specific regions of the images. It may also be highlighted that the proposed model has no limitation on the number of non-rigid object silhouettes to differentiate. This knowledge-based model facilitates object classification by taking advantage of the object charge value, common to all pixels of the same moving element. Thanks to this fact, any higher-level operation becomes less difficult. The model seems promising for many different applications related to image processing, and is currently being tested in very different real-world applications.

Acknowledgements

This work is supported in part by the Spanish CICYT TIN2004-07661-C02-02 grant.

References

1. E. Trucco, A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, 1998.

2. A. Fernández-Caballero, Jose Mira, Ana E. Delgado, M.A. Fernández, “Lateral interaction in accumulative computation: A model for motion detection”, Neurocomputing, 50C, 341-364, 2003.

3. A. Fernández-Caballero, J. Mira, M.A. Fernández, A.E. Delgado, “On motion detection through a multi-layer neural network architecture”, Neural Networks, 16(2), 205-222, 2003.

4. J. Mira, A.E. Delgado, A. Manjarrés, S. Ros, J.R. Alvarez, “Cooperative processes at the symbolic level in cerebral dynamics: Reliability and fault tolerance”, in R. Moreno-Diaz and J. Mira, eds., Brain Processes, Theories and Models, The MIT Press, 244-255, 1996.

5. M.A. Fernandez, J. Mira, M.T. Lopez, J.R. Alvarez, A. Manjarrés, S. Barro, “Local accumulation of persistent activity at synaptic level: Application to motion analysis”, Springer-Verlag, LNCS, 930, 137-143, 1995.

6. I. Haritaoglu, D. Harwood, L.S. Davis, “W4: Who? When? Where? What? A real time system for detecting and tracking people”, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 222-227, 1998.

7. S. Niyogi, E. Adelson, “Analyzing gait with spatiotemporal surfaces”, IEEE Workshop on Motion of Non-rigid and Articulated Objects, 64-69, 1994.

8. F.G. Meyer, P. Bouthemy, “Region-based tracking using affine motion models in long image sequences”, Computer Vision, Graphics and Image Processing: Image Understanding, 60(2), 119-140, 1994.

9. J.L. Barron, D.J. Fleet, S.S. Beauchemin, “Performance of optical flow techniques”, International Journal of Computer Vision, 12(1), 43-77, 1994.

10. M.J. Black, P. Anandan, “The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields”, Computer Vision and Image Understanding, 63(1), 75-104, 1996.