ROBUST ESTIMATION OF THE PAN-ZOOM PARAMETERS FROM A BACKGROUND AREA IN CASE OF A CRISS-CROSSING FOREGROUND OBJECT

J. Bruijns
Philips Research, High Tech Campus 36, 5656 AE, Eindhoven, The Netherlands

Keywords: 3DTV, 3D display, 3D processing, 2D-to-3D conversion, background motion estimation, foreground-background segmentation.

Abstract: In the field of video processing, a model of the background motion has application in deriving depth from motion. The pan-zoom parameters of our background model are estimated from the motion vectors of parts which are a priori likely to belong to the background, such as the top and side borders ("the background area"). This fails when a foreground object obscures the greater part of this background area. We have developed a method to extract a set of pan-zoom parameters for each different part of the background area. Using the pan-zoom parameters of the previous frame, we compute from these sets the pan-zoom parameters most likely to correspond to the proper background parts. For shots in which the greater part of the background area is obscured by one or more foreground objects, this background area partition method gives more accurate pan parameters than the method that uses the entire background area.

1 INTRODUCTION

For the introduction of 3DTV on the market (Fehn et al., 2002; de Beeck et al., 2002), availability of 2D-to-3D conversion is an important ingredient. As only a very limited amount of 3D recorded content (stereoscopic or other) is available, 3DTV is only attractive for a wide audience if existing material can be shown in 3D as well. Within Philips Research, the technology for fully automatic, real-time 2D-to-3D conversion at the consumer side has been developed over the past years (Barenbrug, 2006; Redert et al., 2007).

The 3D format used consists of the original 2D video, augmented with a depth channel (the term depth as used in this paper is strictly speaking a reciprocal depth or disparity). This depth channel allows views to be rendered from positions slightly displaced from the original viewpoint. These additional views can then be interleaved and sent to a multi-view screen such as the Philips 3DLCD (Fehn, 2004; Berretty et al., 2006).

The depth maps are generated using several depth cues. One of the depth cues used is "depth from motion" (Ernst et al., 2002). For a static scene, the depth-from-motion method is equivalent to the structure-from-motion (SFM) methods. The camera calibration part of the SFM methods is replaced by estimation of a background motion: a motion model for objects at a large distance from the camera. All objects with (possibly independent) motions that do not conform to this background model are supposed to be in front of the background.

The background model is a pan-zoom model (de Haan and Biezen, 1998):

$$m_x = p_x + s_x\,\hat{x}$$
$$m_y = p_y + s_y\,\hat{y} \tag{1}$$

with m = (m_x, m_y)^T the background motion, {x̂, ŷ} the pixel coordinates with regard to the optical center, {p_x, p_y} the pan parameters and {s_x, s_y} the zoom parameters. Note that although theoretically s_x ≡ s_y, we allow two different values for the moment. This has the big advantage that the expressions for the x and y motions decouple: there is no interaction between them. We come back to this issue at the end of Section 3.
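As an illustration, here is a minimal sketch of Equation 1 and of the foreground test it implies. The function names and the 0.5-pixel tolerance are our assumptions for illustration, not values from the paper:

```python
import numpy as np

def background_motion(x_hat, y_hat, p_x, p_y, s_x, s_y):
    """Predicted background motion of Equation 1 at pixel coordinates
    (x_hat, y_hat) relative to the optical center."""
    return p_x + s_x * x_hat, p_y + s_y * y_hat

def is_foreground(m_x, m_y, x_hat, y_hat, params, tol=0.5):
    """Blocks whose measured motion (m_x, m_y) deviates from the model
    prediction by more than tol pixels are supposed to be in front of
    the background. The tolerance is a hypothetical choice."""
    b_x, b_y = background_motion(x_hat, y_hat, *params)
    return np.hypot(m_x - b_x, m_y - b_y) > tol
```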

The pan-zoom parameters should be estimated from the motion vectors of the background blocks. The motion vectors (see Figures 1 and 2; the blue crosses are explained later on) are estimated using 8x8 pixel blocks (de Haan and Biezen, 1994).


Figure 1: ‖Motion‖ field of video shot OFFICE for a proper background area.

Figure 2: ‖Motion‖ field of video shot OFFICE for an obscured background area.

But because it is not known which blocks belong to the background, blocks which are a priori likely to belong to the background are selected for estimation of the pan-zoom parameters. The selected background blocks ("the background area") are blocks close to the top of the image and blocks close to the left and right image borders. Examples of background areas are the sets of blue crosses in Figures 3 and 4.

However, if a relatively large foreground object is criss-crossing, the greater part of the background area may be obscured (see Figures 4 and 2), resulting in erroneous pan-zoom parameters. We have developed a method to extract a set of pan-zoom parameters for each different part of the background area (i.e. the proper background part and the obscured background parts). We use the pan-zoom parameters of the previous frame to compute from these sets the pan-zoom parameters most likely to correspond to the proper background part.

Figure 3: Frame of video shot OFFICE with a proper background area.

Figure 4: Frame of video shot OFFICE with an obscured background area.

Our method for robust estimation of the pan-zoom parameters in case of a proper background area (i.e. the majority of the blocks of the background area are proper background blocks), which uses the entire background area (the "EBA method"), is described in Section 3. Our background area partition method (the "BAP method") for computation of the pan-zoom parameters in case of an obscured background area (i.e. the majority of the blocks of the background area are obscured by one or more foreground objects) is described in Section 4. In Section 5 we present our results and in Section 6 we present our conclusions.


2 RELATED WORK

The structure-from-motion (SFM) methods are the only commonly known and successful methods for 2D-to-3D conversion. When a camera moves around a static scene, SFM can provide very good conversion results (Huang and Netravali, 1994; Azarbayejani and Pentland, 1995; Zhang et al., 1995; Armstrong et al., 1996; Beardsley et al., 1996; Xu and Zhang, 1996; Falkenhagen, 1997; Pollefeys et al., 2000; Hartley and Zisserman, 2001; Ernst et al., 2002). However, several issues hamper wide use of SFM. First, if camera parameters such as pixel aspect ratio and zoom are unknown, they have to be estimated along with the conversion. In this case many successive frames with different viewpoints are required to define a single solution (Armstrong, 1996; Pollefeys et al., 1998; Pollefeys, 1999). Secondly, if camera movements are very small or even absent, the mathematics become singular and no solution can be obtained. Finally, most scenes are not static, but contain independently moving objects. Several methods exist to handle this, e.g. (Xu and Zhang, 1996), but these methods again require that camera or object motion is considerable and that each individual object is rigid. Among the most important video objects are humans, which are far from rigid. So, in most cases accurate SFM is not possible.

3 THE EBA METHOD

If only a very small portion of the blocks of the background area is obscured by foreground objects (see Figures 3 and 1), the pan-zoom parameters can be estimated accurately, provided that the outliers are removed. Using Equation 1, local pan-zoom parameters can be extracted from two or more blocks of the background area. A set of local pan-zoom parameters can be collected by selecting different subsets of the background area. After the outliers are removed, the global pan-zoom parameters can be computed by taking e.g. the average or the median of the remaining set of local pan-zoom parameters.

To avoid the time-consuming computation over all possible subsets of the background area (Ω(n²)), we have implemented a two-stage method using single blocks (O(n)). First, the zoom parameter is computed. Secondly, using this zoom parameter, the pan parameters are computed. This method is based on the relation between the zoom parameter and the partial derivatives of the motion field:

$$\frac{\partial m_x}{\partial x} = \frac{\partial (p_x + s_x\,\hat{x})}{\partial x} = s_x$$
$$\frac{\partial m_y}{\partial y} = \frac{\partial (p_y + s_y\,\hat{y})}{\partial y} = s_y \tag{2}$$

Because there is only one zoom parameter (i.e. s_x ≡ s_y), we select only those blocks of the background area for which the two local zoom parameters s_{l,x} and s_{l,y} (i.e. the two local partial derivatives obtained from smoothed differences) are almost equal:

$$|s_{l,x} - s_{l,y}| \le t \tag{3}$$

with the threshold t equal to the 25% percentile of the absolute differences {|s_{l,x} − s_{l,y}|} of the local zoom parameters of the background area.
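A minimal sketch of this selection step, assuming NumPy; the array names are ours, and how the smoothed local derivatives are obtained is left to the motion estimator:

```python
import numpy as np

def select_consistent_blocks(s_lx, s_ly):
    """Keep blocks whose two local zoom estimates almost agree (Equation 3).

    s_lx, s_ly: local zoom estimates per background block, obtained as
    smoothed partial derivatives of the motion field (Equation 2).
    The threshold t is the 25th percentile of |s_lx - s_ly|."""
    diff = np.abs(s_lx - s_ly)
    t = np.percentile(diff, 25)
    return diff <= t  # boolean mask over the background-area blocks
```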

The precise value of the threshold t is not so important as long as the blocks with a relatively large difference between the two local zoom parameters (i.e. the blocks that probably contain outliers) are removed. After that we compute the global zoom parameter s_{g,x} (i.e. the s_x of Equation 1) from the local zoom parameters {s_{l,x}} of the selected blocks using robust statistical procedures (Marazzi, 1987), as follows (a sketch of the three steps is given after this list):

1. Compute the robust standard deviation s_1:

$$m_0 = \mathrm{median}(\{s_{l,x}\})$$
$$m_1 = \mathrm{median}(|\{s_{l,x}\} - m_0|)$$
$$s_1 = m_1 / \Phi^{-1}(0.75) \tag{4}$$

where Φ⁻¹(0.75) is the value of the inverse standard Normal distribution at the point 0.75. As Marazzi explains, the factor Φ⁻¹(0.75) transforms the absolute deviation m_1 to the standard deviation s_1.

2. Remove the outliers from {s_{l,x}}. We classify a local zoom parameter s_{l,x} as an outlier if

$$|s_{l,x} - m_0| > s_1\, \Phi^{-1}(0.99) \tag{5}$$

Assuming a Gaussian distribution, this means that the probability that an inlier is classified as an outlier is less than 1.0%.

3. Compute the global zoom parameter s_{g,x} from the remaining local zoom parameters {s_{l,x}}:

$$s_{g,x} = \mathrm{median}(\{s_{l,x}\}) \tag{6}$$
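A minimal sketch of this three-step procedure, assuming NumPy and SciPy are available (the function name robust_location is ours):

```python
import numpy as np
from scipy.stats import norm

def robust_location(values):
    """Robust location estimate of Equations 4-6: MAD-based outlier
    rejection followed by a median."""
    values = np.asarray(values, dtype=float)
    m0 = np.median(values)                                  # Equation 4
    m1 = np.median(np.abs(values - m0))
    s1 = m1 / norm.ppf(0.75)                                # MAD -> std. dev.
    inliers = np.abs(values - m0) <= s1 * norm.ppf(0.99)    # Equation 5
    return float(np.median(values[inliers]))                # Equation 6
```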

After that, the global zoom parameter s_{g,y} (i.e. the s_y of Equation 1) is computed in the same way from the selected local zoom parameters {s_{l,y}}. Then we compute the global zoom parameter s_g from the two global zoom parameters s_{g,x} and s_{g,y}:

$$s_g = \mathrm{median}(0, s_{g,x}, s_{g,y}) \tag{7}$$

Application of this formula for the global zoom parameter s_g is based on the following reasoning:

1. In case s_{g,x} = s_{g,y}, the global zoom parameter s_g should be equal to this value (i.e. s_g = s_{g,x} = s_{g,y}).

2. Noise in the zoom estimate should be diminished. In case both zoom estimates have the same sign, the one closest to zero is taken; in case the zoom estimates have a different sign, the global zoom parameter is zero, as the sketch after this list illustrates.
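As a one-line illustration (the function name is ours), the median with 0 implements exactly this sign logic:

```python
import numpy as np

def combine_zoom(s_gx, s_gy):
    """Equation 7: the median of {0, s_gx, s_gy} equals the estimate closer
    to zero when the signs agree, and zero when they disagree."""
    return float(np.median([0.0, s_gx, s_gy]))
```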

After the global zoom parameter s_g is estimated, the pan parameters p_x and p_y can be computed as follows:

1. Compute the local pan parameters {p_{l,x}} and {p_{l,y}} for the selected blocks using Equation 1:

$$p_{l,x} = m_x - s_g\,\hat{x}$$
$$p_{l,y} = m_y - s_g\,\hat{y} \tag{8}$$

2. Compute the global pan parameters p_{g,x} and p_{g,y} from the local pan parameters {p_{l,x}} respectively {p_{l,y}}, using the same robust statistical procedure as used for the computation of the global zoom parameter s_{g,x} (see the sketch after this list).
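A short sketch composing Equation 8 with the robust procedure, reusing robust_location() from the earlier sketch; the array names and the function eba_pan are ours:

```python
import numpy as np

def eba_pan(m_x, m_y, x_hat, y_hat, s_g):
    """Global pan parameters from the selected blocks: Equation 8 followed
    by the robust procedure of Equations 4-6 (via robust_location)."""
    p_lx = np.asarray(m_x) - s_g * np.asarray(x_hat)   # local pan, Equation 8
    p_ly = np.asarray(m_y) - s_g * np.asarray(y_hat)
    return robust_location(p_lx), robust_location(p_ly)
```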

4 THE BAP METHOD

If the majority of the blocks of the background area are obscured by foreground objects (see Figures 4 and 2), the EBA method (see Section 3) gives wrong results for the pan parameters p_x and p_y (the possible effect on, and a possibly better procedure for, the zoom parameters will be discussed in Sections 5 and 6). The large peaks of the red curves in Figures 8 and 9 are examples of distorted pan parameters. The condition that the local zoom parameters s_{l,x} and s_{l,y} are almost equal (see Equation 3) appears to hold also for many obscured background (i.e. foreground) blocks.

The histograms of the local pan parameters exhibit several clusters in case of an obscured background area (see Figure 5 for an example). One or more (in case of a histogram with many small bins) of these clusters represent the local pan parameters of the proper background blocks. The other clusters represent the local pan parameters of the obscured background blocks.

Figure 5: Histogram of {p_{l,x}} of an obscured background area.

We use the previous global pan parameter p_g(t−1) (the indices x and y are left out from now on to indicate either case) to select the clusters corresponding to the proper background blocks, and to compute the new global pan parameter p_g(t) with the following procedure (a combined sketch of steps 1 to 5 is given after the procedure):

1. Estimate the probability density function {pdf(p_k)} and the cumulative density function {cdf(p_k)} for and from the sorted local pan parameters {p_l(t)} using Gaussian kernels (Silverman, 1986):

$$\mathrm{pdf}(p_k) = \frac{1}{N} \sum_{i=1}^{N} G(p_k;\, p_i, s)$$
$$\mathrm{cdf}(p_k) = \frac{1}{N} \sum_{i=1}^{N} \int_{-\infty}^{p_k} G(p;\, p_i, s)\, dp \tag{9}$$

with p_k ∈ {p_l(t)}, p_i ≤ p_{i+1}, N the number of local pan parameters and G(p; µ, σ) the Gaussian probability density function. We use the resolution of the motion vectors (1/8 pixel for our cases) for the standard deviation s.

2. To eliminate noise valleys, local outliers in {pdf(p_k)} are replaced by the average of their two neighbors (a kind of median filter). pdf(i) (short for pdf(p_i)) is an outlier on a rising flank,

$$\mathrm{pdf}(i-2) \le \mathrm{pdf}(i-1)$$
$$\mathrm{pdf}(i-1) \le \mathrm{pdf}(i+1)$$
$$\mathrm{pdf}(i+1) \le \mathrm{pdf}(i+2)$$
$$\mathrm{pdf}(i-2) < \mathrm{pdf}(i+2) \tag{10}$$

if the following conditions are fulfilled:

$$\mathrm{pdf}(i-1) > \mathrm{pdf}(i) \ \vee\ \mathrm{pdf}(i) > \mathrm{pdf}(i+1)$$
$$|\mathrm{pdf}(i) - \mathrm{pdf}(i-2)| \le t_{\mathrm{outlier}}$$
$$|\mathrm{pdf}(i) - \mathrm{pdf}(i+2)| \le t_{\mathrm{outlier}} \tag{11}$$


with the outlier threshold t_outlier derived from the absolute differences of the successive pdf values {|pdf(i) − pdf(i−1)|} as follows:

$$m = \mathrm{mean}(\{|\mathrm{pdf}(i) - \mathrm{pdf}(i-1)|\})$$
$$s = \mathrm{std}(\{|\mathrm{pdf}(i) - \mathrm{pdf}(i-1)|\})$$
$$u = \max(\{|\mathrm{pdf}(i) - \mathrm{pdf}(i-1)|\})$$
$$t_{m,s} = m + s\, \Phi^{-1}(0.99)$$
$$t_{m,u} = (m + u)/2$$
$$t_{\mathrm{outlier}} = \min(t_{m,s}, t_{m,u}) \tag{12}$$

Local outliers on a descending flank are detected and replaced in a similar way.

3. Subdivide the refined {pdf(p_k)} into clusters as follows:

(a) Find the borders {b_l, l ∈ [2, M−1]} between two clusters. The borders are placed at the valleys. A local pan parameter p_i is a border if the following conditions are fulfilled:

$$\mathrm{pdf}(i) < \mathrm{pdf}(i+1)$$
$$\mathrm{pdf}(i) = \mathrm{pdf}(i-k),\ k \in [0, N],\ N \ge 0$$
$$\mathrm{pdf}(i) < \mathrm{pdf}(i-N-1) \tag{13}$$

(b) Extend the set of borders by adding the smallest local pan parameter p_1 and the greatest local pan parameter p_N to the sorted borders:

$$\{p_1, \{b_l, l \in [2, M-1]\}, p_N\} \tag{14}$$

(c) Use these borders to create the cluster domains {c_l, l ∈ [1, M−1]}:

$$c_l = [b_l, b_{l+1}] \tag{15}$$

(d) Remove "insignificant" cluster domains. A cluster domain c_k is removed if both its pdf peak and its pdf area are relatively small:

$$\max(\mathrm{pdf}(c_k)) < f \cdot \max(\{\mathrm{pdf}(c_l)\})$$
$$\mathrm{cdf}(c_k) < f \cdot \max(\{\mathrm{cdf}(c_l)\}) \tag{16}$$

with

$$\mathrm{cdf}(c_i) = \mathrm{cdf}(b_{i+1}) - \mathrm{cdf}(b_i) \tag{17}$$

and

$$f = \frac{N(2.0)}{N(0.0)} \approx 0.1353 \tag{18}$$

with N(x) the standard Normal probability density function. Although this heuristic rule gives good results for our video sequences, we are currently investigating whether it can be replaced by a statistically based rule.

4. Compute for the final set of cluster domains {c_l, l ∈ [1, L]} the pan modus parameters {pc_l, l ∈ [1, L]}:

$$pc_l = \arg\max(\mathrm{pdf}(c_l)) \tag{19}$$

5. Use the previous global pan parameter p_g(t−1) and the pan modus parameters {pc_l, l ∈ [1, L]} to compute the new global pan parameter p_g(t) as follows:

$$p_g(t-1) \le pc_1 \ \rightarrow\ p_g(t) = pc_1 \tag{20}$$
$$pc_L \le p_g(t-1) \ \rightarrow\ p_g(t) = pc_L \tag{21}$$
$$pc_l \le p_g(t-1) < pc_{l+1} \ \rightarrow\ p_g(t) = \frac{\alpha_l\, pc_l + \alpha_{l+1}\, pc_{l+1}}{\alpha_l + \alpha_{l+1}} \tag{22}$$

with

$$\alpha_l = \frac{(pc_{l+1} - p_g(t-1))^8}{(pc_{l+1} - pc_l)^8}, \qquad \alpha_{l+1} = \frac{(p_g(t-1) - pc_l)^8}{(pc_{l+1} - pc_l)^8} \tag{23}$$

Because of the empirically determined exponent 8 for the coefficients α_l and α_{l+1} of the intermediate case (see Equation 23), the new global pan parameter p_g(t) (see Equation 22) is approximately equal to the pan modus parameter closest to the previous global pan parameter p_g(t−1), except when this previous global pan parameter is located close to the center of the two pan modus parameters (see Figure 6). Indeed, if the previous global pan parameter is far away from both pan modus parameters, it is better to postpone the choice for either of the two pan modus parameters.

If there is one cluster left (i.e. L = 1), either Equation 20 or Equation 21 holds. In this case, independent of the value of the previous global pan parameter p_g(t−1), the new global pan parameter p_g(t) is equal to the pan modus parameter pc_1 of this cluster.
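To make the whole procedure concrete, the following sketch strings steps 1 to 5 together. It assumes NumPy and SciPy; all function and variable names are ours, and the flat-valley walk and the descending-flank test are our simplified readings of Equations 10 and 13 rather than the paper's exact implementation:

```python
import numpy as np
from scipy.stats import norm

F = np.exp(-2.0)  # f = N(2.0)/N(0.0) ~= 0.1353 (Equation 18)

def kde_at_samples(p, s=0.125):
    """Step 1 (Equation 9): Gaussian-kernel pdf and cdf evaluated at the
    sorted local pan parameters; s is the 1/8-pixel motion resolution."""
    p = np.sort(np.asarray(p, dtype=float))
    d = p[:, None] - p[None, :]                   # d[k, i] = p_k - p_i
    return p, norm.pdf(d, scale=s).mean(axis=1), norm.cdf(d, scale=s).mean(axis=1)

def smooth_pdf(pdf):
    """Step 2 (Equations 10-12): replace samples that break the monotone
    order of an otherwise rising or falling flank by the average of their
    two neighbors, if the flank is nearly flat around them."""
    d = np.abs(np.diff(pdf))
    t = min(d.mean() + d.std() * norm.ppf(0.99),  # t_{m,s}
            (d.mean() + d.max()) / 2.0)           # t_{m,u}
    out = pdf.copy()
    for i in range(2, len(pdf) - 2):
        near = abs(pdf[i] - pdf[i-2]) <= t and abs(pdf[i] - pdf[i+2]) <= t
        rising = (pdf[i-2] <= pdf[i-1] <= pdf[i+1] <= pdf[i+2]
                  and pdf[i-2] < pdf[i+2])
        falling = (pdf[i-2] >= pdf[i-1] >= pdf[i+1] >= pdf[i+2]
                   and pdf[i-2] > pdf[i+2])
        breaks_up = pdf[i-1] > pdf[i] or pdf[i] > pdf[i+1]
        breaks_down = pdf[i-1] < pdf[i] or pdf[i] < pdf[i+1]
        if near and ((rising and breaks_up) or (falling and breaks_down)):
            out[i] = (pdf[i-1] + pdf[i+1]) / 2.0
    return out

def bap_pan(p_local, p_prev, s=0.125, exponent=8):
    """Steps 1-5: new global pan parameter p_g(t) from the local pan
    parameters of the current frame and the previous value p_g(t-1)."""
    p, pdf, cdf = kde_at_samples(p_local, s)
    pdf = smooth_pdf(pdf)
    # Step 3(a)-(c) (Equations 13-15): borders at valleys, plus both ends.
    borders = [0]
    for i in range(1, len(p) - 1):
        if pdf[i] >= pdf[i + 1]:
            continue
        j = i
        while j > 0 and pdf[j - 1] == pdf[i]:     # skip a flat valley bottom
            j -= 1
        if j > 0 and pdf[i] < pdf[j - 1]:
            borders.append(i)
    borders.append(len(p) - 1)
    domains = list(zip(borders[:-1], borders[1:]))
    # Step 3(d) (Equations 16-18): drop domains whose peak AND area are small.
    peaks = [pdf[a:b + 1].max() for a, b in domains]
    areas = [cdf[b] - cdf[a] for a, b in domains]
    keep = [dom for dom, pk, ar in zip(domains, peaks, areas)
            if pk >= F * max(peaks) or ar >= F * max(areas)]
    # Step 4 (Equation 19): pan modus parameter = argmax of pdf per domain.
    pc = sorted(p[a + int(np.argmax(pdf[a:b + 1]))] for a, b in keep)
    # Step 5 (Equations 20-23): clamp outside [pc_1, pc_L], blend inside.
    if p_prev <= pc[0]:
        return pc[0]
    if p_prev >= pc[-1]:
        return pc[-1]
    for l in range(len(pc) - 1):
        if pc[l] <= p_prev < pc[l + 1]:
            span = pc[l + 1] - pc[l]
            a_lo = ((pc[l + 1] - p_prev) / span) ** exponent
            a_hi = ((p_prev - pc[l]) / span) ** exponent
            return (a_lo * pc[l] + a_hi * pc[l + 1]) / (a_lo + a_hi)
```

A call such as bap_pan(p_lx, p_gx_prev) would yield the new p_{g,x}; the same function serves for the y parameters.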


Figure 6: α_l (blue) and α_{l+1} (green) as a function of the relative position of p_g(t−1).

Remarks:

1. We first estimated the probability density function {pdf(p_k)} and the cumulative density function {cdf(p_k)} from the local pan parameters {p_l(t)} using a Gaussian mixture model with a variable number of Gaussian pdf's (Frederix, 2005). However, in some cases this method underestimated the number of clusters.

2. If the first pair of frames of a shot contains an obscured background area, real-time online selection of the proper pan modus parameters is not possible because there are not yet reliable pan-zoom parameters. But if, after a series of frames, a single cluster for the pan modus parameters emerges, offline processing makes it possible to select the proper pan-zoom parameters of the skipped frames by reversing the analysis of the saved pan modus parameters from that frame back to the beginning of the shot. Offline processing even makes it possible to select the proper pan modus parameters when every frame of the shot contains multiple clusters, by computing the "optimal" path through the pan modus parameters of all frames of this shot.

5 RESULTS

We have applied both the EBA method (see Section 3) and the BAP method (see Section 4) to four video shots: 50 frames (710x574) with a recording of people passing a market stall with fruit (video shot "FRUIT"), an arena shot with 326 frames (720x442) of the movie "The Gladiator" (video shot "ARENA"), an office shot with 266 frames (706x424) of the cartoon "Incredibles" (video shot "OFFICE") and a canal chase shot with 52 frames (960x544) of the movie "The Italian Job" (video shot "CANAL").

Figure 7: The global zoom parameters s_{g,x} (blue), s_{g,y} (green) and s_g (red) for video shot OFFICE.

The global zoom parameters s_{g,x} (blue curve), s_{g,y} (green curve) and s_g (red curve) for video shot OFFICE are shown in Figure 7. The negative peaks in the neighborhood of frame 10 of video shot OFFICE are caused by a shot cut. The three curves follow roughly the same path. The use of the median for the global zoom parameter s_g (see Equation 7) clearly results in less noise.

The first column of Table 1 contains the maximum of the absolute differences between the global zoom parameters, max(|s_{g,x}(t) − s_{g,y}(t)|), at x̂ = 1. The second column contains the same quantities at max(x̂) (i.e. at the side borders of the frame). The large differences for the video shots OFFICE and CANAL raise the question whether a better method should be applied for the computation of the global zoom parameters from the local zoom parameters. We come back to this issue in Section 6.

Table 1: max(|s_{g,x}(t) − s_{g,y}(t)|) in pixels.

shot     at x̂ = 1   at max(x̂)
FRUIT    0.00107    0.374
ARENA    0.00242    0.862
OFFICE   0.00981    3.414
CANAL    0.00657    3.154

For video shot FRUIT the maximum difference between the pan parameters obtained with the two methods was less than 1/8 pixel (the resolution of the motion vectors for our cases).

The distorted EBA (red curves) and the improved BAP (green curves) pan parameters for video shot OFFICE are shown in Figures 8 and 9.


Figure 8: EBA (red) and BAP (green) pan parameter p_x for video shot OFFICE.

Figure 9: EBA (red) and BAP (green) pan parameter p_y for video shot OFFICE.

There is an obscured background area roughly from frame 50 until frame 150. The small peaks in the neighborhood of frame 10 are caused by a shot cut. Video shot CANAL gave similar results (more accurate pan parameters for the BAP method).

The final number of clusters for video shot OFFICE, alternating between n_x and n_y, is shown in Figure 10. There are clearly more clusters for the frames with an obscured background area.

The average elapsed time in milliseconds on a Linux PC (2.8 GHz Pentium 4) to estimate the pan-zoom parameters for a frame with the EBA method followed by the BAP method (without the time needed for motion estimation) is given in the first column of Table 2. The second column contains the number of frames per second. Probably, most of the computation time is spent in estimating the probability density function (see Equation 9).

Figure 10: The final number of clusters for video shot OFFICE.

Table 2: The performance of the EBA method followed by the BAP method.

shot     time (ms)   frames/sec.
FRUIT    49.6        20.2
ARENA    48.0        20.8
OFFICE   34.1        29.3
CANAL    67.8        14.7

6 CONCLUSIONS

1. The EBA method gives good results for shots with a proper background area, even if a small number of blocks is obscured by a foreground object, because outliers due to these blocks are excluded from the final computation of the global zoom parameter.

2. The BAP method gives roughly the same results as the EBA method for shots with a proper background area, and much better results for shots with an obscured background area.

3. We have inspected several histograms of the local zoom parameters. The number of clusters for these histograms was one. But to increase the robustness of the BAP method, the same procedure (i.e. cluster analysis and selection on the basis of the previous parameters) should also be applied to the computation of the global zoom parameters.

4. The current implementation of the BAP method can already be applied for offline conversion, for example in a media processor that converts a stored 2D video to a 2D+depth video to be stored on the hard disk. After the offline conversion is finished, the stored 2D+depth video can be rendered in real time on a 3DTV.


5. Because of the required amount of computation time per frame, the current implementation of the BAP method is not yet suitable for real-time 2D-to-3D conversion. But the performance can and will be improved (e.g. by removal of code for inspection of the proper working of the algorithms).

REFERENCES

Armstrong, M. (1996). Self-Calibration from Image Sequences. PhD thesis, Department of Engineering Science, University of Oxford, Oxford, UK.

Armstrong, M., Zisserman, A., and Hartley, R. (1996). Self-calibration from image triplets. In Proc. ECCV, volume I, pages 3-16, Cambridge, UK.

Azarbayejani, A. and Pentland, A. (1995). Recursive estimation of motion, structure, and focal length. IEEE Trans. Pattern Anal. Machine Intell., 17(6):562-575.

Barenbrug, B. (2006). 3D throughout the video chain. In Proc. IS&T ICIS'06, pages 366-369, Rochester, NY, USA.

Beardsley, P., Torr, P., and Zisserman, A. (1996). 3D model acquisition from extended image sequences. In Proc. ECCV, volume II, pages 683-695, Cambridge, UK.

Berretty, R.-P., Peters, F., and Volleberg, G. (2006). Real-time rendering for multiview autostereoscopic displays. In Proc. SPIE, Vol. 6055, 60550N, Stereoscopic Displays and Virtual Reality Systems, pages 208-219, San Jose, CA, USA.

de Beeck, M. O., Wilinski, P., Fehn, C., Kauff, P., IJsselsteijn, W., Pollefeys, M., van Gool, L., Ofek, E., and Sexton, I. (2002). Towards an optimized 3D broadcast chain. In Proc. SPIE, Vol. 4864, Three-Dimensional TV, Video, and Display, pages 42-50, Boston, MA, USA.

de Haan, G. and Biezen, P. (1994). Sub-pixel motion estimation with 3D recursive search block matching. Signal Processing: Image Communication, 6(3):229-239.

de Haan, G. and Biezen, P. (1998). An efficient true-motion estimator using candidate vectors from a parametric motion model. IEEE Trans. Circuits Syst. Video Technol., 8(1):85-91.

Ernst, F., Wilinski, P., and van Overveld, C. (2002). Dense structure-from-motion: an approach based on segment matching. In Proc. ECCV, LNCS 2531, pages 217-231, Copenhagen, Denmark.

Falkenhagen, L. (1997). Block-based depth estimation from image triples with unrestricted camera setup. In Proc. IEEE Workshop on Multimedia Signal Processing, pages 280-285, Princeton, New Jersey, USA.

Fehn, C. (2004). Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV. In Proc. SPIE, Vol. 5291, Stereoscopic Displays and Applications, pages 93-104, San Jose, CA, USA.

Fehn, C., Kauff, P., de Beeck, M. O., Ernst, F., IJsselsteijn, W., Pollefeys, M., van Gool, L., Ofek, E., and Sexton, I. (2002). An evolutionary and optimised approach on 3D-TV. In Proc. IBC'02, pages 357-365, Amsterdam, The Netherlands.

Frederix, G. (2005). Beyond Gaussian Mixture Models: Unsupervised Learning with Applications to Image Analysis. PhD thesis, Katholieke Universiteit Leuven, Belgium.

Hartley, R. and Zisserman, A. (2001). Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, UK.

Huang, T. and Netravali, A. (1994). Motion and structure from feature correspondences: A review. Proc. IEEE, 82(2):252-268.

Marazzi, A. (1987). Subroutines for robust estimation of location and scale in ROBETH. Technical Report Cah. Rech. Doc. IUMSP, No. 3 ROB 1, Institut Universitaire de Medecine Sociale et Preventive, Lausanne, Switzerland.

Pollefeys, M. (1999). Self-Calibration and Metric 3D Reconstruction from Uncalibrated Image Sequences. PhD thesis, Katholieke Universiteit Leuven, Belgium.

Pollefeys, M., Koch, R., and van Gool, L. (1998). Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. In Proc. ICCV, pages 90-95, Bombay, India.

Pollefeys, M., Koch, R., Vergauwen, M., Deknuydt, B., and van Gool, L. (2000). Three-dimensional scene reconstruction from images. In Proc. SPIE, Vol. 3958, Electronic Imaging, Three-Dimensional Image Capture and Applications III, pages 215-226, San Jose, CA, USA.

Redert, P., Berretty, R.-P., Varekamp, C., van Geest, B., Bruijns, J., Braspenning, R., and Wei, Q. (2007). Challenges in 3DTV image processing. In Proc. SPIE, Vol. 6508, Visual Communications and Image Processing 2007, San Jose, CA, USA.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London, UK.

Xu, G. and Zhang, Z. (1996). Epipolar Geometry in Stereo, Motion and Object Recognition: A Unified Approach. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.-T. (1995). A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence Journal, 78(1-2):87-119.
