Jointly Optical Flow and Occlusion Estimation for Images with Large Displacements

Vanel Lazcano, Luis Garrido and Coloma Ballester

NMFE, Universidad Mayor, Avda. Manuel Montt 318, Santiago, Chile

DMI, Universitat de Barcelona, Gran Via 585, Barcelona, Spain

DTIC, Universitat Pompeu Fabra, Roc Boronat 138, Barcelona, Spain

Keywords: Optical Flow, Exhaustive Search, Large Displacements, Illumination Changes.

Abstract:

This paper addresses motion estimation of objects in a video sequence, a problem known as optical flow estimation. Traditional models fail in the presence of occlusions and non-uniform illumination. To tackle these problems we propose a variational model that jointly estimates optical flow and occlusions. The proposed model also addresses a usual drawback of variational methods: handling displacements of objects in the scene that are larger than the object itself. Adding a term that balances gradients and intensities increases the robustness of the proposed model to illumination changes. Including supplementary matches obtained by exhaustive search at specific locations helps to follow large displacements.

1 INTRODUCTION

The apparent motion of pixels in a sequence of images is usually called the optical flow. Optical flow computation is one of the most challenging problems in computer vision, especially in real scenarios where occlusions and illumination changes occur. Optical flow has many applications, including autonomous flight of vehicles, insertion of objects in video, video compression and many others. In order to estimate this flow field an energy model is stated, which measures the estimation error of the optical flow. Most optical flow methods are grounded on the optical flow constraint. This constraint is based on the brightness constancy assumption, which states that the brightness or intensity of pixels in the image remains constant from frame to frame along the movement of objects. The optical flow constraint is only suitable when the motion field is small enough or the images are very smooth.
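The restriction to small motions or smooth images can be seen from the standard derivation of the optical flow constraint, sketched below. The notation $I(x,t)$ for the intensity at position $x$ and time $t$ is an assumption made here for illustration and is not the notation used in the rest of the paper.

```latex
\begin{align*}
I(x + u(x),\, t+1) &= I(x,\, t)
  && \text{(brightness constancy)}\\
I(x + u,\, t+1) &\approx I(x,\, t+1) + \nabla I(x,\, t+1)\cdot u
  && \text{(first-order Taylor expansion in } u\text{)}\\
\nabla I(x,\, t+1)\cdot u + I(x,\, t+1) - I(x,\, t) &\approx 0
  && \text{(optical flow constraint)}
\end{align*}
```

The Taylor step is only accurate when $u$ is small or $I$ varies slowly, which is why the linearized constraint degrades for large displacements.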

Recovering the flow from the intensity constraint alone is an ill-posed problem, which is usually addressed by adding a regularity prior. This regularity prior, or regularization term, added to the energy model defines the structure of the motion field and ensures that the optical flow computation is well posed.

(Horn and Schunck, 1981) proposed adding a quadratic regularization term to the energy model. That work was the first to introduce variational methods to compute dense optical flow. However, the Horn-Schunck model does not cope well with motion discontinuities and is highly sensitive to noise in the images. To tackle those drawbacks other regularization terms have been proposed (Nagel and Ekelman, 1986; Black and Ananda, 1996; Brox et al., 2004; Zach et al., 2007; Werlberger et al., 2009; Sun et al., 2010; Werlberger et al., 2010; Krähenbühl and Koltun, 2012; Xu et al., 2012; Chen et al., 2013; Sánchez et al., 2014; Zimmer et al., 2011; Strekalovskiy et al., 2014; Palomares et al., 2015; Ranftl et al., 2014; Sun et al., 2014). In order to cope with large displacements, optimization typically proceeds in a coarse-to-fine manner (also called a multi-scale strategy).
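A coarse-to-fine strategy needs an image pyramid and a way to carry the flow from one scale to the next finer one. The following is a minimal sketch of both pieces, assuming a down-scaling factor of 2 with 2x2 block averaging and nearest-neighbour flow upsampling; these choices and the function names `downscale_by_two`, `build_pyramid` and `upscale_flow` are illustrative assumptions, not the scheme used by the authors.

```python
import numpy as np


def downscale_by_two(image):
    """Halve height and width by averaging non-overlapping 2x2 blocks."""
    h2, w2 = image.shape[0] // 2, image.shape[1] // 2
    cropped = image[:2 * h2, :2 * w2]  # drop an odd last row/column if present
    blocks = cropped.reshape(h2, 2, w2, 2, *image.shape[2:])
    return blocks.mean(axis=(1, 3))


def build_pyramid(image, n_scales):
    """Return [finest, ..., coarsest]; index 0 is the original resolution."""
    pyramid = [np.asarray(image, dtype=np.float64)]
    for _ in range(n_scales - 1):
        pyramid.append(downscale_by_two(pyramid[-1]))
    return pyramid


def upscale_flow(flow, shape):
    """Nearest-neighbour upsampling of an (h, w, 2) flow to `shape`.

    Displacements are doubled because one coarse pixel spans two fine pixels.
    """
    rows = np.minimum(np.arange(shape[0]) // 2, flow.shape[0] - 1)
    cols = np.minimum(np.arange(shape[1]) // 2, flow.shape[1] - 1)
    return 2.0 * flow[rows][:, cols]
```

The flow estimated at the coarsest level initializes the next finer level, so that each level only has to recover a small residual motion.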

Optical flow estimation based on classical variational models fails if the sequence presents: i) occluded pixels, ii) displacements larger than the size of the objects and iii) changes of illumination. Occlusions produce a lack of correspondence between some points in the image sequence. Occluded pixels are pixels of an image frame which are covered by the movement of objects in the following frame. For those occluded pixels there is no reliable optical flow. In particular, the brightness constancy assumption is flawed in realistic scenarios, where occlusions occur due to the relative motion between objects in the scene or the camera movement, as well as illumination changes. Indeed, shadows or light reflections that appear and move in the image sequence can also make the brightness constancy assumption fail.

Lazcano, V., Garrido, L. and Ballester, C.
Jointly Optical Flow and Occlusion Estimation for Images with Large Displacements.
DOI: 10.5220/0006716305880595
In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018) - Volume 5: VISAPP, pages 588-595
ISBN: 978-989-758-290-5

These facts motivate us to consider an alternative to the classical brightness constancy constraint, together with occlusion estimation and a new term to cope with large displacements. In this paper we extend the model in (Ballester et al., 2012) to a model which is robust to illumination changes and is able to handle large displacements.

2 RELATED WORKS

In (Zach et al., 2007) the authors present an approach to estimate the optical flow that preserves discontinuities and is robust to noise. In order to compute the optical flow $u = (u_1, u_2) : \Omega \to \mathbb{R}^2$ between $I_0$ and $I_1$, the authors propose to minimize the energy

\[
E(u) = \int_\Omega \left( \lambda\,|I_0(x) - I_1(x+u)| + |\nabla u_1| + |\nabla u_2| \right) dx, \tag{1}
\]

with a relative weight given by the parameter $\lambda > 0$. This variational model is usually called the TV-L1 formulation.
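To make the two ingredients of the TV-L1 energy concrete, the sketch below evaluates a discrete version of it for a given flow on grayscale images. It is only an illustration under stated assumptions: bilinear warping with border clamping, central-difference gradients via `np.gradient`, and the helper names `warp_bilinear`, `total_variation` and `tv_l1_energy` are all choices made here, not the discretization of (Zach et al., 2007).

```python
import numpy as np


def warp_bilinear(image, flow):
    """Sample `image` at x + flow(x) with bilinear interpolation (clamped)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0, x0] + ax * image[y0, x1]
    bottom = (1 - ax) * image[y1, x0] + ax * image[y1, x1]
    return (1 - ay) * top + ay * bottom


def total_variation(field):
    """Sum over pixels of the gradient magnitude of a scalar field."""
    gy, gx = np.gradient(field)
    return float(np.sum(np.sqrt(gx ** 2 + gy ** 2)))


def tv_l1_energy(i0, i1, flow, lam):
    """Discrete counterpart of lam * |I0(x) - I1(x+u)| + |grad u1| + |grad u2|."""
    data = float(np.sum(np.abs(i0 - warp_bilinear(i1, flow))))
    reg = total_variation(flow[..., 0]) + total_variation(flow[..., 1])
    return lam * data + reg
```

For two identical frames and a zero flow both terms vanish; for a frame shifted by one pixel, the constant flow that undoes the shift has a much lower energy than the zero flow, since the regularizer is zero for constant flows.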

Occlusion is a challenging problem in the estimation of the optical flow. Some methods deal with occlusions implicitly by using robust norms in the data term, while others handle occlusions explicitly. A first step towards taking occlusions into account was the joint estimation of forward and backward optical flow in (Alvarez et al., 2007). The authors argue that at non-occluded pixels forward and backward flows are symmetric. This idea was taken up in (Ince and Konrad, 2008), whose authors proposed to extrapolate the optical flow in occluded areas. In (Xu et al., 2012) occlusions are estimated from the fact that multiple points mapped by the optical flow to the same point in the following frame (a collision) are likely to be occluded.
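The collision idea can be illustrated with a few lines of code: count how many source pixels land on each target pixel and flag the sources that share a target. This is a simplified sketch, assuming rounding to the nearest pixel and clamping at the image border; it is not the occlusion detector of (Xu et al., 2012), and `collision_mask` is a hypothetical name.

```python
import numpy as np


def collision_mask(flow):
    """Flag pixels whose rounded target x + u(x) is hit by more than one pixel."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    hits = np.zeros((h, w), dtype=int)
    np.add.at(hits, (ty, tx), 1)  # unbuffered add: repeated targets accumulate
    return hits[ty, tx] > 1
```

Note that clamping can create spurious collisions along the border when the flow points outside the image, so a practical detector would treat those pixels separately.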

On the other hand, robustness against illumination changes would be desirable. The gradient of the image is robust to additive illumination changes (Brox et al., 2004), and therefore the gradient constancy assumption

\[
\nabla I_0(x) - \nabla I_1(x + u(x)) = 0
\]

may well be included as a new data term in a variational energy in order to compute the optical flow $u$ (Xu et al., 2012).
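The robustness of the gradient to additive illumination changes is easy to check numerically. In the small experiment below a constant offset (0.25, an arbitrary value chosen for illustration) is added to a random image: the intensities change by exactly that offset, whereas the finite-difference gradients are unchanged up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32))
brighter = image + 0.25  # additive illumination change (illustrative value)

gy, gx = np.gradient(image)
gy_b, gx_b = np.gradient(brighter)

# Largest change in intensity versus largest change in gradient components.
intensity_gap = float(np.max(np.abs(brighter - image)))
gradient_gap = float(max(np.max(np.abs(gx_b - gx)), np.max(np.abs(gy_b - gy))))
```

A multiplicative change would scale the gradient as well, so this invariance holds for additive changes only.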

While the traditional methodology works well in cases where small structures move more or less the same way as larger scale structures, the approach fails with large displacements. In recent years this topic has been tackled by several interesting approaches. In (Brox et al., 2009), a method for large displacements is proposed that performs region-based descriptor matching. This method correctly estimates large displacements but it can match outliers. (Steinbruecker and Pock, 2009) also propose a method to tackle large displacements. The methodology performs well on real images with large displacements but lacks subpixel accuracy.

To tackle large displacements, (Xu et al., 2012) incorporates matching of SIFT features computed between images of the sequence. The fusion between the SIFT feature matches and the optical flow estimation is performed using graph cuts.

Recently new models have been proposed in order to handle large displacements in (Weinzaepfel et al., 2013), (Timofte and Van Gool, 2015), (Kennedy and Taylor, 2015), (Fortun et al., 2016) and (Palomares et al., 2017). These models consider sparse or dense matching using the Deep Matching algorithm (Weinzaepfel et al., 2013) or motion candidates (Fortun et al., 2016). The principal idea is to give "hints" to the variational optical flow approach by using these sparse matches (Weinzaepfel et al., 2013). In (Kennedy and Taylor, 2015) and (Fortun et al., 2016) the occlusion layer is also estimated.

3 PROPOSED MODEL

We propose a variational model for joint optical flow and occlusion estimation, which considers color image sequences and is able to handle illumination changes as well as large displacements. The ingredients are detailed in the following sections.

3.1 Occlusion Estimation

Inspired by (1) and (Ballester et al., 2012) we present a joint optical flow and occlusion estimation model. The divergence of the motion field can be used to distinguish between different types of motion areas: the divergence of a flow field is negative for occluded areas, positive for dis-occluded areas, and near zero for the matched areas.
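As an illustration of this observation, the sketch below computes the divergence of a flow field with finite differences and labels each pixel by its sign. The tolerance `tol` and the label convention (-1 occluded, +1 dis-occluded, 0 matched) are assumptions made for this example; in the proposed model the divergence enters the energy as a soft term rather than as a hard classifier.

```python
import numpy as np


def divergence(flow):
    """div(u) = d u1 / dx + d u2 / dy for an (h, w, 2) flow field."""
    du1_dx = np.gradient(flow[..., 0], axis=1)
    du2_dy = np.gradient(flow[..., 1], axis=0)
    return du1_dx + du2_dy


def classify_motion(flow, tol=0.1):
    """Label pixels: -1 where div < -tol, +1 where div > tol, 0 otherwise."""
    div = divergence(flow)
    labels = np.zeros(div.shape, dtype=int)
    labels[div < -tol] = -1
    labels[div > tol] = 1
    return labels
```

A flow that contracts towards a point (pixels piling up on each other) has negative divergence everywhere, an expanding flow has positive divergence, and a pure translation has zero divergence.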

Our model considers three consecutive color frames $I_{-1}, I_0, I_1 : \Omega \to \mathbb{R}^3$ as in (Ballester et al., 2012), which we assume to have values in the RGB color space; hence each frame $I_i$ has three color components $I_i^1, I_i^2, I_i^3$, associated to the red, green and blue channels, respectively. In order to compute the optical flow between $I_0$ and $I_1$, let $\chi : \Omega \to [0, 1]$ be the function modeling the occlusion mask, so that $\chi = 1$ identifies the occluded pixels, i.e. pixels that are visible in $I_0$ but not in $I_1$. Our model is based on the following assumptions: (i) pixels that are not visible in frame $I_1$ are visible in the previous frame of $I_0$ (let $I_{-1}$ be that frame), (ii) the occluded region given by $\chi = 1$ should be correlated with the region where $\mathrm{div}(u)$ is negative, and (iii) the motion of the occluded background area is not as fast as that of the occluding foreground (Ballester et al., 2012). Thus, we propose to compute the optical flow and the occlusion mask by minimizing the energy:

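\[
J(u,\chi) = E_d(u,\chi) + E_r(u,\chi) + \frac{\eta}{2}\int_\Omega \chi\,|u|^2\,dx + \beta \int_\Omega \chi\,\mathrm{div}(u)\,dx,
\]

where $E_d(u,\chi)$ and $E_r(u,\chi)$ are given by

\[
E_d(u,\chi) = \lambda \sum_{k=1}^{3} \int_\Omega (1-\chi)\,|I_0^k(x) - I_1^k(x+u(x))|\,dx + \lambda \sum_{k=1}^{3} \int_\Omega \chi\,|I_0^k(x) - I_{-1}^k(x-u(x))|\,dx,
\]

\[
E_r(u,\chi) = \int_\Omega g(x)\left( |\nabla u_1| + |\nabla u_2| + |\nabla \chi| \right) dx, \tag{2}
\]

with $\eta \ge 0$, $\beta > 0$ and $g(x) = \frac{1}{1+\gamma\,|\nabla I_0(x)|}$, $x \in \Omega$, $\gamma > 0$. Notice that, if $\chi(x) = 0$, then we compare $I_0(x)$ and $I_1(x + u(x))$. If $\chi(x) = 1$, we compare $I_0(x)$ and $I_{-1}(x - u(x))$.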

3.2 Robustness to Color Changes

The color constancy assumption is frequently violated due to illumination changes, shadows or reflections. A combination of the color constancy assumption and the gradient constancy assumption in the data term seems to be a valuable approach to alleviate this problem (Xu et al., 2012). We extend our color model to consider a combination of intensities and gradients by introducing an adaptive weight map $\alpha : \Omega \to [0,1]$ that balances, at each point of the image domain, the contribution of the color and gradient constraints (Xu et al., 2012). We propose the following model:

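\[
J(u,\chi) = E_{d,\alpha}(u,\chi) + E_r(u,\chi) + \frac{\eta}{2}\int_\Omega \chi\,|u|^2\,dx + \beta\int_\Omega \chi\,\mathrm{div}(u)\,dx,
\]

where $E_{d,\alpha}(u,\chi)$ can be written as:

\[
E_{d,\alpha}(u,\chi) = \int_\Omega \alpha(x)\,D_{I,\chi}(u,\chi,x)\,dx + \int_\Omega (1-\alpha(x))\,D_{\nabla I,\chi}(u,\chi,x)\,dx, \tag{3}
\]

and $D_{I,\chi}(u,\chi,x)$ and $D_{\nabla I,\chi}(u,\chi,x)$ are point-wise data costs based on the comparison of color and gradient of the image, respectively. Roughly speaking, $D_{I,\chi}$ contains the comparison $\|I_0(x) - I_1(x+u)\|$ and $D_{\nabla I,\chi}$ the comparison $\tau\,\|\nabla I_0(x) - \nabla I_1(x+u)\|$, with $\tau > 0$. Then, the weight map $\alpha(x)$ is defined as in (Xu et al., 2012):

\[
\alpha(x) = \frac{1}{1 + e^{\beta\,(D_{I,\chi}(u,x) - D_{\nabla I,\chi}(u,x))}}, \tag{4}
\]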

where $\beta$ is a positive constant. Let us comment on the behavior of (4). If $D_{I,\chi}(u,x) \gg D_{\nabla I,\chi}(u,x)$, the difference $D_{I,\chi}(u,x) - D_{\nabla I,\chi}(u,x)$ will be positive and the exponential $e^{\beta(D_{I,\chi}(u,x) - D_{\nabla I,\chi}(u,x))}$ will be large. Then, $\alpha(x)$ will be a small value, say near 0, and the data term will rely more on the gradient constancy assumption. On the other hand, if $D_{\nabla I,\chi}(u,x) \gg D_{I,\chi}(u,x)$, the difference $D_{I,\chi}(u,x) - D_{\nabla I,\chi}(u,x)$ will be negative and the exponential will be very small, so $\alpha(x)$ will be near 1. In other words, the data term will rely more on the color constancy assumption.
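The behavior just described can be reproduced with a direct implementation of the weight map. The sketch below is an illustration only: the function name `adaptive_weight` and the default `beta=1.0` are assumptions, and the logistic function is evaluated through the identity $1/(1+e^{z}) = \tfrac{1}{2}(1 - \tanh(z/2))$ to avoid overflow for large cost differences.

```python
import numpy as np


def adaptive_weight(d_color, d_grad, beta=1.0):
    """alpha = 1 / (1 + exp(beta * (d_color - d_grad))), evaluated stably."""
    z = beta * (np.asarray(d_color, dtype=np.float64)
                - np.asarray(d_grad, dtype=np.float64))
    # 1 / (1 + e^z) == 0.5 * (1 - tanh(z / 2)); tanh never overflows.
    return 0.5 * (1.0 - np.tanh(0.5 * z))
```

When the color cost dominates, the weight goes to 0 (trust gradients); when the gradient cost dominates, it goes to 1 (trust colors); equal costs give 0.5.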

3.3 Large Displacements

To handle large displacements we add to our model a term $\frac{\mu}{2}\int_\Omega c(x)\,\chi_p(x)\,|u - u_e|^2\,dx$, where $u_e$ is an optical flow obtained by exhaustive search, $\chi_p$ is a characteristic function indicating the locations where supplementary matching could improve the motion estimation, $c(x)$ is a confidence on the exhaustive matching at $x$ and $\mu > 0$.

Summarising, the proposed model to handle large displacements is:

\[
J_{\alpha,l}(u,\chi) = E_{d,\alpha}(u,\chi) + E_r(u,\chi) + \frac{\eta}{2}\int_\Omega \chi\,|u|^2\,dx + \beta\int_\Omega \chi\,\mathrm{div}(u)\,dx + \frac{\mu}{2}\int_\Omega c(x)\,\chi_p(x)\,|u - u_e|^2\,dx. \tag{5}
\]

In our implementation, and for efficiency reasons, we consider an upper bound $v_{\max}$ for the expected maximum displacement.
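The role of the bound on the maximum displacement can be seen in a small block-matching sketch of an exhaustive search for one pixel. The matching cost is not specified in the text, so the sum of absolute differences (SAD) used below, the grayscale input and the name `exhaustive_match` are assumptions of this example. The function also returns the costs of the best and second-best candidates, which are the kind of quantities a confidence measure can be built from.

```python
import numpy as np


def exhaustive_match(i0, i1, y, x, half_block, v_max):
    """Best integer displacement (dx, dy) of the block centred at (y, x).

    Returns ((dx, dy), d1, d2): the winner plus the costs of the best and
    second-best candidates. The SAD cost is an assumption of this sketch.
    The pixel (y, x) must lie at least `half_block` pixels from the border.
    """
    r = half_block
    h, w = i0.shape
    ref = i0[y - r:y + r + 1, x - r:x + r + 1]
    candidates = []
    for dy in range(-v_max, v_max + 1):
        for dx in range(-v_max, v_max + 1):
            yy, xx = y + dy, x + dx
            if yy - r < 0 or xx - r < 0 or yy + r >= h or xx + r >= w:
                continue  # candidate block would leave the image
            block = i1[yy - r:yy + r + 1, xx - r:xx + r + 1]
            candidates.append((float(np.sum(np.abs(ref - block))), dx, dy))
    candidates.sort()  # ascending by cost
    (d1, dx, dy), (d2, _, _) = candidates[0], candidates[1]
    return (dx, dy), d1, d2
```

Each pixel requires up to $(2 v_{\max} + 1)^2$ block comparisons, so the cost grows quadratically with the bound, which is the efficiency reason for keeping it small.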

3.3.1 Conﬁdence Function c(x)

We directly integrate exhaustive point correspondences into the variational model. The proposed confidence measure, used to determine the weight given to the matches computed by exhaustive search, is

c(x) =



− d





dα

(u,x)

exha

,x)



VISAPP 2018 - International Conference on Computer Vision Theory and Applications


where $d_1$ and $d_2$ are the distances to the first and second best candidates, respectively, of the exhaustive search, $E_{d\alpha}(u,x)$ is the error defined in (3), and the remaining error term is the error of the exhaustive search at $x$. This measure was used in (Stoll et al., 2012) to validate the correctness of a given optical flow field at each point.

3.3.2 Construction of $\chi_p$

In order to determine the specific locations where supplementary matching could improve the motion estimation, we evaluate the data term at each $x \in \Omega$ with the computed flow $u$ and the occlusion map $\chi$. The idea is that if the value $E_{d,\alpha}(u,\chi)$ is large, then the estimation might be improved. Additionally, we consider the smaller eigenvalue $\lambda(x)$ of the structure tensor associated to the image $I_0$. With these ingredients, the set $\Omega_p$ where supplementary matching could improve the motion estimation is defined as:

\[
\Omega_p = \{\, x \in \Omega : E_{d,\alpha}(u,\chi)(x) > \theta_E \ \wedge\ \lambda(x) > \theta_\lambda \,\},
\]

where $\theta_E$ and $\theta_\lambda$ are given constants, which we determine empirically and fix for the experiments we performed. That is, if $E_{d,\alpha} > \theta_E$, then we assume that the error is large enough to be improved using a supplementary match. The set of points that belong to $\Omega_p$ defines a binary mask, which we denote by $\chi_p : \Omega \to [0, 1]$.
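The two tests that define this mask can be sketched as follows. The smaller eigenvalue of the 2x2 structure tensor has a closed form, so no eigen-solver is needed. Several details are assumptions made for this illustration and are not given in the text: a grayscale image, central-difference gradients, a box filter of radius 2 to smooth the tensor entries, and the names `box_smooth`, `smaller_eigenvalue` and `supplementary_mask`.

```python
import numpy as np


def box_smooth(field, radius):
    """Average over a (2*radius+1)^2 window, replicating border values."""
    padded = np.pad(field, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(field.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + field.shape[0], dx:dx + field.shape[1]]
    return out / size ** 2


def smaller_eigenvalue(image, radius=2):
    """Smaller eigenvalue of the smoothed structure tensor at every pixel."""
    gy, gx = np.gradient(np.asarray(image, dtype=np.float64))
    jxx = box_smooth(gx * gx, radius)
    jxy = box_smooth(gx * gy, radius)
    jyy = box_smooth(gy * gy, radius)
    half_trace = 0.5 * (jxx + jyy)
    root = np.sqrt((0.5 * (jxx - jyy)) ** 2 + jxy ** 2)
    return half_trace - root


def supplementary_mask(data_error, eigenvalue, theta_e, theta_lambda):
    """True where both the data error and the eigenvalue exceed thresholds."""
    return (data_error > theta_e) & (eigenvalue > theta_lambda)
```

The smaller eigenvalue is zero on flat regions and along straight edges and is positive only where the image varies in two directions, so the second test restricts supplementary matching to locations textured enough for block matching to be discriminative.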

3.4 Solving the Model

In order to minimize (5), we relax it and introduce five auxiliary variables $v^1, v^2, v^3, v^4, v^5$ representing the flow and used to decouple the nonlinear terms, where $v^1$, $v^2$, $v^3$ correspond to the red, green and blue channels, respectively, and $v^4$, $v^5$ correspond to $\partial_x$ and $\partial_y$, respectively. We penalize the difference between the optical flow $u$ and each of the auxiliary variables $v^1, \dots, v^5$. Thus, to compute the occlusions and the optical flow between $I_0$ and $I_1$, we propose to minimize the following energy:

α,l

(u,χ, ˜v) = E

( ˜v,χ) + E

(u,χ)+

Ω

˜v

dx +β

Ω

χdiv(u)dx+

2θ

Ω

˜u − ˜v

dx,

(6)

where

˜v

stands for

∑

k=1

and

( ˜v,χ) =

Ω

(1−χ)

∑

k=1



)



dx+λ

Ω

∑

k=1



−1

)



dx,

and $\rho_i^k$ is the linearized version of $I_0^k(x) - I_i^k(x + \varepsilon_i u)$ around an approximation $u_0$ of $u$, with $i = -1, 1$, $\varepsilon_{-1} = -1$ and $\varepsilon_1 = 1$, and $k = 1, 2, 3$ (corresponding to each color channel). The linearization procedure is applied to each $\rho_i^k(x)$.

We minimize the energy in (6) by alternating among the minimizations with respect to each variable while keeping the remaining ones fixed, as in (Ballester et al., 2012) and (Zach et al., 2007). In particular, the minimization of this energy with respect to $u$, $v$ and $\chi$ is described in the following propositions.

Proposition 1. The minimum of

αl

with respect to

u = (u

) is given by

∑

k=1

+ θdiv(gξ

) + θβ

∂χ

∂x

+ µθu

1 + µθχ

, (7)

with i=1,2 and u

= (u

). ξ

and ξ

are computed

using the following iterative scheme

t+1

g∇(

∑

k=1

+ θdiv(gξ

) + θβ

∂χ

∂x

)

1 +

|g∇(

∑

k=1

+ θdiv(gξ

) + θβ

∂χ

∂x

(8)

where ξ

= 0 and τ

≤ 1/8.

Proposition 2. Assume that χ : Ω → {0,1}. The mi-

nimum of

α,l

with respect to v

= (v

) is











u−µ

α(x)∇I

∗

) if Λ

(u)>µ

α(x)m

u+µ

α(x)∇I

∗

) if Λ

(u)<−µ

α(x)m

u−ε

(u)

∇I

∗

)

|∇I

∗

if |Λ

(u)| ≤ µ

α(x)m

when i = 1 and ε

= 1, η

= 1, µ

= λθ, Λ

(u) =

(u) when χ = 0, and i = −1, ε

−1

= −1, η

−1

1+ηθ

, µ

−1

λθ

1+ηθ

, Λ

−1

(u) = ρ

−1

(u) +

ηθ

1+ηθ

u ·

∇I

−1

(x + ε

) when χ = 1. Additionally we create

∗

= x + ε

. The term m

was deﬁned as m

|∇I

∗

. Arguments x in u,u

are omitted.

Once all v

are computed, we deﬁne F = λA+

where

A =

−α(x)

∑

k=1



− (1 − α(x))

∑

k=4

)

B =

∑

k=1

− u)

and G = λC +

D, where,

C =

−α(x)

∑

k=1



−1



− (1 − α(x))

∑

k=4

−1

)

D =

∑

k=1

)


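Proposition 3. Let $0 < \tau_\psi \tau_\chi < 1/8$. Given $u$, $v$, the minimum $\bar{\chi}$ of the energy in (6) with respect to $\chi$ can be obtained by the following primal-dual algorithm

\[
\begin{aligned}
\psi^{n+1} &= P_B\left(\psi^n + \tau_\psi\, g\, \nabla \chi^n\right),\\
\chi^{n+1} &= P_{[0,1]}\left(\chi^n + \tau_\chi \left( \mathrm{div}(g\,\psi^{n+1}) - \beta\, \mathrm{div}\,u - F - G \right)\right),
\end{aligned}
\]

where $P_B(\psi)$ denotes the projection of $\psi$ on the unit ball of $\mathbb{R}^2$ and $P_{[0,1]}(r) = \max(\min(r,1),0)$, $r \in \mathbb{R}$.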
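The structure of this primal-dual iteration is sketched below. The discretization is an assumption of this example (forward differences for the gradient and their negative adjoint for the divergence), and the whole linear term $\beta\,\mathrm{div}\,u + F + G$ is passed in precomputed as `linear_term`, since its ingredients come from the other steps of the method. The names `grad`, `div`, `project_unit_ball` and `solve_chi` and the default step sizes are hypothetical.

```python
import numpy as np


def grad(field):
    """Forward differences; the last column/row of each component is zero."""
    gx = np.zeros_like(field)
    gy = np.zeros_like(field)
    gx[:, :-1] = field[:, 1:] - field[:, :-1]
    gy[:-1, :] = field[1:, :] - field[:-1, :]
    return np.stack([gx, gy], axis=-1)


def div(p):
    """Discrete divergence, the negative adjoint of `grad`."""
    px, py = p[..., 0], p[..., 1]
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy


def project_unit_ball(p):
    """Pointwise projection of 2-vectors onto the unit ball."""
    norm = np.sqrt(np.sum(p ** 2, axis=-1, keepdims=True))
    return p / np.maximum(1.0, norm)


def solve_chi(linear_term, g, n_iter=200, tau_psi=0.25, tau_chi=0.25):
    """Iterate the two updates; `linear_term` stands for beta*div(u) + F + G."""
    chi = np.zeros_like(linear_term, dtype=np.float64)
    psi = np.zeros(linear_term.shape + (2,))
    gw = g[..., None]  # broadcast the weight over both vector components
    for _ in range(n_iter):
        psi = project_unit_ball(psi + tau_psi * gw * grad(chi))
        chi = np.clip(chi + tau_chi * (div(gw * psi) - linear_term), 0.0, 1.0)
    return chi
```

With the default steps, `tau_psi * tau_chi = 0.0625`, which satisfies the bound of the proposition. Where the linear term is negative the mask is pushed towards 1 (occluded), where it is positive towards 0, and the dual variable smooths the transition between both regions.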

3.5 Algorithm

This section presents the numerical algorithm for the minimization of (6), including pseudo-code describing its main steps. In particular, Algorithm 1 summarizes our optical flow model, robust to illumination changes and large displacements, presented in Section 3.4. The values of $\alpha(x)$, for all $x \in \Omega$, are updated after each propagation of the optical flow to the finer scale, before starting the estimation of the flow field at that scale.

The data attachment $c(x)\,\chi_p(x)\,(u - u_e)$ depends on the confidence value $c(x)$, the mask $\chi_p$ and the exhaustive matches $u_e$. The confidence value is an estimate of the reliability of the exhaustive matches.

4 DATABASE AND EXPERIMENTS

We evaluate our model on two public databases: Middlebury (Scharstein and Szeliski, 2002) and MPI Sintel (Butler et al., 2012). In Figure 1, we show images of the Middlebury dataset. These sequences contain displacements larger than the size of the object and also contain shadows and reflections. Figure 1 shows three consecutive frames of the sequences Beanbags (BB) and DogDance (DD). The BB sequence presents balls that move while producing shadows on the T-shirt. In the DD sequence the girl moves to the right and the dog moves to the left.

The MPI database (Butler et al., 2012) presents long synthetic sequences containing large displacements, blur or reflections, fog and shadows. Moreover, there are two versions of the MPI database: clean and final. The final version is claimed to be more challenging and we take it for our evaluation. Figure 2 displays some examples of the MPI database, which contain large displacements. In the cave 4 sequence, shown in (a), (b) and (c), a girl fights with a dragon, moving her lance inside a cave. In (d), (e) and (f) the girl moves a fruit downward in her hand.

Input : Three consecutive color frames $I_{-1}$, $I_0$, $I_1$ and $u_e$
Output: Flow field $u$ and occlusion layer $\chi$ for $I_0$, and $\alpha(x)$
Compute down-scaled images $I_{-1}^s$, $I_0^s$, $I_1^s$ for $s = 1, \dots, N_{scales}$;
Initialize $u^{N_{scales}} = v^{N_{scales}} = 0$, and $\chi^{N_{scales}} = 0$, $\alpha^{N_{scales}}(x) = 1.0$, $\gamma = 0$;
for $s \leftarrow N_{scales}$ to 1 do
    Compute $\alpha^s(x)$ using (4);
    for $w \leftarrow 1$ to $N_{warps}$ do
        Compute $I_i^s(x + \varepsilon_i u_0(x))$, $\nabla I_i^s(x + \varepsilon_i u_0(x))$, and $\rho_i$, $i = -1, 1$;
        $n \leftarrow 0$;
        while $n <$ outer iterations do
            Compute $v^s$ using Proposition 2;
            for $l \leftarrow 1$ to inner iterations $u$ do
                Solve for $\xi_i^{l+1,s}$, $i \in \{1, 2\}$, using the fixed point iteration (Proposition 1);
            end
            Compute $u^s$ using Proposition 1 considering data attachment $c(x)\,\chi_p\,(u - u_e)$;
            for $m \leftarrow 1$ to inner iterations $\chi$ do
                Solve for $\chi^{m+1}$ using the primal-dual algorithm (Proposition 3);
            end
        end
    end
    Compute $E_{d\alpha}(x)$, $\lambda(x)$;
    Compute $\chi_p(E_{d\alpha}(x), \lambda(x), \theta_E, \theta_\lambda)$ implies $\Omega_p$;
    If $s > 1$ then scale-up $u^s$, $\chi^s$ to $u^{s-1}$, $\chi^{s-1}$;
end
$u = u^1$ and $\chi = T(\chi^1)$

Algorithm 1: Algorithm for illumination changes and large displacement robust optical flow.

5 RESULTS

For all experiments the parameters are fixed to: $\theta = 0.40$, $\lambda = 0.60$, $\alpha = 0.0$, $\beta = 1.0$, $\theta_\lambda = 0.98$ and $\theta_E = 0.98$. The value of $\mu$ is decreased in each iteration, with initial value $\mu_0 = 300$ and scaled by powers of 0.6 in the following iterations. For real images we use blocks of 7 × 7 pixels and for synthetic images we use blocks of 31 × 31 pixels.


Figure 1: Middlebury videos containing large displacements, illumination changes and shadows that move in the scene. (a) frame9, (b) frame10, (c) frame11 of the BB sequence. (d) frame09, (e) frame10 and (f) frame11 of the DD sequence.

Figure 2: Images of the MPI database. (a) frame13, (b) frame14 and (c) frame15 of the cave 4 sequence. (d) frame19, (e) frame20 and (f) frame21 of the Alley 4 sequence.

Figure 3 presents the obtained results: (a) color coded estimated optical flow for BB, (b) occluded regions (observe how they are correctly estimated, in particular on the face of the man), (c) $\chi_p$ for BB, (d) the compensated image, (e) the color coding scheme, (f) color coded optical flow for DD, (g) the estimated occlusion (notice that the occlusion appears on the right side of the girl and on the left side of the dog), (h) $\chi_p$ for DD.

We have divided the MPI database into three subsets: large, medium and small displacements. The quantitative results are shown in Table 1. For large displacements we set the parameter $v_{\max} = 150$, for medium displacements $v_{\max} = 40$ and for small displacements $v_{\max} = 1$. For large displacements we set $\theta_E = 0.50$ and $\theta_\lambda = 0.50$; for medium and small displacements we set $\theta_E = 0.98$ and $\theta_\lambda = 0.98$. The average End Point Error for the whole database is also presented in Table 1.

Table 1: End Point Error (EPE) obtained by our model on the MPI subsets with large, medium and small displacements.

Subset              Large    Medium   Small
EPE                 18.82    1.41     0.80
Total Average EPE   7.17
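For reference, the End Point Error used in these tables is the Euclidean distance between the estimated and the ground-truth flow vectors, averaged over pixels. A minimal sketch follows; the function name `average_epe` and the optional `valid` mask are assumptions of this example.

```python
import numpy as np


def average_epe(flow_est, flow_gt, valid=None):
    """Mean Euclidean distance between two (h, w, 2) flow fields.

    `valid` is an optional boolean (h, w) mask selecting the pixels to average.
    """
    err = np.sqrt(np.sum((flow_est - flow_gt) ** 2, axis=-1))
    if valid is not None:
        err = err[valid]
    return float(np.mean(err))
```

Restricting the average with a mask is how per-range figures (for example, pixels whose ground-truth displacement lies in a given interval) can be computed from the same error map.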

Let us observe from Table 1 that although the average obtained on the Large Displacement videos is EPE = 18.82, the average EPE over all sequences drops to 7.17. If we only consider frames that contain displacements smaller than 150 pixels, the error on the large displacement subset drops to 8.82, as shown in Table 2. We also show in Table 2 the results obtained on this subset by DeepFlow (Weinzaepfel et al., 2013) and MDP-Flow2 (Xu et al., 2012).

Figure 3: Results obtained on the BB and DD sequences. (a) Color coded flow field obtained by our model. (b) Estimated occlusion mask. (c) $\chi_p$. (d) Compensated image using the occlusion mask, Compensated $= (1 - \chi)\,I_1(x + u(x)) + \chi\,I_{-1}(x - u(x))$. (e) Color code for the flow field. (f) Color coded flow field for the DogDance sequence. (g) Estimated occlusion mask. (h) $\chi_p$.

Table 2: End Point Error obtained on the large displacement subset of MPI, considering displacements < 150 pixels.

Method        Our model   DeepFlow   MDP-Flow2
Average EPE   8.82        10.61      9.12

In Figure 4 and Figure 5 we show qualitative results obtained on the MPI database.

In Figure 4 we have computed the optical flow between frame 27 and frame 28 of the sequence ambush 7, by considering three frames: frame 26, which is $I_{-1}$ in our energy model, frame 27, which is $I_0$, and frame 28, which is $I_1$. Original frames 26, 27 and 28 correspond to subfigures (a), (c) and (e), respectively. This sequence presents small displacements but there is a shadow that moves. The lance in the image presents a fine texture, which shows small variations (like noise) from frame to frame. In (d) we show the estimated optical flow. We observe that the optical flow was robustly estimated on the snow (where the shadow moves). In (f) we show the minimum eigenvalue of the structure tensor of frame 27. We observe that the structure lies on the small details of the lance. In (g) we show the adaptive balance term $\alpha(x)$. This shows that in the snow region the color constancy constraint does not hold and gradients should be used ($\alpha(x) = 0$). On the other hand, where $\alpha(x) = 1.0$ intensities should be used. In (h) we show the estimated occlusions, which appear on the texture of the lance due to this small variation from frame to frame.

Figure 4: Results obtained by our method on the ambush 7 video sequence. (a), (c) and (e): frames 26, 27 and 28, respectively. (b) Optical flow ground truth. (d) Estimated optical flow. (f) $\lambda(x)$. (g) $\alpha(x)$. (h) $\chi(x)$.

Figure 5: Images of the MPI video database. (a) Color coded optical flow, (b) $\chi_p$ and (c) estimated occlusions for the cave4 sequence. (d) Color coded optical flow, (e) estimated occlusion and (f) ground truth occlusion.

In Figure 5 our estimated optical flow is displayed in (a). In (b) we show $\chi_p$, indicating the positions where the exhaustive search is incorporated. (c) presents the estimated occlusion layer. (d) shows the color coded optical flow for the Alley1 sequence, (e) the estimated occlusion layer and (f) the ground truth occlusion layer. Comparing (e) and (f) we see that they are very similar.

Figure 6 shows a comparison on the MPI Sintel database. These results are available on the Sintel website (Butler et al., 2012). Our proposal is denoted as OF OCC LD. Notice that for s0-10 our model is ranked 20 (in brackets) of 110 reported methods on the MPI site. For s10-40 our method is ranked 41. Finally, for EPEall our method is ranked 97, outperforming TV-L1, which is ranked 102.

5.1 Critical Discussion

The MPI test set includes small, medium and large displacements (approx. 400 pixels). For small and medium displacements, our method is ranked 20 and 41, respectively. For large displacements, the position drops to 97, which may well be due to the fact that, for efficiency reasons, in our experiments the large displacement threshold $v_{\max}$ (which should be at least 400) was set to 150.

Figure 6: Comparative results obtained by our method on the MPI test set. EPEall is the end point error over the complete frames, s0-10 the error over regions with displacements lower than 10 pixels, and s10-40 the error over regions with displacements between 10 and 40 pixels.

6 CONCLUSIONS

We proposed a variational model to jointly estimate the optical flow and the occlusion layer, incorporating the occlusion information in its energy based on the divergence of the flow. From three consecutive frames, the optical flow is estimated forward on visible pixels and backward on occluded pixels. The proposed model handles illumination changes using a balance term between gradients and intensities, improving the performance of the optical flow estimation in scenarios with illumination changes. Thanks to the use of supplementary matches the model is able to capture large displacements, even of small objects. As future work we plan to accelerate the exhaustive matching computation in order to avoid the $v_{\max}$ restriction and handle arbitrarily large displacements. Future work also includes a parallel and GPU implementation.

REFERENCES

Alvarez, L., Deriche, R., Papadopoulo, T., and Sanchez, J. (2007). Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision, 75(3):371–385.

Ballester, C., Garrido, L., Lazcano, V., and Caselles, V. (2012). A TV-L1 optical flow method with occlusion detection. In DAGM/OAGM, LNCS 7476. Springer Verlag.

Black, M. J. and Ananda, P. (1996). The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1):75–104.

Brox, T., Bregler, C., and Malik, J. (2009). Large displacement optical flow. In Proceedings of the IEEE Computer Vision and Pattern Recognition, pages 500–513.

Brox, T., Bruhn, A., Papenberg, N., and Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In European Conference on Computer Vision (ECCV), volume 3024, pages 25–36. Lecture Notes in Computer Science.

Butler, D. J., Wulff, J., Stanley, G. B., and Black, M. J. (2012). A naturalistic open source movie for optical flow evaluation. In A. Fitzgibbon et al. (Eds.), editor, European Conf. on Computer Vision (ECCV), Part IV, LNCS 7577, pages 611–625. Springer-Verlag.

Chen, Z., Jin, H., Lin, Z., Cohen, S., and Wu, Y. (2013). Large displacement optical flow from nearest neighbor fields. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2443–2450.

Fortun, D., Bouthemy, P., and Kervrann, C. (2016). Aggregation of local parametric candidates with exemplar-based occlusion handling for optical flow. Computer Vision and Image Understanding, pages 81–94.

Horn, B. K. and Schunck, B. H. (1981). Determining optical flow. Artificial Intelligence, 17:185–203.

Ince, S. and Konrad, J. (2008). Occlusion-aware optical flow estimation. IEEE Transactions on Image Processing, 17(8):1443–1451.

Kennedy, R. and Taylor, C. J. (2015). Optical flow with geometric occlusion estimation and fusion of multiple frames. In EMMCVPR 2015, Hong Kong, China, January 13-16, 2015. Proceedings, pages 364–377. Springer International Publishing.

Krähenbühl, P. and Koltun, V. (2012). Efficient nonlocal regularization for optical flow. In European Conference on Computer Vision (ECCV), pages 356–369. Springer.

Nagel, H.-H. and Ekelman, W. (1986). An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. Pattern Analysis and Machine Intelligence, 6(5):565–593.

Palomares, R. P., Haro, G., and Ballester, C. (2015). A rotation-invariant regularization term for optical flow related problems. Lecture Notes in Computer Science, 9007:304–319.

Palomares, R. P., Meinhardt-Llopis, E., Ballester, C., and Haro, G. (2017). Faldoi: A new minimization strategy for large displacement variational optical flow. Journal of Mathematical Imaging and Vision, 58(1):27–46.

Ranftl, R., Bredies, K., and Pock, T. (2014). Non-local total generalized variation for optical flow estimation. In Computer Vision–ECCV, pages 439–454. Springer.

Sánchez, J., Salgado, A., and Monzón, N. (2014). Preserving accurate motion contours with reliable parameter selection. In IEEE International Conference on Image Processing (ICIP), pages 209–213.

Scharstein, S. and Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47:7–42.

Steinbruecker, F. and Pock, T. (2009). Large displacement optical flow computation without warping. In International Conference on Computer Vision, pages 1609–1614.

Stoll, M., Volz, S., and Bruhn, A. (2012). Adaptive integration of features matches into variational optical flow methods. In Proc. of the Asian Conference in Computer Vision (ACCV), pages 1–14.

Strekalovskiy, E., Chambolle, A., and Cremers, D. (2014). Convex relaxation of vectorial problems with coupled regularization. SIAM J. Imaging Sciences, 7(1):294–336.

Sun, D., Roth, S., and Black, M. J. (2010). Secrets of optical flow estimation and their principles. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2432–2439.

Sun, D., Roth, S., and Black, M. J. (2014). A quantitative analysis of current practices in optical flow estimation and the principles behind them. International Journal of Computer Vision, 106(2):115–137.

Timofte, R. and Van Gool, L. (2015). Sparse flow: Sparse matching for small to large displacement optical flow. IEEE WACV, 00:1100–1106.

Weinzaepfel, P., Revaud, J., Harchaoui, Z., and Schmid, C. (2013). Deepflow: Large displacement optical flow with deep matching. In IEEE International Conference on Computer Vision, Sydney, Australia, pages 1385–1392.

Werlberger, M., Pock, T., and Bischof, H. (2010). Motion estimation with non-local total variation regularization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2464–2471.

Werlberger, M., Trobin, W., Pock, T., Wedel, A., Cremers, D., and Bischof, H. (2009). Anisotropic Huber-L1 optical flow. In Proceedings of the BMVC.

Xu, L., Jia, J., and Matsushita, Y. (2012). Motion detail preserving optical flow estimation. In Pattern Analysis and Machine Intelligence, IEEE Transactions on, volume 34, pages 1744–1757.

Zach, C., Pock, T., and Bischof, H. (2007). A duality based approach for realtime TV-L1 optical flow. In Proceedings of the 29th DAGM Conference on Pattern Recognition, pages 214–223. Berlin, Heidelberg, Springer-Verlag.

Zimmer, H., Bruhn, A., and Weickert, J. (2011). Optic flow in harmony. International Journal of Computer Vision, 93(3):368–388.