Consistent Optical Flow Maps for
Full and Micro Facial Expression Recognition
Benjamin Allaert, Ioan Marius Bilasco and Chabane Djeraba
Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL -
Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France
benjamin.allaert@ed.univ-lille1.fr, {marius.bilasco, chabane.djeraba}@univ-lille1.fr
Keywords:
Facial expression, Micro-expression, Optical Flow.
Abstract:
A wide variety of face models have been used in the recognition of full or micro facial expressions in image sequences. However, existing methods only address one family of expressions at a time, as micro-expressions are quite different from full expressions in terms of facial movement amplitude and/or texture changes. In this paper we address the detection of micro and full expressions with a common facial model that characterizes facial movements by means of consistent Optical Flow estimation. Optical Flow extracted from the face is generally noisy and, without specific processing, it can hardly cope with expression recognition requirements, especially for micro-expressions. Direction and magnitude statistical profiles are jointly analyzed in order to filter out noise, obtain consistent Optical Flows and feed them into a face motion model framework. Experiments on the CK+ and CASME2 facial expression databases for full and micro expression recognition show the benefits brought by the proposed approach in the field of facial expression recognition.
1 INTRODUCTION
Automatic facial expression analysis has attracted great interest over the past decade in various domains. Facial expression recognition has been widely studied in computer vision. Recent methodologies for static expression recognition have been proposed and obtain good results for acted expressions. However, in order to cope with natural contexts, challenges like face occlusions, non-frontal poses, and variations in expression intensity and amplitude must be addressed. Challenges like illumination variation, face occlusions and non-frontal poses have been addressed in fields other than expression recognition. Several research results were also published on this topic, primarily based on face alignment. Although the methodology is more mature, it is far from being fully robust. This topic still attracts much research and discussion.
In the following we focus on the challenges brought by supporting a wide range of facial movement amplitudes when a full or micro expression is produced. In the case of a full expression, the underlying facial movement and the induced texture deformation can be clearly differentiated from the noise that can appear when analyzing the face properties. However, as the amplitudes are much smaller in micro-expressions, attention must be paid to the encoding of small changes.
Automatic micro-expression recognition algorithms have recently received growing attention in the literature (Yan et al., 2014; Liu et al., 2015; Wang et al., 2014a; Wang et al., 2014b). Micro-expressions are quite different from full expressions. They are characterized by rapid facial movements having low intensity, and typically involve only a fragment of the facial region. Therefore, previous works that were suitable for full-expression recognition may not work well for micro-expressions. In other words, it seems difficult to find a common methodology for analyzing full and micro expressions in an accurate manner.
Dynamic texture is an extension of texture characterization to the temporal domain. Description and recognition of dynamic textures in facial expression recognition have attracted growing attention because of their unknown spatial and temporal extent. Impressive results have recently been achieved in dynamic texture synthesis using a framework based on system identification theory, which estimates the parameters of a stable dynamic model (Wang et al., 2014a; Wang et al., 2014b). However, the recognition of dynamic textures is a challenging problem compared with the static case (Péteri and Chetverikov, 2005). Indeed, for real videos the stationary dynamic textures must be well-segmented in space and time and it
is difficult to define a metric in the space of dynamic
models.
In facial expression recognition, Optical Flow methods are popular as Optical Flow estimation is a natural way to characterize the local dynamics of a temporal texture (Fortun et al., 2015). The use of Optical Flow reduces dynamic texture analysis to the analysis of a sequence of instantaneous motion patterns viewed as static textures. Optical Flow has recently been used to analyze full expressions (Liao et al., 2013; Su et al., 2007; Lee and Chellappa, 2014) and micro-expressions (Liu et al., 2015). Good performances were obtained in both cases. However, the usage of Optical Flow is still questioned because its accuracy drops in the presence of motion discontinuities, large displacements or illumination changes. Recent Optical Flow algorithms (Revaud et al., 2015; Chen and Koltun, 2016; Bailer et al., 2015) evolved to better deal with noise and motion discontinuities, employing complex filtering that requires high computation time. Still, these algorithms were designed for generic Optical Flow computations and are not adapted to facial morphology and physical constraints.
In this paper, we investigate the effectiveness of using a face-dedicated filtered dense Optical Flow in order to recognize full expressions (anger, fear, disgust, happiness, sadness, and surprise) and micro-expressions (positive, negative, surprise) in near-frontal-view recordings. In section 2, we discuss existing work related to static approaches for full and micro expression recognition. In section 3, we present our approach for extracting the coherent movement at different locations on the face from a dense Optical Flow method. We filter the noise considering the facial movement hypothesis (local coherency and propagation). Next, we explore the characterization of the coherent Optical Flow within a facial model formulation in section 4. Experimental results are discussed in section 5. Finally, the conclusion and future perspectives are given in section 6.
2 RELATED WORK
Most 2D-feature-based methods are suitable for the analysis of near-frontal facial expressions in the presence of limited head motions and intense expressions. In order to provide the reader with an overview of approaches challenging these limitations, we present recent facial alignment methods and how facial expression recognition approaches are adapted to different expression intensities.
The face is usually detected and aligned in order to reduce variations in scale, rotation, and position. Alignment based on the eyes is the most popular strategy since the eyes are the most reliable facial components to detect and suffer little change in the presence of expressions. Assuming the face region is well aligned, histogram-like features are often computed from equal-sized facial grids. However, apparent misalignment can be observed, primarily caused by variations in face pose and facial deformation, as well as the diversity in human face geometry. Recent studies use facial landmarks to define facial regions that increase robustness to facial deformation during expressions. Jiang et al. (Jiang et al., 2014) define a mesh over the whole face with an Active Shape Model (ASM), and extract features from each of the regions enclosed by the mesh. Han et al. (Han et al., 2014) use an Active Appearance Model (AAM) to transform a facial grid and improve feature extraction for recognizing facial Action Units (AUs).
Thanks to recent databases (Yan et al., 2014; Li et al., 2013), the demand for computer vision techniques that improve the performance of micro-expression recognition is increasing. Recent works usually use spatiotemporal local binary patterns (LBP) for micro-expression analysis (Wang et al., 2014b; Wang et al., 2014a; Yan et al., 2014). Huang et al. (Huang et al., 2016b) proposed spatiotemporal completed local quantized patterns (STCLQP) and obtained promising performances with regard to similar state-of-the-art methods. The reason may be that STCLQP provides more useful information for micro-expression recognition, as it jointly extracts information characterizing magnitudes and orientations. Recently, Liu et al. (Liu et al., 2015) built a feature for micro-expression recognition based on a robust Optical Flow method, extracting a Main Directional Mean Optical-flow (MDMO). They showed that the magnitude is more discriminant than the direction when working with micro-expressions, and they achieve better performance than spatiotemporal LBP approaches.
Some approaches employ dense Optical Flow for full expression recognition and perform well on several databases. Su et al. (Su et al., 2007) propose to uniformly distribute 84 feature points over three automatically located rectangles instead of extracting precise facial features (eyebrows, eyes, mouth). They select the facial regions which contribute most towards the discrimination of expressions. Lee et al. (Lee and Chellappa, 2014) design sparse localized facial motion dictionaries from dense motion flow data of facial expression image sequences. The proposed localized dictionaries are effective for local facial motion description as well as global facial motion analysis. Liao et al. (Liao et al., 2013) improve existing
feature extraction results by learning expression-specific spatial weighting masks. The learned spatially weighted masks correspond to the human attention used to discriminate between expressive faces, and determine the importance of facial regions. The weighted masks can significantly increase the performance of facial expression recognition and intensity estimation on several databases.
Inspired by the success of simple dense Optical Flow approaches, we explore magnitude and direction constraints in order to extract the relevant movement on the face. Considering the motion smoothing of recent Optical Flow approaches, a simple Optical Flow combined with a magnitude constraint seems adequate for reducing the noise induced by lighting changes and small head motions. In the next section, we propose an Optical Flow filtering method based on consistent local motion propagation that keeps only the pertinent motion during a facial expression.
3 FACIAL RELATED FILTERING
The facial characteristics (skin smoothness, skin reflectance and elasticity) involve dealing with the inconsistency and the noise induced by motion discontinuities, as well as illumination changes, when extracting the Optical Flow directly on the face.
Instead of explicitly computing the global Optical Flow field, the Optical Flow constraint equation is used in specific facial areas defined in relation with the facial action coding system in order to keep only the pertinent motion of the face. The pertinent motion is defined as the Optical Flow extracted from regions where the intensity of moving pixels reflects natural facial movement characteristics. We consider a natural facial movement to be uniform during motion if it is characterized by continuity over neighboring pixels, as well as by a continuous diminution of its intensity over neighboring regions. The filtering operation of the Optical Flow is divided into several stages, illustrated in Figure 1. The Farnebäck algorithm (Farnebäck, 2003) is used to compute a fast dense Optical Flow (A). It is not the most accurate algorithm but it ensures that motion is not disaggregated by smoothing and its computation time is low. Based on the Farnebäck flow field, we determine the consistent facial motion from the facial regions having a high probability of movement (RHPM) (B). Each RHPM analyzes its neighbors' behavior in order to estimate the propagation of the motion over the whole face (C). The filtered Optical Flow field is computed from the coherent motion in each RHPM (D).
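For stage (A), a minimal sketch of how such a dense flow field can be obtained with OpenCV's implementation of the Farnebäck algorithm is given below; the parameter values are illustrative assumptions, not the settings tuned in this work.

```python
import cv2

def dense_flow_polar(prev_gray, next_gray):
    """prev_gray, next_gray: two grayscale face frames.
    Returns per-pixel magnitude and direction (degrees) of the Farneback flow."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # flow[..., 0] and flow[..., 1] are the horizontal and vertical displacements
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)
    return mag, ang
```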
Figure 1: All stages of the proposed method.
Next, we present in detail our approach for extracting the coherent movement at different locations on the face from the dense Optical Flow, by filtering the noise on the basis of the facial movement hypothesis assuming local coherency and propagation.
3.1 RHPM Local Coherency
In order to cope with the noise and to filter the Optical Flow information, we start by analyzing the direction distribution within each local region so as to keep only the reliable flow. The proposed method is illustrated in Figure 2.
Figure 2: The process of consistent local motion characterization in an RHPM.
Each region with a high probability of movement contains local Optical Flow information for each pixel: a direction and a magnitude. Each RHPM is defined by a center ε(x, y) called the epicenter and a local propagation value λ which defines the size of the area under investigation around the epicenter.
In order to measure the consistency of the Optical Flow in terms of directions, we analyze the direction distribution within the RHPM for several layers of magnitude (Figure 2-A). We assume that the motion on the face spreads progressively due to skin elasticity. We therefore construct 5 normalized histograms M = (M_1, M_2, ..., M_5) that represent the direction distribution over 36 bins (of 10° each) for different magnitude ranges. The magnitude ranges vary according to the characteristics of the data to be processed. We have kept only 5 magnitude layers, since they are sufficient to reflect the consistency of movement in facial motion.
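As a hedged illustration of Figure 2-A, the following sketch builds, for the pixels of one RHPM, the five normalized 36-bin direction histograms M_1, ..., M_5, one per magnitude layer; the default interval bounds are placeholders, since the paper adapts the magnitude ranges to each database.

```python
import numpy as np

def layered_direction_histograms(mag, ang,
                                 layers=((1, 10), (2, 10), (3, 10), (4, 10), (5, 10))):
    """mag, ang: arrays of flow magnitudes and directions (degrees) inside one RHPM."""
    hists = []
    for lo, hi in layers:
        mask = (mag >= lo) & (mag <= hi)            # keep pixels of this magnitude layer
        hist, _ = np.histogram(ang[mask], bins=36, range=(0, 360))
        total = hist.sum()
        hists.append(hist / total if total > 0 else hist.astype(float))
    return np.stack(hists)                           # shape (5, 36): M_1 .. M_5
```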
Afterwards, the intersection of directions for each pair of consecutive magnitudes is computed to estimate the motion overlap between two consecutive magnitudes (Figure 2-B). We build a feature ρ, which represents the intersection between two magnitude histograms by
$$\rho_i^k = \begin{cases} 1, & \text{if } M_i^k > 0 \text{ and } M_{i+1}^k > 0,\\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$
where i = 1, 2, ..., 4 indexes the pairs of consecutive magnitudes (M_i, M_{i+1}) and k is the bin index. The vector ρ is composed only of 0 (no relevant match is found for bin k) and 1 (the histograms have a common occurrence in bin k). To cope with discretization problems, where close angles can be spread over different bins, we extend the direction distribution limits by one bin. If no direction is found for any feature vector ρ, the RHPM is considered as being locally incoherent. After extracting the occurrence feature vector ρ for each pair of magnitudes, the union of all ρ vectors provides the main directions.
The number of occurrences for each direction within the Ψ range goes from 0 (low intensity) to 4 (high intensity) and characterizes the importance of each direction (Figure 2-C). If no common direction is found among the four feature vectors ρ, the RHPM is considered as being locally incoherent.
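One possible reading of equation (1) and of Figure 2-B/C is sketched below: ρ marks the bins occupied by two consecutive magnitude layers (with a one-bin tolerance for discretization), and Ψ counts, for each bin, in how many of the four layer pairs a match occurs. The exact coherency criterion used at the end is our assumption.

```python
import numpy as np

def local_coherency(hists):
    """hists: (5, 36) array of the normalized direction histograms M_1 .. M_5."""
    occupied = hists > 0
    # tolerate discretization: a bin also matches if a circular neighbor is occupied
    widened = occupied | np.roll(occupied, 1, axis=1) | np.roll(occupied, -1, axis=1)
    rho = (occupied[:-1] & widened[1:]).astype(int)  # eq. (1), shape (4, 36)
    psi = rho.sum(axis=0)                            # per-bin occurrences, in 0..4
    coherent = bool(psi.max() > 0)                   # assumption: one common direction suffices
    return psi, coherent
```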
Even when an RHPM is considered coherent, the filtering of local motion is not yet complete. Indeed, if we consider a natural facial movement to be uniform during motion, then the local facial motion should spread to neighboring regions. The analysis of the movement propagation in the RHPM neighborhood is explained below.
3.2 RHPM Neighborhood Propagation
Facial muscle action ensures that a local motion spreads to neighboring regions until motion exhaustion. Motion is subject to changes that could affect direction and magnitude in any location. However, the intensity of a moving facial region tends to remain constant during a facial expression. Therefore, a pertinent motion computed in an RHPM appears, possibly with a lower or higher intensity, in at least one neighboring region.
Facial motion analysis consists in estimating the motion propagation in the direct neighborhood of a specific RHPM. We propose a method to find the local facial motions that best discriminate expressions and correspond to the regional importance of expressive faces. The propagation analysis is illustrated in Figure 3. Next, we explain the process steps: how to locate the RHPM neighboring regions (Figure 3-1); how to calculate the consistency between two regions (Figure 3-2); and how to estimate the global consistent motion around the RHPM (Figure 3-3).
When an RHPM is locally coherent, we must verify that the motion has expanded into a neighboring RHPM. The propagation motion analysis is illustrated in Figure 3-1. The neighboring RHPM regions (represented by the other squares) are regions with a high probability of propagation (RHPP). A consistent motion is expected to be measured between a region and its neighborhood. Eight RHPP are generated around the RHPM. All these regions are at a fixed distance from the RHPM epicenter. The bigger the distance between two epicenters, the less coherence the overlapping area may exhibit. λ is the size of the area under investigation around the epicenter. Finally, β characterizes the number of direct propagations from the epicenter that are carried out by the propagation analysis.
Each RHPP is analyzed, in the same way as the initial RHPM, in order to evaluate its local coherency, as illustrated in Figure 3-2. As an outcome of the process, each locally consistent RHPP is characterized by a directional vector Ψ containing 36 bins (10° wide) of different magnitudes. Here the magnitudes correspond to the number of occurrences of a given orientation at the different movement intensity scales (M_1 to M_5). The RHPM is considered to be consistent with one of its RHPP if a confidence rating ω exceeds a fixed percentage threshold θ. ω is computed as follows:
$$\omega = \frac{1}{36}\sum_{k=1}^{36} \min\left(\Psi_k^0, \Psi_k^1\right), \qquad (2)$$
where ω corresponds to the intersection between the directional vectors Ψ^0 and Ψ^1 of two neighboring regions and k = 1, 2, ..., 36 is the bin index. Next, recursively, for each inter-coherent RHPP we conduct the same inter-region coherency measurements, as long as at least one newly created RHPP is inter-region coherent with the previous one. The recursive process ends when the value β is reached.
Figure 3: Estimation of the motion propagation in the direct neighborhood of a specific RHPM (the central square).
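The inter-region test of equation (2) can be sketched as a normalized histogram intersection between the directional vectors of two neighboring regions, compared to the threshold θ; the normalization follows our reconstruction of (2) and the threshold value is an assumption.

```python
import numpy as np

def regions_consistent(psi_a, psi_b, theta=0.5):
    """psi_a, psi_b: 36-bin directional vectors (per-bin values in 0..4)."""
    omega = np.minimum(psi_a, psi_b).sum() / len(psi_a)   # eq. (2)
    return omega >= theta, omega
```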
The motion propagation after one iteration is shown in Figure 3-3. RHPP are represented with bold borders if their motion is coherent with the RHPM; otherwise, they are represented with light borders. When the motion between two neighboring regions is considered coherent, a binary coherency map is updated in order to keep track of the evaluation procedure and avoid cycles. However, local regions that are marked as non inter-region coherent may be re-evaluated as coherent with another RHPP in a subsequent propagation. This is especially true in the presence of skin wrinkles or furrows, because motion discontinuities appear.
Finally, the distribution vectors (Ψ) corresponding to the RHPPs that have direct or indirect connections to the original RHPM (i.e. the motion is consistent between at least two neighboring regions) characterize the global region motion. If the motion propagation between all neighbors is inconsistent, the propagation is no longer explored, which means that there is no more pertinent motion in the region. The global region motion is extracted by applying the following formula
$$\eta = \sum_{i=1}^{n} \Psi_i, \qquad (3)$$
where n is the number of consistent regions (the RHPM and all consistent RHPPs). η is a histogram over 36 bins which contains, for each bin, the sum of the intensities of the coherent RHPPs. The maximum value for each bin corresponds to the number of consistent regions n multiplied by the highest intensity of motion in Ψ, that is 4. Therefore, at this stage of the process, we are able to compute the coherent propagated motion, defined by an oriented histogram η, from a specific location.
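Equation (3) then reduces to a bin-wise sum over the directional vectors of the RHPM and of all RHPPs found consistent with it, as in this small sketch:

```python
import numpy as np

def global_region_motion(psi_vectors):
    """psi_vectors: list of 36-bin directional vectors of the consistent regions."""
    return np.sum(psi_vectors, axis=0)   # eta: 36-bin histogram, each bin at most 4 * n
```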
In the following, we study the impact of the RHPM location on the face. More specifically, we show that the intensity of the expression (full or micro) plays a key role in the positioning of the RHPM and, at the same time, impacts the way the consistent motion is extracted on the face.
4 EXPRESSION RECOGNITION
We use facial landmarks to define facial regions that increase robustness to facial deformation during expressions. Similarly to Jiang et al. (Jiang et al., 2014), the facial landmarks are used to define a mesh over the whole face, and a feature vector can be extracted from each of the regions enclosed by the mesh. To extract these facial meshes from face images, the facial landmarks are located with the method proposed by Kazemi et al. (Kazemi and Sullivan, 2014). Next, the landmark positions and the geometrical statistics of the face are used to compute a new set of points that makes it possible to define a mesh over the whole face (forehead, cheeks). Finally, the most discriminant landmark points corresponding to the active face regions are selected from the original landmarks, and specific points are computed in order to set out the mesh boundaries. The partitioning of the facial regions of interest (ROIs) is illustrated in Figure 4. The partitioning of these ROIs is based on the facial motion observed in the consistency maps previously extracted from both full and micro-expressions. The locations of these ROIs are uniquely determined by the landmark points. For example, the position of the feature point f_Q is the average of the positions of the two feature points f_{10} and f_{55}. The distance between the eyebrows and the forehead feature points (f_A, f_B, ..., f_F) corresponds to a quarter of the nose length, Distance(f_{27}, f_{33})/4, which makes it possible to maintain the same proportion for optimal adaptation to the size of the face.
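As an illustration of how the extra mesh points of Figure 4 can be derived from 68-point landmarks (e.g. those produced by the Kazemi and Sullivan detector available in dlib), the sketch below applies the two rules quoted in the text; the landmark indexing convention (0- or 1-based) is an assumption.

```python
import numpy as np

def extra_mesh_points(pts):
    """pts: (68, 2) array of facial landmark coordinates (indexing convention assumed)."""
    f_q = (pts[10] + pts[55]) / 2.0                 # f_Q: average of f_10 and f_55
    nose_len = np.linalg.norm(pts[27] - pts[33])    # Distance(f_27, f_33)
    forehead_offset = nose_len / 4.0                # eyebrow-to-forehead distance
    return f_q, forehead_offset
```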
Figure 4: The partitioning of facial regions of interest.
The facial motion mask is computed from these 25 ROIs. The method used to build the feature vector from the facial motion mask is illustrated in Figure 5. In each frame f_i, we consider the filtered Optical Flow inside each ROI R^k_i, where i is the frame index and k = 1, 2, ..., 25 is the ROI index. Inside each R^k_i, a histogram η is computed as defined in equation 3 from the filtered Optical Flow, considering the ROI as the initial RHPM. Over time, for each ROI, the histograms are summed as defined in equation 4, which corresponds to the local facial motion over the entire sequence:
$$\zeta(R^k) = \sum_{i=1}^{n} \eta_i^k(R_i^k). \qquad (4)$$
Finally, all histograms ζ are concatenated into a one-row vector, which is considered as the feature vector for the full and micro expression: ζ = (ζ(R^1), ζ(R^2), ..., ζ(R^{25})). An example is illustrated in Figure 5, where all histograms corresponding to R^1_i and R^{22}_i with i ∈ [1, n] are summed as defined in equation 4 into ζ(R^1) and ζ(R^{22}) respectively, and then added to ζ.
Figure 5: Method for building the feature vector from the
facial motion mask.
The feature vector size is equal to the number of ROIs multiplied by the number of bins, making a total of 25 × 36 = 900 feature values.
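A minimal sketch of equation (4) and of the final descriptor, assuming one η histogram per ROI and per frame has already been computed:

```python
import numpy as np

def sequence_descriptor(eta_per_frame):
    """eta_per_frame: array of shape (n_frames, 25, 36), one eta per ROI and per frame."""
    zeta = eta_per_frame.sum(axis=0)   # eq. (4): one 36-bin histogram per ROI
    return zeta.reshape(-1)            # concatenation: 25 x 36 = 900 feature values
```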
5 EVALUATION
In this section, we evaluate the performance of our proposed method on two datasets: the extended Cohn-Kanade database (CK+) (Lucey et al., 2010) and CASME2 (Yan et al., 2014). We discuss the choice of optimal parameters for the databases and show that only the magnitude intervals must be adapted to accommodate the specific intensities of facial expressions. Finally, we compare our performance against major state-of-the-art approaches.
5.1 Full-expression
CK+ contains 410 facial expression sequences from 100 participants of different ethnicities and genders. In these image sequences, the expression starts from a neutral status and ends at the apex status. The numbers of samples for the expressions anger, sadness, happiness, surprise, fear and disgust are 42, 82, 100, 80, 64 and 45, respectively.
In the experiments, we use LIBSVM (Chang and Lin, 2011) with the Radial Basis Function kernel and a 10-fold cross-validation protocol. This protocol is used by several approaches working on CK+ as it fits better the size and the structure of the dataset. Each expression is classified into one of the six classes: anger, fear, disgust, happiness, sadness, and surprise.
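For illustration, the classification protocol can be sketched as follows; the paper uses LIBSVM directly, whereas this sketch substitutes scikit-learn's SVC (a libsvm wrapper), and the C and gamma values are assumptions rather than the tuned ones.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_ck(features, labels):
    """features: (n_sequences, 900) descriptors; labels: one of the six expression classes."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    scores = cross_val_score(clf, features, labels, cv=10)   # 10-fold cross-validation
    return scores.mean()
```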
The following experimental results are obtained using λ = 15, β = 3 and a distance of 10 between epicenters. Initially, we considered the following magnitude intervals in every region: M_1(x)|x ∈ [1,10], M_2(x)|x ∈ [2,10], M_3(x)|x ∈ [3,10], M_4(x)|x ∈ [4,10], M_5(x)|x ∈ [5,10]. Each interval stops at a maximum of 10, where 10 corresponds to the mean of the maximum coherent magnitude estimated over all sequences. The overlap of the intervals helps to ensure consistency in each histogram. Small movements around the mouth corners and between the eyes were not always detected, so we included an additional magnitude interval M_0 and deleted M_5 in order to retain only 5 magnitude intervals for the corresponding regions R_3, R_{19} and R_{22}.
Table 1 compares the performance of the proposed method with recent state-of-the-art Optical Flow methods on CK+. The performance of our system is comparable with the other systems, as it achieves an average recognition rate of 93.17% with alignment based on the eyes and coherent Optical Flow. Nevertheless, the highest recognition rate is obtained using features from the filtered coherent facial motion combined with geometric features.
Our method reports recognition performance comparable to that of the most competitive Optical Flow approaches.
Table 1: Performance comparison of different state-of-the-art Optical Flow approaches on the CK+ database. Rows in bold correspond to our proposed methods.

Method | Measure | Seq. | Exp. | Acc(%)
(Liao et al., 2013) | LOSO | 442 | 6 | 92.5%
(Su et al., 2007) | train/test | 415 | 5 | 93.27%
(Lee and Chellappa, 2014) | 4-fold | ndef | 7 | 86.7%
Coherent Flow + RHPM | 10-fold | 410 | 6 | 93.17%
Coherent Flow + RHPM + Geom. | 10-fold | 410 | 6 | 95.34%
Although we report the best accuracy results for 6-class expressions, Su et al. (Su et al., 2007) achieve higher scores, but they only use 5 classes (Neutral, Happy, Surprise, Sad, Anger). Considering the variations observed in the number of sequences and expression types recognized by the various methods, it is difficult to clearly identify the best one. For our experiments we used the original CK+ collection as introduced in (Lucey et al., 2010) and we brought no modification to either the videos or the annotations.
We have shown that our approach obtains good performances on CK+, where the expressions are acted and there are few or no disruptions (head motion, lighting variation). Next, we present the performance of our method for recognizing micro-expressions.
5.2 Micro-expression
The CASME2 database contains 246 spontaneous
micro-expressions from 26 subjects, categorized into
five classes: happiness (32 samples), disgust (63 sam-
ples), surprise (25 samples), repression (27 samples)
and Others (99 samples).
In the experiments, we use the leave-one-subject-out (LOSO) cross-validation protocol because it is stricter than leave-one-out (LOO) cross-validation and matches better the structure of the data (different numbers of videos per subject). In this protocol, the samples from one subject are used for testing, and the rest for training. We use LIBSVM (Chang and Lin, 2011) with the Radial Basis Function kernel and the grid search method, where the optimal parameters are selected using ten-fold cross-validation.
Table 2 shows a comparison with other approaches for micro-expression recognition using the recognition rates given in each paper. It should be noted that the results are not directly comparable due to different experimental setups (number of expression classes and number of sequences), but they still give an indication of the discriminating power of each approach. Our method outperforms the other methods in almost all cases. The best results are obtained using the same mask and parameters as for full-expression recognition (λ = 15, β = 3, inter-epicenter distance of 10), except for the division of magnitudes, defined here as follows: M_1(x)|x ∈ [0.1,5], M_2(x)|x ∈ [0.2,5], M_3(x)|x ∈ [0.3,5], M_4(x)|x ∈ [0.4,5], M_5(x)|x ∈ [0.5,5]. The geometric information was not considered here, as the landmark locations are mostly stable throughout the sequences. It should be noted that the Optical Flow is not calculated from two consecutive frames but over intervals of two frames. Indeed, the time lapse between two frames in CASME2 is very small (recorded with a high-speed camera at 200 fps) and, combined with the low expression intensity, it is difficult to distinguish the noise from the true facial motion. With our method, no magnitude consistency can be found in local regions when consecutive frames are processed. Hence, we consider the entire sequence, which is frequent in the literature as other authors summarize videos in fewer frames (Wang et al., 2014b; Huang et al., 2016a; Huang et al., 2016b).
Table 2: Performance comparison with state-of-the-art methods on the CASME2 database. Results in bold correspond to our method.

Method | Measure | Class | Acc(%)
Baseline (Yan et al., 2014) | LOO | 5 | 63.41%
LBP-SIP (Wang et al., 2014b) | LOO | 5 | 67.21%
LSDF (Wang et al., 2014a) | LOO | 5 | 65.44%
MDMO (Liu et al., 2015) | LOSO | 4 | 67.37%
STLBP-IIP (Huang et al., 2016a) | LOSO | 5 | 62.75%
DiSTLBP-IPP (Huang et al., 2016a) | LOSO | 5 | 64.78%
Coherent Flow + RHPM | LOSO | 5 | 65.35%
If the recognition process is re-evaluated on a four-class basis (Happy, Disgust, Surprise, Repression), the performance is improved by 11.57%, which corresponds to an accuracy of 76.92%. This indicates that the Others class does not stand out clearly from the other classes. In (Liu et al., 2015), the repression and the other sequences are combined into a single class, which reduces the chances of falsely classifying Happiness as Repression. This reorganization reported a gain of 1.02% with our method. Moreover, (Liu et al., 2015) report removing 11 samples from the recognition process due to mis-estimates of the facial features in the first frame of the video.
The results obtained on the original CASME2 and the reorganized variants show good performance for micro-expression recognition. Our method outperforms the other state-of-the-art methods in almost all cases. These results were obtained by employing the same method used for recognizing full expressions, except for the smaller magnitude intervals that were considered in order to better fit the low magnitudes of micro-expressions.
6 CONCLUSIONS
In this paper, we have shown that the coherent movement extracted from a dense Optical Flow method by considering the facial movement hypothesis achieves state-of-the-art performance on both a facial full-expression and a micro-expression database. The magnitude and direction constraints are estimated in order to reduce the noise induced by lighting changes and small head motions over time. The proposed approach adapts well to both full expressions (CK+) and micro-expressions (CASME2). The only adjustment, concerning the magnitude intervals, is actually related to the nature of the expression. The other parameters, common to both experiments, have been selected empirically and deserve specific attention in future experiments.
Our current approach is used only on near-frontal-view recordings where occlusions, fast head motions and lighting variations are practically absent. The next step consists in adapting our method to the domain of spontaneous facial expression recognition. To address this situation, a normalization method will necessarily be used. However, it must be kept in mind that any change made to the facial picture has important side-effects on the Optical Flow. Despite the wealth of research already conducted, no method is capable of dealing with all issues at once. We believe that normalization approaches based on facial components or shape are not adapted to Optical Flow, as facial deformation will impact the Optical Flow computation by inducing motion distortions. So, rather than considering normalization in the domain of facial components, efforts should instead be focused on the Optical Flow domain.
ACKNOWLEDGEMENTS
This research has been partially supported by the FUI
project MAGNUM 2.
REFERENCES
Bailer, C., Taetz, B., and Stricker, D. (2015). Flow fields:
Dense correspondence fields for highly accurate large
displacement optical flow estimation. In ICCV.
Chang, C.-C. and Lin, C.-J. (2011). Libsvm: a library for
support vector machines. ACM TIST.
Chen, Q. and Koltun, V. (2016). Full flow: Optical flow
estimation by global optimization over regular grids.
CVPR.
Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. In SCIA. Springer.
Fortun, D., Bouthemy, P., and Kervrann, C. (2015). Optical
flow modeling and computation: a survey. Computer
Vision and Image Understanding.
Han, S., Meng, Z., Liu, P., and Tong, Y. (2014). Facial grid
transformation: A novel face registration approach for
improving facial action unit recognition. In ICIP.
Huang, X., Wang, S., Liu, X., Zhao, G., Feng, X.,
and Pietikainen, M. (2016a). Spontaneous fa-
cial micro-expression recognition using discrimina-
tive spatiotemporal local binary pattern with an im-
proved integral projection. CVPR.
Huang, X., Zhao, G., Hong, X., Zheng, W., and Pietikäinen, M. (2016b). Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns. Neurocomputing.
Jiang, B., Martinez, B., Valstar, M. F., and Pantic, M.
(2014). Decision level fusion of domain specific re-
gions for facial action recognition. In ICPR.
Kazemi, V. and Sullivan, J. (2014). One millisecond face
alignment with an ensemble of regression trees. In
CVPR.
Lee, C.-S. and Chellappa, R. (2014). Sparse localized fa-
cial motion dictionary learning for facial expression
recognition. In ICASSP.
Li, X., Pfister, T., Huang, X., Zhao, G., and Pietikäinen, M. (2013). A spontaneous micro-expression database: Inducement, collection and baseline. In FG.
Liao, C.-T., Chuang, H.-J., Duan, C.-H., and Lai, S.-H.
(2013). Learning spatial weighting for facial expres-
sion analysis via constrained quadratic programming.
Pattern Recognition.
Liu, Y.-J., Zhang, J.-K., Yan, W.-J., Wang, S.-J., Zhao, G.,
and Fu, X. (2015). A main directional mean optical
flow feature for spontaneous micro-expression recog-
nition. Affective Computing.
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z.,
and Matthews, I. (2010). The extended cohn-kanade
dataset (ck+): A complete dataset for action unit and
emotion-specified expression. In CVPR Workshops.
Péteri, R. and Chetverikov, D. (2005). Dynamic texture recognition using normal flow and texture regularity. In IbPRIA.
Revaud, J., Weinzaepfel, P., Harchaoui, Z., and Schmid, C.
(2015). Epicflow: Edge-preserving interpolation of
correspondences for optical flow. In CVPR.
Su, M.-C., Hsieh, Y., and Huang, D.-Y. (2007). A simple
approach to facial expression recognition. In WSEAS.
Wang, S.-J., Yan, W.-J., Zhao, G., Fu, X., and Zhou, C.-G.
(2014a). Micro-expression recognition using robust
principal component analysis and local spatiotempo-
ral directional features. In ECCV Workshop.
Wang, Y., See, J., Phan, R. C.-W., and Oh, Y.-H. (2014b).
Lbp with six intersection points: Reducing redundant
information in lbp-top for micro-expression recogni-
tion. In ACCV.
Yan, W.-J., Li, X., Wang, S.-J., Zhao, G., Liu, Y.-J., Chen,
Y.-H., and Fu, X. (2014). Casme ii: An improved
spontaneous micro-expression database and the base-
line evaluation. PloS one.