Effects of Region Features on the Accuracy of Cross-database
Facial Expression Recognition
Yining Yang, Branislav Vuksanovic and Hongjie Ma
School of Energy and Electronic Engineering, University of Portsmouth, Portsmouth, U.K.
Keywords: Facial Expression Recognition, Cross-database, Facial Region Feature, Local Binary Patterns, Feature Extraction.
Abstract: Facial expression recognition (FER) in the context of machine learning refers to a solution whereby a
computer vision system can be trained and used to automatically detect the emotion of a person from a
presented facial image. FER presents a difficult image classification problem that has received increasing
attention over recent years mainly due to the availability of powerful hardware for system implementation
and the greater number of possible applications in everyday life. However, the FER problem has not yet been
fully resolved, with the diversity of captured facial images from which the type of expression or emotion is
to be detected being one of the main obstacles. Ready-made image databases have been compiled by
researchers to train and test the developed FER algorithms. Most of the reported algorithms perform relatively
well when trained and tested on a single-database but offer significantly inferior results when trained on one
database and then tested using facial images from an entirely different database. This paper deals with the
cross-database FER problem by proposing a novel approach which aggregates local region features from the
eyes, nose and mouth and selects the optimal classification techniques for this specific aggregation. The
conducted experiments show a substantial improvement in the recognition results when compared to similar
cross-database tests reported in other works. This paper confirms the idea that, for images originating from
different databases, focus should be given to specific regions while less attention is paid to the face in general
and other facial sections.
1 INTRODUCTION
Facial expressions and gestures play important roles
in human communication as they can express
information directly. Facial expression is one of the
most powerful and natural ways for human beings to
relay their emotions and intentions (Ying-Li, Kanade,
& Cohn, 2001), (Li & Deng, 2018). Although facial
expression is a universal language that can widely and
directly present emotions (Perikos, Paraskevas, &
Hatzilygeroudis, 2018), facial expression recognition
(FER) poses a challenge in the field of computer
vision. It is expected that machines will eventually be able to understand and interpret human emotions; however, no machine is yet capable of using emotional cues in communication. In real life, 55% of human emotion is conveyed through facial expressions (Chen et al., 2019), (Xie & Hu, 2019); thus, FER is important for
human–robot interaction. Enabling machines to
recognise human behaviour is an area that has been
progressively developed in recent years. In previous decades, human–robot interaction was studied mainly in the social and behavioural sciences, whereas techniques for machine-based FER have become a popular and intensely researched topic only recently. Related applications in psychology, behavioural science and human–computer interfacing, such as interpersonal relation prediction by FER, are therefore important (Zhang, Luo, Loy, & Tang, 2018).
Facial expression features can be categorised as geometric-based or appearance-based, and as either local or global (Kumari, Rajesh, & Pooja, 2015). Normally, seven
prototypical or basic facial expressions can be
detected and identified: neutral, happiness, sadness,
fear, disgust, surprise and anger (Ekman & Friesen,
1971), (Ekman, 1994). In practice, emotions are often conveyed not by full prototypic expressions but by subtle local changes of the face (Ying-Li et al., 2001). These changes are produced by the facial muscle movements described by action units (AUs) (Liu Yanpeng et al., 2016). The combination of
different AUs can then generate different facial
expressions (Liu Yanpeng et al., 2016). However, the capability of models based on prototypic emotions to represent the complexity and subtlety of everyday emotions is limited (Martinez & Valstar, 2016).
The face can be divided into upper and lower face
AUs (Ying-Li et al., 2001). The upper face model
trained by the Ekman–Hager (EH) database yielded
accuracies of 93.2% and 96.4% when tested using the
EH and Cohn–Kanade (CK) databases (Kanade,
Cohn, & Tian, 2000), respectively. The lower face model trained on the CK database achieved an accuracy of 93.4% when tested on the EH database and 96.7% when tested on the CK database. The CK database contains 81% Caucasian, 13% African and 6% other groups, whereas the EH database includes 24 Caucasian subjects. Thus, these models perform well mainly on Caucasian faces.
Chen et al. proposed a Softmax regression-based deep sparse autoencoder network to recognise facial emotions (Chen et al., 2018). The first step involves
the extraction of a region of interest (ROI) for facial
expression image features. The ROI in (Chen et al.,
2018) included the eyebrows, eyes and mouth.
Extracting these ROIs not only reduces the interference caused by noncritical parts of the image but also reduces the amount of data, thus improving the computing speed.
Their experiments used Japanese female facial
expression (JAFFE) (Li & Deng, 2018) and extended
CK (CK+) databases(Lucey et al., 2010). The average
accuracy results of JAFFE and CK+ databases were
89.12% and 89.03%, respectively. In this experiment, the database was divided into three groups; therefore, the test dataset differed from the training dataset. However, these results were obtained on single databases only (JAFFE or CK+).
Fan et al. studied FER via local regions, namely the left eye, nose and mouth, combined with deep learning algorithms (Fan, Lam, & Li, 2018). They made three main contributions. The first is a novel multi-region ensemble convolutional neural network (CNN) framework that aims to improve CNN models by combining global features with multiple local facial regions. The second is that the weighted prediction scores from each sub-network are aggregated to produce a final, more accurate prediction. The third is an investigation of the effect of different face regions on FER. The network of CNNs can capture both low-level profile features and high-level specific features. In each sub-network, one local region (left eye, nose or mouth) and the whole face are fed into a double-input structure. With three separate regions, three prediction scores are obtained, and the ensemble prediction stage produces the final prediction through a weighted sum operation (Fan et al., 2018).
Another paper (Xie & Hu, 2019) presented a
method for inputting local facial features and the
whole face separately. However, this network features
a CNN structure containing two branches and no sub-
network. The method, named deep comprehensive
multipatch aggregation CNN, consists of two
hierarchical features: local and holistic. The local
features are extracted from image patches and depict
details of expression. The holistic features are
extracted from the whole image and provide high-
level semantic information. Both features are
aggregated before classification. The common method
of FER uses only a single feature type, but this method
uses two feature types to interpret expressional
information. In addition, a novel pooling method that can handle nuisance variations, i.e. that learns expressional transformation invariants, is proposed for the training step. The method is evaluated on the CK+ and JAFFE databases, and a cross-database evaluation is adopted to test it (Xie & Hu, 2019).
Liu and Chen proposed the combined CNN–centralized binary pattern (CBP) approach, which consists of CBP and CNN features (Liu & Chen, 2017). The
features were then classified using support vector
machine (SVM). With the CNN–CBP features, the
average recognition accuracies of CK+ and JAFFE
databases reached 97.6% and 88.7%, respectively.
However, with the same model, the accuracy totalled
34.6% when training with the CK+ database and
testing with the JAFFE database.
Other authors (Zavarez, Berriel, & Oliveira-Santos,
2017) have proposed a visual geometry group (VGG)–
face deep convolutional network model. When testing
with a VGG–fine-tuned model, which was trained with
CK+, JAFFE, MMI, RaFD, KDEF, BU3DFE and
ARFace databases, the test accuracies of CK+ and
JAFFE totalled 88.58% and 44.32%, respectively. In
(da Silva & Pedrini, 2015), the model, which also performs poorly in the cross-database setting, consisted of a histogram of oriented gradients (HOG) descriptor and an SVM classifier. The experiment tested four databases,
including CK+, JAFFE, MUG and BOSPHORUS. The
model accuracy was 42.3% when the model was trained on the CK+ database and tested on the JAFFE database. If
the training database was JAFFE and the testing
database was CK+, the accuracy was 48.2%. The authors believed that cultural differences could confuse the classifier (da Silva & Pedrini, 2015). On the basis of local binary patterns (LBPs), Shan, Gong and McOwan (2009) formulated boosted LBP to extract features, which were then classified with an SVM. Their cross-database
results are similar to those above: when the boosted-LBP-based SVM was trained on the CK+ database, its generalisation accuracy was approximately 40% on the JAFFE database and 50% on MMI.
Thus, according to the reviewed literature, FER
accuracy is reasonably high when the training and
testing datasets originate from the same, i.e. single,
database. However, for cross-database FER
experiments, where training and testing datasets
originate from different databases, the recognition
accuracy severely deteriorates. This paper proposes
the use of local areas extracted from facial images to
improve the accuracy of the cross-database FER. The
rest of the paper is organised as follows. Section 2 describes the proposed method, detailing the face extraction approach and justifying the choice of the SVM classifier. Section 3 describes the
conducted experiments and discusses the results using
this approach. Achieved recognition rates indicate
significant improvement compared to similar cross-
database FER tests and recognition rates reported in
the literature, also reviewed earlier in this section.
Section 4 summarises the work and provides certain
conclusions.
2 PROPOSED FER APPROACH
To test the cross-database FER performance, a
traditional framework-based FER system with LBP
and SVM was used in this work. This section provides
information about the main stages of the employed
FER system illustrated in Fig. 1.
2.1 Databases
According to published papers, various ready-made
datasets are available to test and assess the
performance of different expression recognition
algorithms. The proposed algorithm was tested on
JAFFE and CK+ databases. Table 1 provides the
details for each database. The CK+ database contains
facial images from Western (Caucasian) populations.
The JAFFE database, on the other hand, consists of the
facial images of Asian, specifically Japanese, females.
In addition to certain structural differences, cultural differences also exist between the two databases. Thus, cross-database recognition accuracy between them is relatively poor.
Figure 1: Main stages of the employed FER system (HA:
happiness, SA: sadness, AN: anger, SU: surprise, FE: fear,
DI: disgust and NE: neutral).
Table 1: Details of the two databases used in this paper.

Database | Facial expressions | Number of subjects | Number of images | Images used in the experiment
JAFFE | neutral, happiness, sadness, fear, anger, disgust and surprise | 10 | 213 static images | 213 static images
CK+ | (same seven expressions) | 123 | 327 sequences | 700 static images
The JAFFE database (Li & Deng, 2018) contains
3–5 images in each of the seven expressions from each
subject.
The CK+ database (Ekman, 1994) (Lucey et al.,
2010) consists of 593 expression sequences from 123
subjects, where 327 sequences are labelled with one of
the seven expressions (angry, disgust, fear, happy, sad,
surprise and contempt). The 123 subjects came from
different regions with varying races, ages and genders.
Each image sequence contains a set of frames captured as the subject changes from a neutral emotional state to the peak expression. The neutral frame and four peak frames of each sequence were selected from the 327 labelled sequences. To balance the eight expression classes, 100 images were retained for each expression. Compared with the
seven facial expressions in the JAFFE database, the
same facial expressions (except for contempt) were
used for the CK+ database. Hence, the CK+ database
included 700 images in total.
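The frame-selection step described above can be summarised in a short script. The sketch below is illustrative only: the directory layout, file extension and function name are assumptions rather than details taken from the CK+ distribution or from the authors' code.

```python
import os
from glob import glob

def select_ck_frames(sequence_dir):
    """Pick the neutral frame and four peak frames from one CK+ sequence.

    CK+ sequences start at a neutral state and end at the expression peak,
    so the first frame is taken as 'neutral' and the last four frames as
    peak examples. File names are assumed to sort in temporal order.
    """
    frames = sorted(glob(os.path.join(sequence_dir, "*.png")))
    if len(frames) < 5:
        return [], []              # sequence too short to sample as described
    return [frames[0]], frames[-4:]
```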
2.2 Face Detection
In each database image, the face or feature region should be detected before training or testing the FER model. Face detection reduces the influence of irrelevant image content. The face for each input image in
the proposed approach is detected with Dlib. Dlib is an open-source C++ library implementing a variety of machine learning algorithms, including classification, regression, data transformation and structured prediction, and it can be used as a tool for high-quality face recognition (Davis King, 2003).
Figure 2: Sixty-eight (68) point face landmarks from Dlib.
The pre-trained facial landmark detector included with Dlib was used to estimate the locations of 68 (x, y) coordinates that are mapped to facial structures. The indices of the 68 coordinates are visualised in Fig. 2. The whole face and local face regions can then be located using the indicated landmarks (Boyko, Basystiuk, & Shakhovska, 2018).
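As an illustration of this step, the following sketch uses Dlib's Python bindings together with its publicly distributed 68-point model; the model path and the use of OpenCV for image loading are assumptions, not details taken from the paper.

```python
import cv2
import dlib

# Path to the pre-trained 68-point model distributed by dlib (assumed filename).
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector()        # HOG-based frontal face detector
predictor = dlib.shape_predictor(PREDICTOR_PATH)   # 68-point landmark estimator

def detect_landmarks(image_path):
    """Return the 68 (x, y) landmark coordinates of the first detected face,
    or None if no face is found in the image."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)                       # upsample once to find smaller faces
    if len(faces) == 0:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```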
2.3 Feature Extraction Via LBP
In accordance with the detected image region, features
will be extracted by LBP. The original LBP operator
was introduced by Ojala et al.(Ojala, Pietikäinen, &
Harwood, 1996) and was proven to be a powerful
means of texture description. LBP has since been
successfully applied to a wide range of other image
recognition tasks, such as FER (Sun, Li, Zhou, & He,
2016) (Levi & Hassner, 2015). The operator labels the
pixels of an image by thresholding a 3 x 3
neighbourhood of each pixel with the centre value.
The result of thresholding can be considered a binary
number (see Fig. 3 for an illustration) and the 256-bin
histogram of the LBP labels computed over a region
was used as a texture descriptor.
Figure 3: Basic LBP operator.
The LBP operator generates a binary number by comparing each neighbouring pixel value with the centre pixel value. The pattern with eight neighbours is expressed by

$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p$   (1)

where $g_p$ is the grey value of the $p$-th sampling point in the neighbourhood of the centre pixel, $g_c$ is the grey value at the centre coordinate $(x_c, y_c)$, $P$ is the number of sampling points and $R$ is the radius of the neighbourhood. The operator $LBP_{P,R}$ produces $2^P$ different outputs, corresponding to the $2^P$ different binary patterns formed by the pixels in the neighbourhood. The thresholding function $s(x)$ is defined as

$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$   (2)

The histogram of LBP labels calculated over a region, which can be exploited as a feature descriptor, is given by

$H_i = \sum_{x,y} I\{LBP_{P,R}(x, y) = i\}, \quad i = 0, \ldots, 2^P - 1$   (3)

where $I\{A\}$ equals 1 when $A$ is true and 0 otherwise.
The limitation of the basic LBP operator is its
small 3 x 3 neighbourhood which cannot capture
dominant features with large-scale structures (Shan et
al., 2009). Hence, the operator was later extended to
use neighbourhoods of different sizes (Ojala,
Pietikäinen, & Mäenpää, 2002).
After labelling an image with the LBP operator, a
histogram of the labelled image contains information
about the distribution of the local micro-patterns over
the whole image. Thus, the histogram can be used to
statistically describe image characteristics. Face
images can be observed as a composition of micro-
patterns that can be effectively described by the LBP
histograms. Therefore, LBP features were intuitively
used to represent face images (Shan et al., 2009). The
LBP histogram computed over the whole face image
encoded only the occurrences of the micro-patterns
without any indication about their locations.
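A minimal implementation of the basic operator and the histogram descriptor of Eqs. (1)-(3) is sketched below; it is written for illustration and is not the authors' code. The neighbour ordering (and hence the bit weights) is an arbitrary but fixed choice.

```python
import numpy as np

# Offsets of the 8 neighbours in a 3 x 3 window; index p fixes the weight 2^p.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def basic_lbp(image):
    """Label every interior pixel with its 8-bit LBP code (Eqs. 1 and 2)."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    centre = img[1:h-1, 1:w-1]
    labels = np.zeros_like(centre)
    for p, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        neighbour = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        labels += (neighbour >= centre).astype(np.int32) * (2 ** p)  # s(g_p - g_c) * 2^p
    return labels

def lbp_histogram(labels, bins=256):
    """256-bin histogram of LBP labels over a region (Eq. 3)."""
    hist, _ = np.histogram(labels, bins=bins, range=(0, bins))
    return hist
```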
2.4 Classification Via SVM
An SVM classifier was selected as it is well founded in
statistical learning theory and has been successfully
applied in various tasks in computer vision (Zhao,
2007). SVM is a technique previously used successfully
in facial expression classification. As a powerful
machine learning technique for data classification,
SVM performs an implicit mapping of data into a
higher (maybe infinite) dimensional feature space and
then finds a linear separating hyperplane with the
maximal margin to separate data in this higher
dimensional space (Kanade et al., 2000) (Shan et al.,
2009). A quadratic SVM was used for classification in this paper. Multiclass classification is handled with a one-against-one scheme, and fivefold cross-validation is used when training the model.
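The classifier configuration described above can be reproduced with scikit-learn as sketched below. The degree-2 polynomial kernel and fivefold cross-validation follow the text; the regularisation constant, the feature scaling and the use of scikit-learn itself are assumptions, since the paper does not state which implementation was used.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_quadratic_svm(features, labels):
    """Fit a quadratic (degree-2 polynomial) SVM on LBP feature vectors.

    SVC handles multiclass problems with a one-against-one scheme internally,
    matching the strategy mentioned in the text. C=1.0 is an illustrative
    default, not a value reported in the paper.
    """
    clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, C=1.0))
    cv_accuracy = cross_val_score(clf, features, labels, cv=5).mean()  # fivefold CV
    clf.fit(features, labels)
    return clf, cv_accuracy
```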
3 EXPERIMENTS AND RESULTS
The performance of the FER system was evaluated
using three different sections of facial images from two
different databases, where images from the first
database were used only for system training and the
images from the second database were only used for
testing.
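In code, this cross-database protocol amounts to fitting the classifier on every image from one database and scoring it on every image from the other, with no mixing of the two sets. The snippet below is a schematic of that protocol; it mirrors the pipeline sketched in Section 2.4 and all variable names are placeholders.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cross_database_accuracy(train_feats, train_labels, test_feats, test_labels):
    """Train on one database (e.g. CK+ features) and report accuracy on the
    other (e.g. JAFFE features); the two feature sets share no samples."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2))
    clf.fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)
```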
3.1 Local Image Detection
Sections of the facial images included a) the original
facial image from the database with background, b)
narrowed, i.e. extracted facial region image without
background and c) local regions (eyes, nose and
mouth) used in the experiments (Fig. 4). The eye
region includes eyebrows and eyes. The nose region
includes nose and sides of the nose. The mouth region
includes mouth, jaw and the two sides of the mouth.
The local facial regions further increase the proportion
of effective information.
Figure 4: Facial regions used in the experiment: a) original facial image with background, b) narrowed facial image without background, c) local regions (eyes, nose and mouth).
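Given the 68 landmarks from Section 2.2, the three regions can be cropped as rectangular patches around the corresponding landmark groups. The sketch below uses the standard dlib landmark index ranges; the exact margins and the rectangular-crop strategy are assumptions, as the paper does not specify them.

```python
import numpy as np

# Standard 0-based index ranges of the dlib 68-point model.
REGION_LANDMARKS = {
    "eyes":  list(range(17, 27)) + list(range(36, 48)),  # eyebrows + both eyes
    "nose":  list(range(27, 36)),                         # bridge and nostril area
    "mouth": list(range(48, 68)),                         # outer and inner lips
}

def crop_region(gray, points, region, margin=10):
    """Crop a rectangular patch around one facial region.

    `points` holds the 68 (x, y) landmark tuples; `margin` is an illustrative
    padding in pixels, not a value taken from the paper.
    """
    pts = np.array([points[i] for i in REGION_LANDMARKS[region]])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    h, w = gray.shape
    return gray[max(int(y0), 0):min(int(y1), h), max(int(x0), 0):min(int(x1), w)]
```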
3.2 Classification Selection
Different classifiers have been tested and their
performance compared using features from the
original and narrowed facial images in both the JAFFE
and CK+ databases in order to select the most
appropriate one. The results show that the quadratic
SVM offers somewhat superior performance when
compared to other available classification algorithms.
Those include cubic SVM as well as a k-Nearest
Neighbor classifier, although both of those resulted in
acceptable recognition rates. Based on the comparison
of the three classifiers (Table 2), the quadratic SVM
classifier was finally selected for the following set of
experiments.
Table 2: Performance of single-database FER for different classifiers.

No | Type | Name | JAFFE (wide image) | JAFFE (narrow image) | CK+ (wide image) | CK+ (narrow image)
1 | SVM | Quadratic SVM | 80.3% | 81.2% | 98.6% | 99.6%
2 | SVM | Cubic SVM | 79.8% | 81.2% | 98.6% | 99.6%
3 | KNN | Fine KNN | 80.3% | 81.6% | 97.6% | 99.1%
As explained in the previous section, LBP was
used to extract relevant facial features which were
then classified using the quadratic SVM technique.
3.3 Cross-database FER Via Original
and Narrowed Facial Images
Using original and narrowed facial images, the cross-
database test was carried out and the following results
were obtained:
Table 3: Performance of single- and cross-database FER arrangements for different facial ranges.

Facial range | Single-database FER: CK+ | Single-database FER: JAFFE | Cross-database FER: CK+-trained (JAFFE-tested) | Cross-database FER: JAFFE-trained (CK+-tested)
Whole face with background | 99.00% | 80.75% | 18.78% | 15.14%
Narrowed face (no background) | 98.43% | 84.98% | 33.33% | 32.86%
In the case of original facial images, the classifier performed well in the single-database setting, but the recognition rate reached no more than 20% in the cross-database setting. In
the case of narrowed facial images, the single-database accuracy changed only slightly, whereas the cross-database accuracy improved notably, with both values exceeding 30%. This is in
agreement with (Liu & Chen, 2017) where similarly
low recognition rates of 34.6% were reported for cross-
database FER recognition. These findings also show
that the recognition rate with cross-databases can be
improved by reducing background information. This
conclusion led to the use of local facial features for
FER.
3.4 Cross-database FER Via Local
Region Images
To enhance the recognition accuracy by increasing the
proportion of useful information, three local facial
regions, eyes, nose and mouth, have been extracted
and tested for FER.
Figure 5: Accuracy variation for cross-database arrangements compared with the single-database system using single-region features (1: JAFFE single database, 2: JAFFE-trained and CK+-tested, 3: CK+ single database, 4: CK+-trained and JAFFE-tested).
After conducting the local facial region tests, the accuracy of the mouth region increased by 6.11% when the model was trained on the CK+ database and tested on the JAFFE database. This finding suggests that the mouth region contains the information most relevant to facial expression recognition. However, in the cross-database validation, the accuracy of the models trained on the JAFFE database decreased relative to the results obtained with narrowed facial images.
Newly formed features comprising pairwise aggregations of the local facial features were then tested. For the model trained on the JAFFE database and tested on the JAFFE and CK+ databases, the combination of the eye and mouth regions was more accurate than the single-region results and slightly better than the results obtained using narrowed facial images. These results suggest that the mouth and eyes contain useful information.
Figure 6: Accuracy variation for cross-database arrangements compared with the single-database system using fusion features (1: JAFFE single database, 2: JAFFE-trained and CK+-tested, 3: CK+ single database, 4: CK+-trained and JAFFE-tested).
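The aggregation itself can be as simple as concatenating the regional LBP histograms into one vector before classification. The sketch below reuses the basic_lbp and lbp_histogram helpers from Section 2.3; the concatenation order and the absence of any per-region weighting are assumptions.

```python
import numpy as np

def fused_region_feature(patches, bins=256):
    """Concatenate the LBP histograms of the eye, nose and mouth patches into
    a single 3 x 256 = 768-dimensional descriptor.

    `patches` is a dict with keys "eyes", "nose" and "mouth" holding grayscale
    image patches, e.g. produced by crop_region() above.
    """
    hists = [lbp_histogram(basic_lbp(patches[name]), bins)
             for name in ("eyes", "nose", "mouth")]
    return np.concatenate(hists)
```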
For the model trained on the CK+ database, single-database accuracy showed no significant improvement. However, when tested on the JAFFE database, the accuracy improved significantly with the combined features of the eye and mouth regions.
Compared to the results achieved using narrowed facial
images, the accuracy improved by 12.68%. When
compared to the result achieved using only the mouth
region, the accuracy improved by 6.57%.
Finally, the results of the conducted experiments
can also be compared to the cross-database recognition
rates reported in other papers. When using narrowed
facial images or single face regions, recognition rates
are very similar to those reported elsewhere. However, the recognition rates achieved in this work, when tests are performed using features aggregated from the three local facial regions, are significantly higher than other reported results. Table 4 provides an
overview and comparison of the recognition rates. It is
also worth pointing out that, compared to the work of Zavarez et al. (2017), the proposed method requires less
training which reduces the challenges arising from the
size of the training set and the computational power
required for model training. This indicates that the FER
accuracy is improved when using the new feature
because it increases the amount of effective information
that can support the FER when using cross-databases.
Table 4: Comparison with previous approaches.

Method | Training databases (test: JAFFE) | Accuracy
Liu and Chen (Liu & Chen, 2017) | CK+ | 34.6%
Shan, Gong and McOwan (Shan et al., 2009) | CK+ | 40%
Silva and Pedrini (da Silva & Pedrini, 2015) | CK+ | 42.3%
Zavarez, Berriel and Oliveira-Santos (Zavarez et al., 2017) | CK+, JAFFE, MMI, RaFD, KDEF, BU3DFE and ARFace | 44.32%
Proposed | CK+ | 50.23%
Finally, three local features were aggregated to
obtain a new feature with the intention of further
reducing the proportion of less relevant information
contained in the training data set. The new feature was
tested in the cross-database setting. For the model trained on the JAFFE database, the single-database results showed a slight improvement over the results for narrowed facial images, while the cross-database accuracy increased by 5.71%. The most significant improvement in recognition accuracy was obtained for the model trained on the CK+ database: the accuracy of this cross-database test reached 50.23%, which is a 16.9% improvement compared to the result achieved using narrowed facial images.
This new feature is more sensitive to the expression of "disgust", as indicated by the confusion matrix (Table 5): although the recognition rate for "disgust" remains low, the new feature is now able to recognise this expression. The feature is also relatively sensitive to the expressions "happiness" and "surprise".
Table 5: Confusion matrix for FER system tests using a fusion feature with three local regions (units: %).

Real label \ Predicted label | AN | DI | FE | HA | NE | SA | SU | Acc
AN | 50.00 | 00.00 | 00.00 | 03.33 | 33.33 | 10.00 | 03.33 | 50.00
DI | 24.14 | 17.24 | 24.14 | 00.00 | 24.14 | 06.90 | 03.45 | 17.24
FE | 03.13 | 00.00 | 28.13 | 06.25 | 43.75 | 00.00 | 18.75 | 28.13
HA | 03.23 | 00.00 | 03.23 | 80.65 | 09.68 | 00.00 | 03.23 | 80.65
NE | 03.33 | 00.00 | 06.67 | 00.00 | 66.67 | 06.67 | 16.67 | 66.67
SA | 25.81 | 00.00 | 03.23 | 03.23 | 19.35 | 38.71 | 09.68 | 38.71
SU | 13.33 | 00.00 | 03.33 | 03.33 | 10.00 | 00.00 | 70.00 | 70.00
Average accuracy: 50.23
4 CONCLUSIONS
A large number of different FER systems reported in
the literature perform well when training and test
samples both originate from the same, precompiled
database of facial images. However, the accuracy of
results drops drastically when the same system is
tested using images from an entirely different database
not used in the training phase. This paper investigated
the influence of various sections of a facial image on
the level of deterioration in cross-database FER
system performance. It was found that the drop in system performance is less severe if the background is removed from the image. Compared with the background-removed image, using important sections of the facial image improves the recognition rate further, with the size of the improvement depending on the region used. Encouraging results were recorded when the mouth region was used in the experiments, a region shown to hold a significant amount of information about the facial expression and emotion of the person in the image. When the new
feature aggregated the features from the eyes, nose
and mouth, the proportion of effective information
further increased. The experiments showed substantial
improvement in the recognition results. The
recognition accuracy of 50.23% represents a
significant improvement when compared to cross-
database FER results reported elsewhere in the
research literature. It is also worth noting that the
result is achieved using a “classical” approach, i.e.
without employing deep learning techniques, thus
requiring significantly less computing power. Future
work will now focus on testing the performance of
deep learning algorithms using only the most
important sections of facial images in similar cross-
database arrangements.
REFERENCES
Boyko, N., Basystiuk, O., & Shakhovska, N. (2018).
Performance Evaluation and Comparison of Software
for Face Recognition, Based on Dlib and Opencv
Library. Proceedings of the 2018 IEEE 2nd
International Conference on Data Stream Mining and
Processing, DSMP 2018, 478–482. https://doi.org/10.
1109/DSMP.2018.8478556
Chen, L., Li, M., Su, W., Wu, M., Hirota, K., & Pedrycz, W.
(2019). Adaptive Feature Selection-Based AdaBoost-
KNN With Direct Optimization for Dynamic Emotion
Recognition in Human–Robot Interaction.
Chen, L., Zhou, M., Su, W., Wu, M., She, J., & Hirota, K.
(2018). Softmax regression based deep sparse
autoencoder network for facial emotion recognition in
human-robot interaction. Information Sciences, 428,
49–61. https://doi.org/10.1016/j.ins.2017.10.044
da Silva, F. A. M., & Pedrini, H. (2015). Effects of cultural
characteristics on building an emotion classifier through
facial expression analysis. Journal of Electronic
Imaging, 24(2), 023015. https://doi.org/10.1117/1.jei.
24.2.023015
Davis King. (2003). Dlib C++ Library. Retrieved June 16,
2019, from http://dlib.net/
Ekman, P. (1994). Strong evidence for universals in facial
expressions a reply to Russell’s mistaken critique.
Psychological Bulletin, 115(2), 268–287.
https://doi.org/10.1037/0033-2909.115.2.268
Ekman, P., & Friesen, W. V. (1971). Constants across
cultures in the face and emotion. Journal of Personality
and Social Psychology, 17(2), 124–129.
https://doi.org/10.1037/h0030377
Fan, Y., Lam, J. C. K., & Li, V. O. K. (2018). Multi-region
Ensemble Convolutional Neural Network for Facial
Expression Recognition.
Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive
Database for Facial Expression Analysis. Proceedings
of the Fourth IEEE International Conference on
Automatic Face and Gesture Recognition (Cat. No.
PR00580), 46–53. https://doi.org/10.1109/AFGR.2000.
840611
Kumari, J., Rajesh, R., & Pooja, K. M. (2015). Facial
Expression Recognition: A Survey. Procedia Computer
Science, 58, 486–491. https://doi.org/10.1016/j.procs.
2015.08.011
Levi, G., & Hassner, T. (2015). Emotion Recognition in the
Wild via Convolutional Neural Networks and Mapped
Binary Patterns. ICMI ’15 Proceedings of the 2015 ACM
on International Conference on Multimodal Interaction,
503–510. https://doi.org/10.1145/2818346.2830587
Li, S., & Deng, W. (2018). Deep Facial Expression
Recognition: A Survey. 1–25. Retrieved from
http://arxiv.org/abs/1804.08348
Liu, Y., & Chen, Y. (2017). Recognition of facial expression
based on CNN-CBP features. Proceedings of the 29th
Chinese Control and Decision Conference, CCDC
2017, 2139–2145. https://doi.org/10.1109/CCDC.2017.
7978869
Liu Yanpeng, Cao Yuwen, Li Yibin, Liu Ming, Song Rui,
Wang Yafang, … Ma Xin. (2016). Facial Expression
Recognition with PCA and LBP Features Extracting
from Active Facial Patches. Proceedings of The 2016
IEEE International Conference on Real-Time
Computing and Robotics, 368–373. https://doi.org/10.
1109/RCAR.2016.7784056
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z.,
& Matthews, I. (2010). The extended Cohn-Kanade
dataset (CK+): A complete dataset for action unit and
emotion-specified expression. 2010 IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition - Workshops, 94–101. https://doi.org/10.
1109/CVPRW.2010.5543262
Martinez, B., & Valstar, M. F. (2016). Advances,
challenges, and opportunities in automatic facial
expression recognition. Advances in Face Detection and
Facial Image Analysis, 63–100. https://doi.org/10.
1007/978-3-319-25958-1_4
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A
comparative study of texture measures with
classification based on feature distributions. Pattern
Recognition, 29(1), 51–59. https://doi.org/10.1016/
0031-3203(95)00067-4
Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002).
Multiresolution gray-scale and rotation invariant texture
classification with local binary patterns. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 24(7), 971–987. https://doi.org/10.1109/
TPAMI.2002.1017623
Perikos, I., Paraskevas, M., & Hatzilygeroudis, I. (2018).
Facial Expression Recognition Using Adaptive Neuro-
fuzzy Inference Systems. 2018 IEEE/ACIS 17th
International Conference on Computer and Information
Science (ICIS), 1–6. https://doi.org/10.1109/ICIS.2018.
8466438
Shan, C., Gong, S., & McOwan, P. W. (2009). Facial
expression recognition based on Local Binary Patterns:
A comprehensive study. Image and Vision Computing,
27(6), 803–816. https://doi.org/10.1016/j.imavis.2008.
08.005
Sun, B., Li, L., Zhou, G., & He, J. (2016). Facial expression
recognition in the wild based on multimodal texture
features. Journal of Electronic Imaging, 25(6), 061407.
https://doi.org/10.1117/1.jei.25.6.061407
Xie, S., & Hu, H. (2019). Facial Expression Recognition
Using Hierarchical Features with Deep Comprehensive
Multipatches Aggregation Convolutional Neural
Networks. IEEE Transactions on Multimedia, 21(1),
211–220. https://doi.org/10.1109/TMM.2018.2844085
Ying-Li, T., Kanade, T., & Cohn, J. F. (2001). Recognizing
upper face action units for facial expression analysis.
Proceedings IEEE Conference on Computer Vision and
Pattern Recognition. CVPR 2000 (Cat. No.PR00662),
1(2), 294–301. https://doi.org/10.1109/CVPR.2000.
855832
Zavarez, M. V., Berriel, R. F., & Oliveira-Santos, T. (2017).
Cross-Database Facial Expression Recognition Based
on Fine-Tuned Deep Convolutional Network.
Proceedings - 30th Conference on Graphics, Patterns
and Images, SIBGRAPI 2017, 405–412. https://doi.org/
10.1109/SIBGRAPI.2017.60
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2018). From
Facial Expression Recognition to Interpersonal Relation
Prediction. International Journal of Computer Vision,
126(5), 550–569. https://doi.org/10.1007/s11263-017-
1055-1
Zhao, G. (2007). Dynamic Texture Recognition Using Local
Binary Patterns with an Application to Facial
Expressions. Pattern Analysis and Machine
Intelligence, 29(6), 915–928. https://doi.org/10.1109/
TPAMI.2007.1110