REFERENCES
C. Szegedy, S. Ioffe, and V. Vanhoucke, ``Inception-v4,
Inception-ResNet and the impact of residual
connections on learning,'' in Proc. 31st AAAI Conf.
Artif. Intell., 2017, p. 1.
C. Du and S. Gao, ``Image segmentation-based multi-focus
image fusion through multi-scale convolutional neural
network,'' IEEE Access, vol. 5, pp. 15750–15761, 2017.
D. Amodei et al., ``Deep speech 2: End-to-end speech
recognition in English and Mandarin,'' in Proc. Int. Conf.
Mach. Learn., Jun. 2016, pp. 173–182.
D. Hazarika, S. Gorantla, S. Poria, and R. Zimmermann,
``Self-attentive feature-level fusion for multimodal
emotion detection,'' in Proc. IEEE Conf. Multimedia
Inf. Process. Retr. (MIPR), Apr. 2018, pp. 196–201.
D. Kollias, P. Tzirakis, M. A. Nicolaou, A. Papaioannou,
G. Zhao, B. Schuller, I. Kotsia, and S. Zafeiriou, ``Deep
affect prediction in-the-wild: Aff-wild database and
challenge, deep architectures, and beyond,'' Int. J.
Comput. Vis., vol. 127, pp. 1–23, Jun. 2019.
D. Kollias, A. Schulc, E. Hajiyev, and S. Zafeiriou,
``Analysing affective behavior in the first ABAW 2020
competition,'' 2020, arXiv:2001.11409. [Online].
Available: http://arxiv.org/abs/2001.11409
H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J.
Yao, D. Mollura, and R. M. Summers, ``Deep
convolutional neural networks for computer-aided
detection: CNN architectures, dataset characteristics
and transfer learning,'' IEEE Trans. Med. Imag., vol. 35,
no. 5, pp. 1285–1298, May 2016.
K.-Y. Huang, C.-H. Wu, Q.-B. Hong, M.-H. Su, and Y.-H.
Chen, ``Speech emotion recognition using deep neural
network considering verbal and nonverbal speech
sounds,'' in Proc. IEEE Int. Conf. Acoust., Speech
Signal Process. (ICASSP), May 2019, pp. 5866–5870.
M. E. Kret, K. Roelofs, J. J. Stekelenburg, and B. de Gelder,
``Emotional signals from faces, bodies and scenes
influence observers' facial expressions, fixations and
pupil-size,'' Frontiers Human Neurosci., vol. 7, p. 810,
Dec. 2013.
M. Z. Uddin, M. M. Hassan, A. Almogren, A. Alamri, M.
Alrubaian, and G. Fortino, ``Facial expression
recognition utilizing local direction-based robust
features and deep belief network,'' IEEE Access, vol. 5,
pp. 4525–4536, 2017.
M. Z. Uddin, W. Khaksar, and J. Torresen, ``Facial
expression recognition using salient features and
convolutional neural network,'' IEEE Access, vol. 5, pp.
26146–26161, 2017.
M. R. Koujan, A. Akram, P. McCool, J. Westerfeld, D.
Wilson, K. Dhaliwal, S. McLaughlin, and A.
Perperidis, ``Multi-class classification of pulmonary
endomicroscopic images,'' in Proc. IEEE 15th Int.
Symp. Biomed. Imag. (ISBI), Apr. 2018, pp. 1574–1577.
M.-I. Georgescu, R. T. Ionescu, and M. Popescu, ``Local
learning with deep and handcrafted features for facial
expression recognition,'' IEEE Access, vol. 7, pp.
64827–64836, 2019.
O. Leonovych, M. R. Koujan, A. Akram, J. Westerfeld, D.
Wilson, K. Dhaliwal, S. McLaughlin, and A.
Perperidis, ``Texture descriptors for classifying sparse,
irregularly sampled optical endomicroscopy images,'' in
Proc. Annu. Conf. Med. Image Understand. Anal.
Cham, Switzerland: Springer, 2018, pp. 165–176.
P. Ekman and W. V. Friesen, ``Constants across cultures in
the face and emotion,'' J. Personality Social Psychol.,
vol. 17, no. 2, pp. 124–129, 1971.
S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back,
``Face recognition: A convolutional neural-network
approach,'' IEEE Trans. Neural Netw., vol. 8, no. 1, pp.
98–113, Jan. 1997.
T. Chang, G. Wen, Y. Hu, and J. Ma, ``Facial expression
recognition based on complexity perception
classification algorithm,'' 2018, arXiv:1803.00185.
[Online]. Available: http://arxiv.org/abs/1803.00185
W. Y. Choi, K. Y. Song, and C. W. Lee, ``Convolutional
attention networks for multimodal emotion recognition
from speech and text data,'' in Proc. Grand Challenge
Workshop Hum. Multimodal Lang. (Challenge-HML),
2018, pp. 28–34.