Emotion Transformer: Attention Model for Pose-Based Emotion Recognition

Pedro V. V. Paiva; Josué Ramos; Marina Gavrilova; Marco A. G. Carvalho

doi:10.5220/0011791700003417

2023

Abstract

Capturing humans’ emotional states from images in real-world scenarios is a key problem in affective computing with many practical applications. Emotion recognition methods can enhance video games to increase engagement, help students stay motivated during e-learning sessions, or make interaction more natural in social robotics. Body movements, a crucial component of non-verbal communication, remain less explored in emotion recognition, while facial expression-based methods are widely investigated. Transformer networks have been successfully applied across several domains, bringing significant breakthroughs. The transformer’s self-attention mechanism captures relationships between features at different spatial locations, allowing contextual information to be extracted. In this work, we introduce Emotion Transformer, a self-attention architecture that leverages the spatial configuration of body joints for Body Emotion Recognition. Our approach is based on the vision transformer’s linear projection function, which converts 2D joint coordinates into a regular matrix representation. The projected matrix then feeds a standard transformer multi-head attention architecture. The method captures correlations between joint movements over time more robustly, using learned contextual information to recognize emotions. We present an evaluation benchmark for acted emotional sequences extracted from movie scenes using the BoLD dataset. The proposed method outperforms several state-of-the-art architectures, demonstrating its effectiveness.
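
The pipeline outlined in the abstract (linear projection of 2D joint coordinates, followed by multi-head self-attention) might be sketched roughly as below. This is an illustrative NumPy sketch, not the authors' implementation: the choice of one token per frame, the mean pooling and linear classifier at the end, the random weights, and all sizes (16 frames, 18 joints, model width 32, 4 heads, 8 classes) are assumptions made only for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project_joints(poses, w_proj, b_proj):
    """Flatten each frame's (J, 2) joint coordinates and project them linearly.

    poses: (T, J, 2) array -> returns (T, d_model) tokens, one per frame.
    Treating each frame as one token is an assumption of this sketch.
    """
    t = poses.shape[0]
    flat = poses.reshape(t, -1)  # (T, 2 * J)
    return flat @ w_proj + b_proj


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention with several heads over the tokens."""
    t, d_model = tokens.shape
    d_head = d_model // num_heads

    def split(x):  # (T, d_model) -> (heads, T, d_head)
        return x.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    context = weights @ v  # (heads, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(t, d_model)
    return merged @ w_o, weights


# Hypothetical sizes, chosen only for illustration.
T, J, D_MODEL, HEADS, NUM_CLASSES = 16, 18, 32, 4, 8
rng = np.random.default_rng(0)

# Random stand-ins for a pose sequence and for learned parameters.
poses = rng.uniform(0.0, 1.0, size=(T, J, 2))
w_proj = rng.normal(scale=0.1, size=(2 * J, D_MODEL))
b_proj = np.zeros(D_MODEL)
w_q, w_k, w_v, w_o = (rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)) for _ in range(4))
w_cls = rng.normal(scale=0.1, size=(D_MODEL, NUM_CLASSES))

tokens = project_joints(poses, w_proj, b_proj)
attended, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, HEADS)

# Assumed classification head: average over time, then a linear layer.
probs = softmax(attended.mean(axis=0) @ w_cls)

print(tokens.shape, attn.shape, probs.shape)
```

Each row of `attn[h]` shows how strongly one frame attends to every other frame in head `h`, which is one way to read the "contextual information" the abstract refers to. A real model would add positional encodings, residual connections, layer normalization, and trained weights.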

Paper Citation

in Harvard Style

Paiva P. V. V., Ramos J., Gavrilova M. and Carvalho M. A. G. (2023). Emotion Transformer: Attention Model for Pose-Based Emotion Recognition. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP; ISBN 978-989-758-634-7, SciTePress, pages 274-281. DOI: 10.5220/0011791700003417

in Bibtex Style

@conference{visapp23,
author={Pedro V. V. Paiva and Josué Ramos and Marina Gavrilova and Marco A. G. Carvalho},
title={Emotion Transformer: Attention Model for Pose-Based Emotion Recognition},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP},
year={2023},
pages={274-281},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011791700003417},
isbn={978-989-758-634-7},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP
TI - Emotion Transformer: Attention Model for Pose-Based Emotion Recognition
SN - 978-989-758-634-7
AU - Paiva, Pedro V. V.
AU - Ramos, Josué
AU - Gavrilova, Marina
AU - Carvalho, Marco A. G.
PY - 2023
SP - 274
EP - 281
DO - 10.5220/0011791700003417
PB - SciTePress