EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation

Xuan-Nam Cao; Quoc-Huy Trinh; Quoc-Anh Do-Nguyen; Van-Son Ho; Hoai-Thuong Dang; Minh-Triet Tran

doi:10.5220/0012455700003636

2024

Abstract

Generating realistic talking faces from audio input is a challenging task with broad applications in fields such as film production, gaming, and virtual reality. Previous approaches, which use a two-stage process that converts audio to landmarks and then landmarks to a face, have shown promise in creating vivid videos. However, they still struggle to maintain temporal consistency because information from the previous audio frame is poorly connected to the current audio frame, which leads to unnatural landmarks. To address this issue, we propose EAPC, a framework that combines features from previous audio frames with the current audio feature and the current facial landmark. Additionally, we introduce the Dual-LSTM module to enhance emotion control. Together, these components strengthen the temporal and emotional information extracted from the audio input, allowing our model to capture speech dynamics and produce more coherent animations. Extensive experiments demonstrate that our method generates consistent landmarks, resulting in more realistic and synchronized faces and achieving results competitive with state-of-the-art methods. The implementation of our method will be made publicly available upon publication.
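
As a rough illustration of the audio-prior idea described in the abstract, the sketch below concatenates features from the previous audio frames with the current audio feature and the current landmark vector. It is not the authors' implementation: the function name `build_prior_input`, the window size `k`, the zero-padding at the start of the sequence, and plain concatenation as the fusion step are all assumptions made for this example. The abstract does not specify any of these details, and the Dual-LSTM module is not modelled here.

```python
from typing import List, Sequence

Vector = List[float]


def build_prior_input(
    audio_feats: Sequence[Vector],
    landmarks: Sequence[Vector],
    t: int,
    k: int = 2,
) -> Vector:
    """Fuse the k previous audio features, the current audio feature,
    and the current landmark vector for frame t by concatenation.

    Hypothetical helper: the window size k and zero-padding are assumptions.
    """
    if len(audio_feats) != len(landmarks):
        raise ValueError("audio_feats and landmarks must have the same length")
    if not 0 <= t < len(audio_feats):
        raise ValueError("frame index t is out of range")

    dim = len(audio_feats[0])
    fused: Vector = []

    # Previous k audio frames; frames before the start are zero-padded.
    for i in range(t - k, t):
        fused.extend(audio_feats[i] if i >= 0 else [0.0] * dim)

    # Current audio feature followed by the current landmark vector.
    fused.extend(audio_feats[t])
    fused.extend(landmarks[t])
    return fused


# Toy data: three frames, 2-D audio features, 1-D landmark vectors.
audio = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
lms = [[0.1], [0.2], [0.3]]
fused_last = build_prior_input(audio, lms, t=2, k=2)
fused_first = build_prior_input(audio, lms, t=0, k=2)
```

In a two-stage pipeline of the kind the abstract describes, a fused vector like this would be the per-frame input to the audio-to-landmark stage, so that each predicted landmark set is conditioned on recent audio context rather than on the current frame alone.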

Paper Citation

in Harvard Style

Cao X., Trinh Q., Do-Nguyen Q., Ho V., Dang H. and Tran M. (2024). EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-680-4, SciTePress, pages 520-530. DOI: 10.5220/0012455700003636

in Bibtex Style

@conference{icaart24,
author={Xuan-Nam Cao and Quoc-Huy Trinh and Quoc-Anh Do-Nguyen and Van-Son Ho and Hoai-Thuong Dang and Minh-Triet Tran},
title={EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2024},
pages={520-530},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012455700003636},
isbn={978-989-758-680-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation
SN - 978-989-758-680-4
AU - Cao X.
AU - Trinh Q.
AU - Do-Nguyen Q.
AU - Ho V.
AU - Dang H.
AU - Tran M.
PY - 2024
SP - 520
EP - 530
DO - 10.5220/0012455700003636
PB - SciTePress