Authors:
Xuan-Nam Cao 1,2; Quoc-Huy Trinh 1,2; Quoc-Anh Do-Nguyen 1,2; Van-Son Ho 1,2; Hoai-Thuong Dang 1,2 and Minh-Triet Tran 1,2
Affiliations:
1 Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
Keyword(s):
Landmark Generation, Talking Head, Dual-LSTM, Acoustic Features.
Abstract:
Generating realistic talking faces from audio input is a challenging task with broad applications in fields such as film production, gaming, and virtual reality. Previous approaches, employing a two-stage process of converting audio to landmarks and then landmarks to a face, have shown promise in creating vivid videos. However, they still struggle to maintain consistency because information from previous audio frames is poorly connected to the current audio frame, leading to the generation of unnatural landmarks. To address this issue, we propose EAPC, a framework that fuses features from previous audio frames with the current audio feature and the current facial landmarks. Additionally, we introduce a Dual-LSTM module to enhance emotion control. In this way, our framework strengthens the temporal and emotional modeling of the audio input, allowing our model to capture speech dynamics and produce more coherent animations. Extensive experiments demonstrate that our method generates consistent landmarks, resulting in more realistic and synchronized faces, and achieves results competitive with state-of-the-art methods. The implementation of our method will be made publicly available upon publication.
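To illustrate the idea described in the abstract, below is a minimal, hypothetical sketch of a dual-LSTM landmark predictor: one LSTM aggregates the current and previous audio frames into a temporal context, and a second LSTM fuses that context with the current landmarks before regressing the next landmark positions. All layer sizes, module names, and the fusion scheme are assumptions for illustration only, not the authors' actual EAPC implementation.

```python
# Hypothetical sketch of a dual-LSTM audio-to-landmark predictor.
# Dimensions and fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn

class DualLSTMSketch(nn.Module):
    def __init__(self, audio_dim=128, landmark_dim=136, hidden=256):
        super().__init__()
        # First LSTM: accumulates context across previous and current audio frames.
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        # Second LSTM: refines the audio context jointly with the current
        # landmarks, intended here to carry emotion-related dynamics.
        self.emotion_lstm = nn.LSTM(hidden + landmark_dim, hidden, batch_first=True)
        # Linear head regresses a landmark vector (e.g., 68 points x 2 coords).
        self.head = nn.Linear(hidden, landmark_dim)

    def forward(self, audio_seq, landmark_seq):
        # audio_seq: (batch, T, audio_dim); landmark_seq: (batch, T, landmark_dim)
        ctx, _ = self.audio_lstm(audio_seq)
        fused, _ = self.emotion_lstm(torch.cat([ctx, landmark_seq], dim=-1))
        return self.head(fused)  # per-frame predicted landmarks

model = DualLSTMSketch()
audio = torch.randn(2, 10, 128)      # 2 clips, 10 frames of audio features
landmarks = torch.randn(2, 10, 136)  # matching landmark sequences
out = model(audio, landmarks)
print(out.shape)  # torch.Size([2, 10, 136])
```

Feeding the whole audio window (rather than a single frame) through the first LSTM is what lets the predictor stay consistent across frames, which is the misconnection issue the abstract identifies in prior two-stage pipelines.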