EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation

Authors: Xuan-Nam Cao 1,2; Quoc-Huy Trinh 1,2; Quoc-Anh Do-Nguyen 1,2; Van-Son Ho 1,2; Hoai-Thuong Dang 1,2 and Minh-Triet Tran 1,2

Affiliations: 1 Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam; 2 Vietnam National University, Ho Chi Minh City, Vietnam

Keyword(s): Landmark Generation, Talking Head, Dual-LSTM, Acoustic Features.

Abstract: Generating realistic talking faces from audio input is a challenging task with broad applications in fields such as film production, gaming, and virtual reality. Previous approaches, employing a two-stage process of converting audio to landmarks and then landmarks to a face, have shown promise in creating vivid videos. However, they still struggle to maintain consistency because information from the previous audio frame is poorly connected to the current audio frame, which leads to unnatural landmarks. To address this issue, we propose EAPC, a framework that incorporates features from previous audio frames with the current audio feature and the current facial landmarks. Additionally, we introduce a Dual-LSTM module to enhance emotion control. In doing so, our framework strengthens the temporal and emotional information of the audio input, allowing our model to capture speech dynamics and produce more coherent animations. Extensive experiments demonstrate that our method generates consistent landmarks, resulting in more realistic and synchronized faces, and achieves results competitive with state-of-the-art methods. The implementation of our method will be made publicly available upon publication.
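The abstract outlines how EAPC conditions landmark prediction on prior audio context and uses a Dual-LSTM for emotion control, but this page does not give the architecture itself. The sketch below only illustrates that idea in PyTorch; the module name, dimensions, and wiring are assumptions for illustration, not the authors' implementation.

# Minimal sketch (not the authors' code): fuse previous audio-frame features with the
# current audio feature and current facial landmarks, using two LSTMs, one for the
# audio/content stream and one for a slower-changing emotion stream. All names and
# dimensions here are hypothetical.
import torch
import torch.nn as nn

class DualLSTMSketch(nn.Module):
    def __init__(self, audio_dim=128, landmark_dim=68 * 2, emotion_dim=16, hidden=256):
        super().__init__()
        # Content LSTM sees audio features concatenated with landmarks at each step.
        self.content_lstm = nn.LSTM(audio_dim + landmark_dim, hidden, batch_first=True)
        # Emotion LSTM summarizes the audio sequence into a low-dimensional emotion state.
        self.emotion_lstm = nn.LSTM(audio_dim, emotion_dim, batch_first=True)
        self.head = nn.Linear(hidden + emotion_dim, landmark_dim)

    def forward(self, prev_audio, cur_audio, cur_landmarks):
        # prev_audio:    (B, T_prev, audio_dim)  features of previous audio frames
        # cur_audio:     (B, audio_dim)          feature of the current audio frame
        # cur_landmarks: (B, landmark_dim)       current facial landmarks
        audio_seq = torch.cat([prev_audio, cur_audio.unsqueeze(1)], dim=1)
        lm_seq = cur_landmarks.unsqueeze(1).expand(-1, audio_seq.size(1), -1)
        content_in = torch.cat([audio_seq, lm_seq], dim=-1)
        content_out, _ = self.content_lstm(content_in)   # temporal/content context
        emotion_out, _ = self.emotion_lstm(audio_seq)    # emotion context
        fused = torch.cat([content_out[:, -1], emotion_out[:, -1]], dim=-1)
        return self.head(fused)                          # predicted next landmark frame

For example, DualLSTMSketch()(torch.randn(2, 4, 128), torch.randn(2, 128), torch.randn(2, 136)) would return a (2, 136) tensor of predicted landmarks; the actual feature extractors and decoder are described in the paper.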

CC BY-NC-ND 4.0

Paper citation in several formats:
Cao, X.; Trinh, Q.; Do-Nguyen, Q.; Ho, V.; Dang, H. and Tran, M. (2024). EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-680-4; ISSN 2184-433X, SciTePress, pages 520-530. DOI: 10.5220/0012455700003636

@conference{icaart24,
author={Xuan{-}Nam Cao and Quoc{-}Huy Trinh and Quoc{-}Anh Do{-}Nguyen and Van{-}Son Ho and Hoai{-}Thuong Dang and Minh{-}Triet Tran},
title={EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2024},
pages={520-530},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012455700003636},
isbn={978-989-758-680-4},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - EAPC: Emotion and Audio Prior Control Framework for the Emotional and Temporal Talking Face Generation
SN - 978-989-758-680-4
IS - 2184-433X
AU - Cao, X.
AU - Trinh, Q.
AU - Do-Nguyen, Q.
AU - Ho, V.
AU - Dang, H.
AU - Tran, M.
PY - 2024
SP - 520
EP - 530
DO - 10.5220/0012455700003636
PB - SciTePress
ER -