Please use this identifier to cite or link to this item:
https://repository.cihe.edu.hk/jspui/handle/cihe/4713
Title: | Photo-realistic talking face generation under latent space manipulation |
Author(s): | Salahudeen, Ridwan; Siu, Wan Chi; Chan, Anthony Hing-Hung |
Issue Date: | 2024 |
Publisher: | IEEE |
Journal: | IEEE Transactions on Consumer Electronics |
Abstract: | This paper focuses on generating photo-realistic talking face videos by leveraging semantic facial attributes in a latent space and capturing the talking style from an old video of a speaker. We formulate a process to manipulate facial attributes in the latent space by identifying semantic facial directions. We develop a deep learning pipeline to learn the correlation between the audio and the corresponding video frames from a reference video of a speaker in an aligned latent space. This correlation is used to navigate a static face image into frames of a talking face video, a process moderated by three carefully constructed loss functions for accurate lip synchronization and photo-realistic video reconstruction. By combining these techniques, we aim to generate high-quality talking face videos that are visually realistic and synchronized with the provided audio input. We evaluated our results against several state-of-the-art talking face generation techniques and recorded significant improvements in the image quality of the generated talking face video. |
URI: | https://repository.cihe.edu.hk/jspui/handle/cihe/4713 |
DOI: | 10.1109/TCE.2024.3516387 |
CIHE Affiliated Publication: | Yes |
Appears in Collections: | CIS Publication |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
View Online | | 89 B | HTML | View/Open |

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.