Article: Talking face generation by adversarially disentangled audio-visual representation
Title | Talking face generation by adversarially disentangled audio-visual representation |
---|---|
Authors | Zhou, H; Liu, Y; Liu, Z; Luo, P; Wang, X |
Issue Date | 2019 |
Publisher | AAAI Press. The Journal's web site is located at https://aaai.org/Library/AAAI/aaai-library.php |
Citation | Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, USA, 27 January – 1 February 2019, v. 33 n. 1, p. 9299-9306 |
Abstract | Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct specific face appearance model on specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has an advantage where both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval. |
Description | AAAI Technical Track: Vision |
Persistent Identifier | http://hdl.handle.net/10722/284260 |
ISSN | 2159-5399 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhou, H | - |
dc.contributor.author | Liu, Y | - |
dc.contributor.author | Liu, Z | - |
dc.contributor.author | Luo, P | - |
dc.contributor.author | Wang, X | - |
dc.date.accessioned | 2020-07-20T05:57:19Z | - |
dc.date.available | 2020-07-20T05:57:19Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, USA, 27 January – 1 February 2019, v. 33 n. 1, p. 9299-9306 | - |
dc.identifier.issn | 2159-5399 | - |
dc.identifier.uri | http://hdl.handle.net/10722/284260 | - |
dc.description | AAAI Technical Track: Vision | - |
dc.language | eng | - |
dc.publisher | AAAI Press. The Journal's web site is located at https://aaai.org/Library/AAAI/aaai-library.php | - |
dc.relation.ispartof | Proceedings of the AAAI Conference on Artificial Intelligence | - |
dc.rights | Copyright (c) 2019 Association for the Advancement of Artificial Intelligence | - |
dc.title | Talking face generation by adversarially disentangled audio-visual representation | - |
dc.type | Article | - |
dc.identifier.email | Luo, P: pluo@hku.hk | - |
dc.identifier.authority | Luo, P=rp02575 | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.doi | 10.1609/aaai.v33i01.33019299 | - |
dc.identifier.hkuros | 311003 | - |
dc.identifier.volume | 33 | - |
dc.identifier.issue | 1 | - |
dc.identifier.spage | 9299 | - |
dc.identifier.epage | 9306 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 2159-5399 | - |