Talking face generation by adversarially disentangled audio-visual representation

Zhou, H; Liu, Y; Liu, Z; Luo, P; Wang, X


Publisher Website: 10.1609/aaai.v33i01.33019299
Appears in Collections:
- Computer Science: Conference papers

Article: Talking face generation by adversarially disentangled audio-visual representation

Title	Talking face generation by adversarially disentangled audio-visual representation
Authors	Zhou, H; Liu, Y; Liu, Z; Luo, P; Wang, X
Issue Date	2019
Publisher	AAAI Press. The Journal's web site is located at https://aaai.org/Library/AAAI/aaai-library.php
Citation	Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, USA, 27 January – 1 February 2019, v. 33 n. 1, p. 9299-9306. DOI: http://dx.doi.org/10.1609/aaai.v33i01.33019299
Abstract	Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct specific face appearance model on specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has an advantage where both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval.
Description	AAAI Technical Track: Vision
Persistent Identifier	http://hdl.handle.net/10722/284260
ISSN	2159-5399

DC Field	Value	Language
dc.contributor.author	Zhou, H	-
dc.contributor.author	Liu, Y	-
dc.contributor.author	Liu, Z	-
dc.contributor.author	Luo, P	-
dc.contributor.author	Wang, X	-
dc.date.accessioned	2020-07-20T05:57:19Z	-
dc.date.available	2020-07-20T05:57:19Z	-
dc.date.issued	2019	-
dc.identifier.citation	Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, USA, 27 January – 1 February 2019, v. 33 n. 1, p. 9299-9306	-
dc.identifier.issn	2159-5399	-
dc.identifier.uri	http://hdl.handle.net/10722/284260	-
dc.description	AAAI Technical Track: Vision	-
dc.language	eng	-
dc.publisher	AAAI Press. The Journal's web site is located at https://aaai.org/Library/AAAI/aaai-library.php	-
dc.relation.ispartof	Proceedings of the AAAI Conference on Artificial Intelligence	-
dc.rights	Copyright (c) 2019 Association for the Advancement of Artificial Intelligence	-
dc.title	Talking face generation by adversarially disentangled audio-visual representation	-
dc.type	Article	-
dc.identifier.email	Luo, P: pluo@hku.hk	-
dc.identifier.authority	Luo, P=rp02575	-
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1609/aaai.v33i01.33019299	-
dc.identifier.hkuros	311003	-
dc.identifier.volume	33	-
dc.identifier.issue	1	-
dc.identifier.spage	9299	-
dc.identifier.epage	9306	-
dc.publisher.place	United States	-
dc.identifier.issnl	2159-5399	-
