Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation

LIU, L; Xu, W; Habermann, M; Zollhoefer, M; Bernard, F; Kim, H; Wang, WP; Theobalt, C

Publisher Website: 10.1109/TVCG.2020.2996594
Scopus: eid_2-s2.0-85114303493
PMID: 32746256
WOS: WOS:000692890200013
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation

Title	Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation
Authors	LIU, L; Xu, W; Habermann, M; Zollhoefer, M; Bernard, F; Kim, H; Wang, WP; Theobalt, C
Keywords	Video-based Characters; Deep Learning; Neural Rendering; Learning Dynamic Texture; Rendering-to-Video Translation
Issue Date	2020
Publisher	Institute of Electrical and Electronics Engineers. The Journal's web site is located at https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=2945
Citation	IEEE Transactions on Visualization and Computer Graphics, 2020, Epub 2020-05-26. DOI: http://dx.doi.org/10.1109/TVCG.2020.2996594
Abstract	Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that approaches these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
Persistent Identifier	http://hdl.handle.net/10722/293922
ISSN	1077-2626
2023 Impact Factor	4.7
2023 SCImago Journal Rankings	2.056
ISI Accession Number ID	WOS:000692890200013
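
The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: both CNNs are replaced by simple hand-written stand-in functions, the intermediate UV-lookup rendering step is an assumption inferred from the phrase "Rendering-to-Video Translation" in the title, and every name here (`predict_texture`, `render`, `translate_to_frame`, `synthesize_video`, the pose vector and UV-map formats) is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pose = Sequence[float]                      # hypothetical pose vector
Image = List[List[float]]                   # grayscale image as nested lists
UVMap = List[List[Optional[Tuple[int, int]]]]  # per-pixel (u, v) texel index, None = background


def predict_texture(pose: Pose, size: int = 4) -> Image:
    """Stand-in for the first CNN: pose -> dynamic texture map.

    A trained network would go here; this toy version just makes the
    texture a deterministic function of the pose.
    """
    bias = sum(pose) / max(len(pose), 1)
    return [[bias + 0.1 * (u + v) for u in range(size)] for v in range(size)]


def render(texture: Image, uv_map: UVMap) -> Image:
    """Assumed intermediate step: sample the texture through a UV map.

    Background pixels (None) are rendered as 0.0.
    """
    return [
        [texture[uv[1]][uv[0]] if uv is not None else 0.0 for uv in row]
        for row in uv_map
    ]


def translate_to_frame(rendering: Image) -> Image:
    """Stand-in for the second CNN: rendering -> final frame.

    Here it only clamps pixel values to [0, 1].
    """
    return [[min(max(p, 0.0), 1.0) for p in row] for row in rendering]


def synthesize_video(poses: Sequence[Pose], uv_maps: Sequence[UVMap]) -> List[Image]:
    """Run the pipeline per frame: pose -> texture -> rendering -> frame."""
    frames = []
    for pose, uv_map in zip(poses, uv_maps):
        texture = predict_texture(pose)        # stage 1: detail lives in texture space
        rendering = render(texture, uv_map)    # embed into 2D screen space
        frames.append(translate_to_frame(rendering))  # stage 2: final frame
    return frames


# Usage: two poses rendered through the same 2x2 UV map.
uv_map: UVMap = [[(0, 0), None], [(1, 1), (3, 3)]]
frames = synthesize_video([[0.0, 0.0], [1.0, 1.0]], [uv_map, uv_map])
```

The point of the split is that fine-scale detail is predicted in texture space, where each texel corresponds to a fixed surface location, and only afterwards mapped into screen space; the second stage then works from that already coherent input.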

DC Field	Value	Language
dc.contributor.author	LIU, L	-
dc.contributor.author	Xu, W	-
dc.contributor.author	Habermann, M	-
dc.contributor.author	Zollhoefer, M	-
dc.contributor.author	Bernard, F	-
dc.contributor.author	Kim, H	-
dc.contributor.author	Wang, WP	-
dc.contributor.author	Theobalt, C	-
dc.date.accessioned	2020-11-23T08:23:47Z	-
dc.date.available	2020-11-23T08:23:47Z	-
dc.date.issued	2020	-
dc.identifier.citation	IEEE Transactions on Visualization and Computer Graphics, 2020, Epub 2020-05-26	-
dc.identifier.issn	1077-2626	-
dc.identifier.uri	http://hdl.handle.net/10722/293922	-
dc.description.abstract	Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that approaches these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.	-
dc.language	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers. The Journal's web site is located at https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=2945	-
dc.relation.ispartof	IEEE Transactions on Visualization and Computer Graphics	-
dc.rights	IEEE Transactions on Visualization and Computer Graphics. Copyright © Institute of Electrical and Electronics Engineers.	-
dc.rights	©20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	-
dc.subject	Video-based Characters	-
dc.subject	Deep Learning	-
dc.subject	Neural Rendering	-
dc.subject	Learning Dynamic Texture	-
dc.subject	Rendering-to-Video Translation	-
dc.title	Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation	-
dc.type	Article	-
dc.identifier.email	Wang, WP: wenping@cs.hku.hk	-
dc.identifier.authority	Wang, WP=rp00186	-
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1109/TVCG.2020.2996594	-
dc.identifier.pmid	32746256	-
dc.identifier.scopus	eid_2-s2.0-85114303493	-
dc.identifier.hkuros	318953	-
dc.identifier.volume	Epub 2020-05-26	-
dc.identifier.spage	1	-
dc.identifier.epage	1	-
dc.identifier.isi	WOS:000692890200013	-
dc.publisher.place	United States	-
