Conference Paper: GAS: Generative Avatar Synthesis from a Single Image

Title: GAS: Generative Avatar Synthesis from a Single Image
Authors: Lu, Yixing; Dong, Junting; Kwon, Youngjoong; Zhao, Qin; Dai, Bo; De la Torre, Fernando
Issue Date: 19-Oct-2025
Abstract:

We present a unified and generalizable framework for synthesizing view-consistent and temporally coherent avatars from a single image, addressing the challenging task of single-image avatar generation. Existing diffusion-based methods often condition on sparse human templates (e.g., depth or normal maps), which leads to multi-view and temporal inconsistencies due to the mismatch between these signals and the true appearance of the subject. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model. In the first step, an initial 3D human reconstruction from a generalized NeRF provides comprehensive conditioning, ensuring high-quality synthesis faithful to the reference appearance and structure. Subsequently, the geometry and appearance derived from the generalized NeRF serve as input to a video-based diffusion model. This strategic integration is pivotal for enforcing both multi-view and temporal consistency throughout the avatar's generation. Empirical results underscore the superior generalization ability of our proposed method, demonstrating its effectiveness across diverse in-domain and out-of-domain in-the-wild datasets.
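
The abstract describes a two-stage architecture: a regression-based, generalizable NeRF first reconstructs geometry and appearance from the single reference image, and those renderings then condition a video diffusion model that produces the final view- and time-consistent frames. The following is a minimal Python sketch of that data flow, assuming hypothetical interfaces; the class and function names (GeneralizedNeRF, VideoDiffusionModel, synthesize_avatar), array shapes, and placeholder computations are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All names, shapes, and computations are illustrative stand-ins.
import numpy as np


class GeneralizedNeRF:
    """Stand-in for a regression-based, generalizable NeRF reconstructor."""

    def reconstruct(self, reference_image: np.ndarray):
        # The real method would regress a radiance field from the single
        # reference image; here we return the image itself as "appearance"
        # and a dummy depth map as "geometry".
        h, w, _ = reference_image.shape
        appearance = reference_image
        geometry = np.ones((h, w), dtype=np.float32)  # placeholder depth
        return geometry, appearance

    def render(self, geometry, appearance, camera_or_pose):
        # Produce one dense conditioning frame (RGB + depth) per target view
        # or body pose; a real renderer would volume-render the NeRF.
        return np.concatenate([appearance, geometry[..., None]], axis=-1)


class VideoDiffusionModel:
    """Stand-in for a video diffusion model conditioned on NeRF renderings."""

    def generate(self, reference_image, conditioning_frames):
        # A real model would denoise a latent video conditioned on the
        # reference image and the per-frame NeRF renderings; here we simply
        # pass through the RGB part of the conditioning as the output video.
        return [frame[..., :3] for frame in conditioning_frames]


def synthesize_avatar(reference_image, target_cameras_or_poses):
    nerf = GeneralizedNeRF()
    diffusion = VideoDiffusionModel()
    # Stage 1: regression-based reconstruction from the single image.
    geometry, appearance = nerf.reconstruct(reference_image)
    # Dense, appearance-faithful conditioning for every target view/pose,
    # instead of sparse templates such as depth or normal maps alone.
    conditioning = [nerf.render(geometry, appearance, c) for c in target_cameras_or_poses]
    # Stage 2: generative refinement with a video diffusion model, which is
    # what enforces multi-view and temporal consistency in the final frames.
    return diffusion.generate(reference_image, conditioning)


if __name__ == "__main__":
    ref = np.zeros((256, 256, 3), dtype=np.float32)
    frames = synthesize_avatar(ref, target_cameras_or_poses=range(8))
    print(len(frames), frames[0].shape)  # 8 (256, 256, 3)
```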


Persistent Identifier: http://hdl.handle.net/10722/359014

 

DC Field                   Value
dc.contributor.author      Lu, Yixing
dc.contributor.author      Dong, Junting
dc.contributor.author      Kwon, Youngjoong
dc.contributor.author      Zhao, Qin
dc.contributor.author      Dai, Bo
dc.contributor.author      De la Torre, Fernando
dc.date.accessioned        2025-08-19T00:32:07Z
dc.date.available          2025-08-19T00:32:07Z
dc.date.issued             2025-10-19
dc.identifier.uri          http://hdl.handle.net/10722/359014
dc.description.abstract    We present a unified and generalizable framework for synthesizing view-consistent and temporally coherent avatars from a single image, addressing the challenging task of single-image avatar generation. Existing diffusion-based methods often condition on sparse human templates (e.g., depth or normal maps), which leads to multi-view and temporal inconsistencies due to the mismatch between these signals and the true appearance of the subject. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model. In the first step, an initial 3D human reconstruction from a generalized NeRF provides comprehensive conditioning, ensuring high-quality synthesis faithful to the reference appearance and structure. Subsequently, the geometry and appearance derived from the generalized NeRF serve as input to a video-based diffusion model. This strategic integration is pivotal for enforcing both multi-view and temporal consistency throughout the avatar's generation. Empirical results underscore the superior generalization ability of our proposed method, demonstrating its effectiveness across diverse in-domain and out-of-domain in-the-wild datasets.
dc.language                eng
dc.relation.ispartof       International Conference on Computer Vision (ICCV) (19/10/2025-23/10/2025, Honolulu, Hawai'i)
dc.title                   GAS: Generative Avatar Synthesis from a Single Image
dc.type                    Conference_Paper
