File Download: There are no files associated with this item.
Conference Paper: GAS: Generative Avatar Synthesis from a Single Image
| Title | GAS: Generative Avatar Synthesis from a Single Image |
|---|---|
| Authors | Lu, Yixing; Dong, Junting; Kwon, Youngjoong; Zhao, Qin; Dai, Bo; De la Torre, Fernando |
| Issue Date | 19-Oct-2025 |
| Abstract | We present a unified and generalizable framework for synthesizing view-consistent and temporally coherent avatars from a single image, addressing the challenging task of single-image avatar generation. Existing diffusion-based methods often condition on sparse human templates (e.g., depth or normal maps), which leads to multi-view and temporal inconsistencies due to the mismatch between these signals and the true appearance of the subject. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model. In a first step, an initial 3D human reconstruction obtained through a generalized NeRF provides comprehensive conditioning, ensuring high-quality synthesis faithful to the reference appearance and structure. Subsequently, the derived geometry and appearance from the generalized NeRF serve as input to a video-based diffusion model. This strategic integration is pivotal for enforcing both multi-view and temporal consistency throughout the avatar's generation. Empirical results underscore the superior generalization ability of our proposed method, demonstrating its effectiveness across diverse in-domain and out-of-domain in-the-wild datasets. |
| Persistent Identifier | http://hdl.handle.net/10722/359014 |
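
The abstract describes a two-stage pipeline: a generalizable NeRF first reconstructs coarse appearance and geometry from the single reference image, and those outputs then condition a video diffusion model so the generated frames stay consistent across views and over time. The sketch below illustrates that data flow only, under assumed interfaces; `GeneralizedNeRF`, `VideoDiffusionDenoiser`, and `synthesize_avatar` are hypothetical stand-ins rather than the authors' released code, and the denoising loop is a toy update, not a real diffusion sampler.

```python
import torch
import torch.nn as nn


class GeneralizedNeRF(nn.Module):
    """Stand-in for a generalizable human NeRF: maps the reference image to a
    coarse rendering (appearance) and a geometry map (e.g., depth)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.rgb_head = nn.Conv2d(feat_dim, 3, 3, padding=1)
        self.geom_head = nn.Conv2d(feat_dim, 1, 3, padding=1)

    def forward(self, ref_image: torch.Tensor):
        feats = self.encoder(ref_image)
        return self.rgb_head(feats), self.geom_head(feats)


class VideoDiffusionDenoiser(nn.Module):
    """Stand-in for a video diffusion model conditioned on the NeRF outputs.
    A real system would use a latent video diffusion backbone; a single 3D
    convolution keeps the example self-contained."""

    def __init__(self, cond_channels: int = 4):
        super().__init__()
        self.net = nn.Conv3d(3 + cond_channels, 3, kernel_size=3, padding=1)

    def forward(self, noisy_video: torch.Tensor, condition: torch.Tensor):
        # Condition = per-frame NeRF rendering + geometry, concatenated on channels.
        return self.net(torch.cat([noisy_video, condition], dim=1))


def synthesize_avatar(ref_image: torch.Tensor, num_frames: int = 8, steps: int = 4):
    """Two-stage sketch: (1) reconstruct appearance/geometry from the single
    reference image, (2) denoise a video while conditioning every frame on
    that reconstruction, which is what ties views and frames together."""
    nerf = GeneralizedNeRF()
    denoiser = VideoDiffusionDenoiser()

    rgb, depth = nerf(ref_image)                              # stage 1: reconstruct
    cond = torch.cat([rgb, depth], dim=1)                     # B x 4 x H x W
    cond = cond.unsqueeze(2).repeat(1, 1, num_frames, 1, 1)   # broadcast over frames

    video = torch.randn(ref_image.shape[0], 3, num_frames, *ref_image.shape[2:])
    for _ in range(steps):                                    # stage 2: iterative denoising
        video = video - 0.1 * denoiser(video, cond)           # toy update, not a real sampler
    return video


if __name__ == "__main__":
    out = synthesize_avatar(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 3, 8, 64, 64])
```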
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lu, Yixing | - |
| dc.contributor.author | Dong, Junting | - |
| dc.contributor.author | Kwon, Youngjoong | - |
| dc.contributor.author | Zhao, Qin | - |
| dc.contributor.author | Dai, Bo | - |
| dc.contributor.author | De la Torre, Fernando | - |
| dc.date.accessioned | 2025-08-19T00:32:07Z | - |
| dc.date.available | 2025-08-19T00:32:07Z | - |
| dc.date.issued | 2025-10-19 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/359014 | - |
| dc.description.abstract | We present a unified and generalizable framework for synthesizing view-consistent and temporally coherent avatars from a single image, addressing the challenging task of single-image avatar generation. Existing diffusion-based methods often condition on sparse human templates (e.g., depth or normal maps), which leads to multi-view and temporal inconsistencies due to the mismatch between these signals and the true appearance of the subject. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model. In a first step, an initial 3D human reconstruction obtained through a generalized NeRF provides comprehensive conditioning, ensuring high-quality synthesis faithful to the reference appearance and structure. Subsequently, the derived geometry and appearance from the generalized NeRF serve as input to a video-based diffusion model. This strategic integration is pivotal for enforcing both multi-view and temporal consistency throughout the avatar's generation. Empirical results underscore the superior generalization ability of our proposed method, demonstrating its effectiveness across diverse in-domain and out-of-domain in-the-wild datasets. | - |
| dc.language | eng | - |
| dc.relation.ispartof | International Conference on Computer Vision (ICCV) (19/10/2025-23/10/2025, Honolulu, Hawai'i) | - |
| dc.title | GAS: Generative Avatar Synthesis from a Single Image | - |
| dc.type | Conference_Paper | - |
