DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

Huang, Yukun; Wang, Jianan; Zeng, Ailing; Zha, Zheng Jun; Zhang, Lei; Liu, Xihui

Publisher Website: 10.1109/TPAMI.2025.3586284
Scopus: eid_2-s2.0-105010218956

Citations:
- Scopus: 0
Appears in Collections:
- Electrical & Electronic Engineering: Journal/Magazine Articles
- HKU Musketeers Foundation Institute of Data Science: Journal/Magazine Articles

Title	DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Authors	Huang, Yukun; Wang, Jianan; Zeng, Ailing; Zha, Zheng Jun; Zhang, Lei; Liu, Xihui
Keywords	3D avatar generation; 3D Gaussians; 3D human; diffusion model; expressive animation; score distillation
Issue Date	1-Jan-2025
Publisher	Institute of Electrical and Electronics Engineers
Citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. DOI: http://dx.doi.org/10.1109/TPAMI.2025.3586284
Abstract	Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challenging. In this work, we present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text. The core of this framework lies in Skeleton-guided Score Distillation and Hybrid 3D Gaussian Avatar representation. Specifically, the proposed skeleton-guided score distillation integrates skeleton controls from 3D human templates into 2D diffusion models, enhancing the consistency of SDS supervision in terms of view and human pose. This facilitates the generation of high-quality avatars, mitigating issues such as multiple faces, extra limbs, and blurring. The proposed hybrid 3D Gaussian avatar representation builds on the efficient 3D Gaussians, combining neural implicit fields and parameterized 3D meshes to enable real-time rendering, stable SDS optimization, and expressive animation. Extensive experiments demonstrate that DreamWaltz-G is highly effective in generating and animating 3D avatars, outperforming existing methods in both visual quality and animation expressiveness. Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
Persistent Identifier	http://hdl.handle.net/10722/359178
ISSN	0162-8828
2023 Impact Factor	20.8
2023 SCImago Journal Rankings	6.158

DC Field	Value	Language
dc.contributor.author	Huang, Yukun	-
dc.contributor.author	Wang, Jianan	-
dc.contributor.author	Zeng, Ailing	-
dc.contributor.author	Zha, Zheng Jun	-
dc.contributor.author	Zhang, Lei	-
dc.contributor.author	Liu, Xihui	-
dc.date.accessioned	2025-08-23T00:30:27Z	-
dc.date.available	2025-08-23T00:30:27Z	-
dc.date.issued	2025-01-01	-
dc.identifier.citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025	-
dc.identifier.issn	0162-8828	-
dc.identifier.uri	http://hdl.handle.net/10722/359178	-
dc.description.abstract	Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challenging. In this work, we present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text. The core of this framework lies in Skeleton-guided Score Distillation and Hybrid 3D Gaussian Avatar representation. Specifically, the proposed skeleton-guided score distillation integrates skeleton controls from 3D human templates into 2D diffusion models, enhancing the consistency of SDS supervision in terms of view and human pose. This facilitates the generation of high-quality avatars, mitigating issues such as multiple faces, extra limbs, and blurring. The proposed hybrid 3D Gaussian avatar representation builds on the efficient 3D Gaussians, combining neural implicit fields and parameterized 3D meshes to enable real-time rendering, stable SDS optimization, and expressive animation. Extensive experiments demonstrate that DreamWaltz-G is highly effective in generating and animating 3D avatars, outperforming existing methods in both visual quality and animation expressiveness. Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.	-
dc.language	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.relation.ispartof	IEEE Transactions on Pattern Analysis and Machine Intelligence	-
dc.subject	3D avatar generation	-
dc.subject	3D Gaussians	-
dc.subject	3D human	-
dc.subject	diffusion model	-
dc.subject	expressive animation	-
dc.subject	score distillation	-
dc.title	DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion	-
dc.type	Article	-
dc.identifier.doi	10.1109/TPAMI.2025.3586284	-
dc.identifier.scopus	eid_2-s2.0-105010218956	-
dc.identifier.eissn	1939-3539	-
dc.identifier.issnl	0162-8828	-
