Links for fulltext (may require subscription):
- Publisher Website: 10.1109/CVPR52729.2023.02003
- Scopus: eid_2-s2.0-85173974761
Citations:
- Scopus: 0
Appears in Collections: Conference Paper
Conference Paper: Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Title | Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models |
---|---|
Authors | Xu, Jiale; Wang, Xintao; Cheng, Weihao; Cao, Yan Pei; Shan, Ying; Qie, Xiaohu; Gao, Shenghua |
Keywords | Vision, language, and reasoning |
Issue Date | 2023 |
Citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023, v. 2023-June, p. 20908-20918 |
Abstract | Recent CLIP-guided 3D optimization methods, such as DreamFields [19] and PureCLIPNeRF [24], have achieved impressive results in zero-shot text-to-3D synthesis. However, due to scratch training and random initialization without prior knowledge, these methods often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior. We then use it as the initialization of a neural radiance field and optimize it with the full prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods. Our project page is at https://bluestyle97.github.io/dream3d/. |
Persistent Identifier | http://hdl.handle.net/10722/345358 |
ISSN | 1063-6919 (2023 SCImago Journal Rankings: 10.331) |
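The abstract above describes a two-stage pipeline: first generate a 3D shape from text (bridging text and image with a fine-tuned diffusion model), then use that shape to initialize a neural radiance field optimized under CLIP guidance. The following is a minimal sketch of that control flow only; every function name below is a placeholder introduced for illustration, not the authors' code or API, and each stub marks where the corresponding component from the paper would plug in.

```python
# Hypothetical sketch of the two-stage Dream3D pipeline described in the
# abstract. All names here are illustrative stubs, not the real codebase.

def generate_rendering_style_image(prompt):
    """Stub: text-to-image diffusion model fine-tuned (together with a
    jointly optimized learnable prompt) to emit shape-rendering-style
    images, narrowing the domain gap to the image-to-shape training data."""
    raise NotImplementedError

def image_to_shape(image):
    """Stub: image-to-shape generator trained on shape renderings."""
    raise NotImplementedError

def init_nerf_from_shape(shape):
    """Stub: initialize the neural radiance field from the explicit 3D
    shape prior, instead of from scratch with random weights."""
    raise NotImplementedError

def clip_guided_step(nerf, prompt):
    """Stub: render views of the NeRF, score them against the full text
    prompt with CLIP, and take one gradient step on the NeRF parameters."""
    raise NotImplementedError

def dream3d(prompt, steps=10_000):
    # Stage 1: text-to-shape, bridging the text and image modalities with
    # the diffusion model, then lifting the image to a 3D shape prior.
    shape_prior = image_to_shape(generate_rendering_style_image(prompt))
    # Stage 2: CLIP-guided 3D optimization, initialized from the prior
    # rather than randomly, and driven by the full input prompt.
    nerf = init_nerf_from_shape(shape_prior)
    for _ in range(steps):
        nerf = clip_guided_step(nerf, prompt)
    return nerf
```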
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Xu, Jiale | - |
dc.contributor.author | Wang, Xintao | - |
dc.contributor.author | Cheng, Weihao | - |
dc.contributor.author | Cao, Yan Pei | - |
dc.contributor.author | Shan, Ying | - |
dc.contributor.author | Qie, Xiaohu | - |
dc.contributor.author | Gao, Shenghua | - |
dc.date.accessioned | 2024-08-15T09:26:51Z | - |
dc.date.available | 2024-08-15T09:26:51Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023, v. 2023-June, p. 20908-20918 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345358 | - |
dc.description.abstract | Recent CLIP-guided 3D optimization methods, such as DreamFields [19] and PureCLIPNeRF [24], have achieved impressive results in zero-shot text-to-3D synthesis. However, due to scratch training and random initialization without prior knowledge, these methods often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior. We then use it as the initialization of a neural radiance field and optimize it with the full prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods. Our project page is at https://bluestyle97.github.io/dream3d/. | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.subject | Vision, language, and reasoning | -
dc.title | Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CVPR52729.2023.02003 | - |
dc.identifier.scopus | eid_2-s2.0-85173974761 | - |
dc.identifier.volume | 2023-June | - |
dc.identifier.spage | 20908 | - |
dc.identifier.epage | 20918 | - |