GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Qi, Zhangyang; Fang, Ye; Sun, Zeyi; Wu, Xiaoyang; Wu, Tong; Wang, Jiaqi; Lin, Dahua; Zhao, Hengshuang

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Title	GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Authors	Qi, Zhangyang; Fang, Ye; Sun, Zeyi; Wu, Xiaoyang; Wu, Tong; Wang, Jiaqi; Lin, Dahua; Zhao, Hengshuang
Issue Date	1-May-2024
Abstract	Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation. To solve this problem, we introduce GPT4Point, an innovative, groundbreaking point-language multimodal model designed specifically for unified 3D object understanding and generation within the MLLM framework. As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks such as point-cloud captioning and Q&A. Additionally, GPT4Point is equipped with advanced capabilities for controllable 3D generation: it can produce high-quality results from a low-quality point-text feature while maintaining the geometric shapes and colors. To support the expansive need for 3D object-text pairs, we develop Pyramid-XL, a point-language dataset annotation engine. It constructs a large-scale database of over 1M objects with varied text granularity levels from the Objaverse-XL dataset, which is essential for training GPT4Point. We also propose a comprehensive benchmark to evaluate 3D point-language understanding capabilities. In extensive evaluations, GPT4Point demonstrates superior performance in understanding and generation.
Persistent Identifier	http://hdl.handle.net/10722/346131

DC Field	Value	Language
dc.contributor.author	Qi, Zhangyang	-
dc.contributor.author	Fang, Ye	-
dc.contributor.author	Sun, Zeyi	-
dc.contributor.author	Wu, Xiaoyang	-
dc.contributor.author	Wu, Tong	-
dc.contributor.author	Wang, Jiaqi	-
dc.contributor.author	Lin, Dahua	-
dc.contributor.author	Zhao, Hengshuang	-
dc.date.accessioned	2024-09-10T00:30:42Z	-
dc.date.available	2024-09-10T00:30:42Z	-
dc.date.issued	2024-05-01	-
dc.identifier.uri	http://hdl.handle.net/10722/346131	-
dc.description.abstract	Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation. To solve this problem, we introduce GPT4Point, an innovative, groundbreaking point-language multimodal model designed specifically for unified 3D object understanding and generation within the MLLM framework. As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks such as point-cloud captioning and Q&A. Additionally, GPT4Point is equipped with advanced capabilities for controllable 3D generation: it can produce high-quality results from a low-quality point-text feature while maintaining the geometric shapes and colors. To support the expansive need for 3D object-text pairs, we develop Pyramid-XL, a point-language dataset annotation engine. It constructs a large-scale database of over 1M objects with varied text granularity levels from the Objaverse-XL dataset, which is essential for training GPT4Point. We also propose a comprehensive benchmark to evaluate 3D point-language understanding capabilities. In extensive evaluations, GPT4Point demonstrates superior performance in understanding and generation.	-
dc.language	eng	-
dc.relation.ispartof	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (17/06/2024-21/06/2024, Seattle)	-
dc.title	GPT4Point: A Unified Framework for Point-Language Understanding and Generation	-
dc.type	Conference_Paper	-
dc.description.nature	published_or_final_version	-
