
Postgraduate thesis: Multi-view reconstruction via global 3D representation

Title: Multi-view reconstruction via global 3D representation
Authors: Wang, Peng (王鵬)
Advisors: Komura, T; Wang, WP
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, P. [王鵬]. (2024). Multi-view reconstruction via global 3D representation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Humans naturally engage with three-dimensional content in applications such as virtual reality, movie visual effects, and gaming. Yet the creation of high-quality 3D assets remains challenging and costly, demanding specialized software and skilled labor. The ubiquity of RGB cameras has democratized 3D modeling by enabling reconstruction solely from images, making it accessible to the general public. This thesis focuses on multi-view 3D reconstruction from RGB images. The traditional multi-view reconstruction pipeline chains together long and complex processing components based on local depth-map representations; it operates in an open-loop fashion and is error-prone. In contrast, this thesis addresses the multi-view reconstruction problem through global 3D representations. A proper global 3D representation should directly reflect the contents of the input RGB data and can be optimized or estimated from the reconstruction error observed in the input images. This inherent simplicity yields more robust and higher-quality reconstruction results than traditional 3D reconstruction pipelines. Challenges remain, however, in designing both the form of the 3D representation and the reconstruction algorithm. We explore various forms of 3D representations and reconstruction algorithms suited to different image capture settings. In the first part of the thesis, we address the challenge of reconstructing complex thin structures from RGB video frames without camera pose input. We develop a global 3D curve graph representation and simultaneously optimize both the camera poses and the curve representation to align with the input images. The optimization incorporates tailored measures for geometry, topology, and self-occlusion handling, thereby facilitating the reconstruction of 3D thin structures.
While this global curve representation performs well for thin structures, it cannot reconstruct general surfaces. In the second part, we address this limitation by employing a neural signed distance function (SDF) to represent surface geometry. Building on this neural surface representation, we introduce a differentiable volume rendering technique tailored to it, making the reconstruction process more robust than previous neural surface reconstruction methods. In the third part, we tackle reconstruction and novel-view synthesis for large-scale, unbounded scenes. We represent the scene with a grid-based neural radiance field (NeRF) that accommodates arbitrary input camera trajectories while requiring only a few minutes of training. In the fourth part, we introduce PF-LRM, a large reconstruction model for generalizable sparse-view object reconstruction. PF-LRM encodes an object's neural radiance field with tri-plane tokens. This highly scalable method uses self-attention blocks for effective information exchange between 3D object tokens and 2D image tokens, and it consistently outperforms baseline methods in pose prediction accuracy and 3D reconstruction quality across diverse evaluation datasets.
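The first part's core idea — jointly adjusting camera poses and a global 3D representation until their projections match the input images — can be sketched in miniature. The snippet below is a hypothetical toy illustration, not the thesis's algorithm: it uses a translation-only camera, a unit-focal pinhole projection, and numerical gradients, whereas the actual method also optimizes rotations and handles curve topology and self-occlusion. All names (`project`, `reprojection_loss`) are assumptions for this sketch.

```python
import numpy as np

def project(pts, t):
    """Unit-focal pinhole projection of 3D points seen by a camera translated by t."""
    p = pts + t
    return p[:, :2] / p[:, 2:3]

def reprojection_loss(pts, t, obs):
    """Mean squared image-space error between projections and observations."""
    return np.mean((project(pts, t) - obs) ** 2)

# Synthetic setup: 3D samples of a structure and a ground-truth camera translation.
rng = np.random.default_rng(0)
pts = rng.uniform([-1, -1, 4], [1, 1, 6], size=(50, 3))
t_true = np.array([0.3, -0.2, 0.5])
obs = project(pts, t_true)          # "observed" 2D positions in the input image

# Analysis-by-synthesis: descend the reprojection error with numerical gradients.
t = np.zeros(3)
initial = reprojection_loss(pts, t, obs)
eps = 1e-5
for _ in range(200):
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        grad[i] = (reprojection_loss(pts, t + d, obs)
                   - reprojection_loss(pts, t - d, obs)) / (2 * eps)
    t -= 0.5 * grad                 # fixed step size, chosen by hand for this toy
final = reprojection_loss(pts, t, obs)
```

Under these assumptions the loop drives the reprojection error down, recovering the camera offset; the thesis's closed-loop formulation applies the same principle to full poses and the curve graph simultaneously.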
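The second part's SDF-based differentiable volume rendering can be illustrated with a small sketch. The snippet below is a simplified, hypothetical illustration (not the thesis implementation) of one common way to convert SDF samples along a ray into rendering weights via a logistic CDF, so that weight concentrates at the surface's zero crossing; the function name `sdf_to_weights` and the sharpness parameter `s` are assumptions.

```python
import numpy as np

def sdf_to_weights(sdf, s=10.0):
    """Convert SDF samples along a ray into volume-rendering weights.

    sdf : (n,) SDF values at consecutive ray samples (positive outside,
          negative inside the surface).
    s   : sharpness of the logistic CDF; larger s concentrates weight
          nearer the zero crossing.
    """
    phi = 1.0 / (1.0 + np.exp(-s * sdf))       # logistic CDF of each SDF sample
    # Discrete opacity of each interval: fraction of remaining CDF mass it consumes.
    alpha = np.clip((phi[:-1] - phi[1:]) / np.maximum(phi[:-1], 1e-8), 0.0, 1.0)
    # Transmittance accumulated before each interval.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return alpha * trans                        # weights for compositing; sum <= 1

# A ray crossing the surface between the 2nd and 3rd samples:
sdf = np.array([0.5, 0.2, -0.1, -0.4])
w = sdf_to_weights(sdf)
```

With these assumptions, the largest weight falls on the interval containing the SDF's zero crossing, which is what lets colors composited with such weights supervise the surface geometry directly from images.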
Degree: Doctor of Philosophy
Subject: Reconstruction (Graph theory)
Image reconstruction
Computer vision
Three-dimensional imaging
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/344138

 

DC Field | Value | Language
dc.contributor.advisor | Komura, T | -
dc.contributor.advisor | Wang, WP | -
dc.contributor.author | Wang, Peng | -
dc.contributor.author | 王鵬 | -
dc.date.accessioned | 2024-07-16T02:16:44Z | -
dc.date.available | 2024-07-16T02:16:44Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Wang, P. [王鵬]. (2024). Multi-view reconstruction via global 3D representation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/344138 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Reconstruction (Graph theory) | -
dc.subject.lcsh | Image reconstruction | -
dc.subject.lcsh | Computer vision | -
dc.subject.lcsh | Three-dimensional imaging | -
dc.title | Multi-view reconstruction via global 3D representation | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044829503303414 | -
