postgraduate thesis: Multi-view reconstruction via global 3D representation
Title | Multi-view reconstruction via global 3D representation |
---|---|
Authors | Wang, Peng (王鵬) |
Advisors | Komura, T; Wang, WP |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wang, P. [王鵬]. (2024). Multi-view reconstruction via global 3D representation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Humans naturally engage with three-dimensional content in various applications such as virtual reality, movie visual effects, and gaming. The creation of high-quality 3D assets remains challenging and costly, demanding specialized software and skilled labor. The rise of RGB cameras has democratized access to 3D modeling: reconstruction from images alone makes it more accessible to the general public. This thesis focuses on the task of multi-view 3D reconstruction from RGB images. The traditional multi-view reconstruction pipeline involves long and complex processing components based on local depth map representations. It operates in an open-loop fashion and is error-prone.
In contrast, this thesis addresses the multi-view reconstruction problem through global 3D reconstruction. A proper global 3D representation should directly reflect the contents of the input RGB data and can be optimized or estimated based on the reconstruction error observed in the input images. This inherent simplicity allows for more robust and higher-quality reconstructed results compared to traditional 3D reconstruction pipelines. However, challenges persist in designing the form of 3D representations and reconstruction algorithms. We explore various forms of 3D representations and reconstruction algorithms that are suited for different image capture settings.
In the first part of the thesis, we address the challenge of reconstructing complex thin structures from RGB video frames without the need for camera pose input. We develop a global 3D curve graph representation and simultaneously optimize both the camera poses and curve representations to align with the input images. This optimization process incorporates tailored measures for geometry, topology, and self-occlusion handling, thereby facilitating the reconstruction of 3D thin structures.
While this global curve representation performs well in reconstructing thin structures, it lacks the capability to reconstruct general surfaces. In the second part, we address this issue by employing a neural signed distance function (SDF) to represent surface geometry. Building upon this neural surface representation, we introduce a differentiable volume rendering technique tailored to it. This technique aims to make the reconstruction process more robust than previous methods of neural surface reconstruction.
In the third part, we address the challenge of reconstructing large-scale, unbounded scenes and synthesizing novel views of them. Our approach represents the scene with a grid-based neural radiance field (NeRF), which accommodates arbitrary input camera trajectories while requiring only a few minutes of training.
In the fourth part, we introduce PF-LRM, a large reconstruction model for generalized sparse-view object reconstruction. PF-LRM uses tri-plane tokens to encode object neural radiance fields. This highly scalable method leverages self-attention blocks to enable effective information exchange between 3D object tokens and 2D image tokens, consistently outperforming baseline methods in both pose prediction accuracy and 3D reconstruction quality across diverse evaluation datasets.
|
Degree | Doctor of Philosophy |
Subject | Reconstruction (Graph theory); Image reconstruction; Computer vision; Three-dimensional imaging |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/344138 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Komura, T | - |
dc.contributor.advisor | Wang, WP | - |
dc.contributor.author | Wang, Peng | - |
dc.contributor.author | 王鵬 | - |
dc.date.accessioned | 2024-07-16T02:16:44Z | - |
dc.date.available | 2024-07-16T02:16:44Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Wang, P. [王鵬]. (2024). Multi-view reconstruction via global 3D representation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/344138 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Reconstruction (Graph theory) | - |
dc.subject.lcsh | Image reconstruction | - |
dc.subject.lcsh | Computer vision | - |
dc.subject.lcsh | Three-dimensional imaging | - |
dc.title | Multi-view reconstruction via global 3D representation | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044829503303414 | - |