
Postgraduate thesis: Learning-based 3D depth estimation and surface reconstruction from 2D images

Title: Learning-based 3D depth estimation and surface reconstruction from 2D images
Authors: Long, Xiaoxiao (龙霄潇)
Advisors: Komura, T; Wang, WP
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Long, X. [龙霄潇]. (2023). Learning-based 3D depth estimation and surface reconstruction from 2D images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The tremendous advancements in 3D applications, such as AR/VR, video games, autonomous driving, and robotics, have fueled extensive research in the field of 3D reconstruction from 2D images. While existing reconstruction methods have achieved impressive results, there is still room for improvement in challenging scenarios: reconstructing surfaces with weak textures, using a limited set of images as input, and modeling objects with open boundaries. This thesis focuses on harnessing learning-based techniques to address these challenges.

We observe that surfaces with weak textures are often planar and human-made, such as walls, floors, and tables. To improve depth estimation for such surfaces, we propose enforcing a surface normal constraint in learning-based methods, which significantly enhances the geometric accuracy of the estimated depth. Additionally, we introduce a geometry-aware surface normal calculation that adaptively determines reliable local geometry to approximate the surface normal, especially in regions with geometric variations.

When the input images are temporally coherent, such as frames of a video, we exploit the temporal information among the frames to enhance depth estimation. We propose an epipolar spatio-temporal transformer that explicitly incorporates this information based on multi-view epipolar geometry, yielding more accurate and temporally consistent depth maps.

In scenarios where only a limited set of images is available, existing reconstruction approaches often yield incomplete or distorted results. To overcome this limitation, we propose a neural rendering-based method that learns generalizable priors from the input images for generic geometry reasoning. These learned priors enable our method to reconstruct high-quality results with limited images. We also introduce a consistency-aware fine-tuning scheme, which enhances reconstruction details at low computational and time cost.

Furthermore, while recent neural rendering-based reconstruction methods have achieved impressive outcomes, they are typically limited to objects with closed surfaces because they use Signed Distance Functions (SDF) as the surface representation. To reconstruct surfaces with arbitrary topologies from 2D images, we propose representing surfaces as Unsigned Distance Functions (UDF) and develop a novel volume rendering scheme to learn the neural UDF representation. Our method enables high-quality reconstruction of non-closed shapes with complex topologies while achieving performance comparable to SDF-based methods on closed surfaces.

Through these techniques, we aim to advance 3D reconstruction by leveraging learning-based approaches to overcome challenges related to weak textures, limited image sets, and open boundaries. The outcomes of this research have the potential to enhance the quality and fidelity of 3D reconstructions, contributing to the development of various 3D applications and pushing the boundaries of what is achievable in reconstructing the 3D world from 2D images.
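The surface normal constraint above relies on recovering normals from depth. As a rough illustration only (this is a fixed-neighbourhood estimate, not the thesis's geometry-aware method, which adaptively selects the local geometry; all names and parameters are illustrative), per-pixel normals can be approximated by back-projecting pixels into camera space with the intrinsics and crossing the two local tangent vectors:

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Approximate per-pixel surface normals from a depth map by
    back-projecting neighbouring pixels to 3D and taking the cross
    product of the two local tangent vectors (forward differences)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to camera-space 3D points.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1)
    # Tangent vectors along image columns (du) and rows (dv).
    du = pts[:, 1:, :] - pts[:, :-1, :]
    dv = pts[1:, :, :] - pts[:-1, :, :]
    n = np.cross(du[:-1, :, :], dv[:, :-1, :])
    # Normalise to unit length.
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n  # shape (h-1, w-1, 3)

# A fronto-parallel plane at constant depth should yield normals
# aligned with the optical (z) axis.
plane = np.full((8, 8), 2.0)
n = normals_from_depth(plane, fx=100.0, fy=100.0, cx=4.0, cy=4.0)
```

Near depth discontinuities this fixed-neighbourhood estimate degrades, which is exactly the failure mode an adaptive, geometry-aware normal calculation is designed to mitigate.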
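The SDF/UDF distinction underlying the final contribution can be made concrete: a signed distance function needs a well-defined inside and outside, which only closed surfaces provide, whereas an unsigned distance function is non-negative everywhere and therefore extends to open surfaces. A minimal sketch with illustrative (not thesis-specific) functions:

```python
import numpy as np

def sdf_sphere(p, radius=1.0):
    # Signed distance: negative inside, positive outside —
    # only meaningful for a closed surface.
    return np.linalg.norm(p, axis=-1) - radius

def udf_disk(p, radius=1.0):
    # Unsigned distance to an open disk in the z = 0 plane:
    # always >= 0, so no inside/outside decision is needed.
    xy = p[..., :2]
    r = np.linalg.norm(xy, axis=-1)
    dr = np.maximum(r - radius, 0.0)  # radial overshoot beyond the rim
    dz = np.abs(p[..., 2])            # height above/below the plane
    return np.hypot(dr, dz)

pts = np.array([[0.0, 0.0, 0.0],   # on the disk
                [0.0, 0.0, 0.5],   # above its interior
                [2.0, 0.0, 0.0]])  # beyond the rim, in-plane
```

Because a UDF never changes sign, standard zero-crossing surface extraction (e.g. marching cubes on an SDF) does not apply directly, which is one reason UDF-based pipelines need specialised rendering and meshing schemes.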
Degree: Doctor of Philosophy
Subject: Image processing -- Digital techniques; Three-dimensional imaging
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/330274


DC Field: Value
dc.contributor.advisor: Komura, T
dc.contributor.advisor: Wang, WP
dc.contributor.author: Long, Xiaoxiao
dc.contributor.author: 龙霄潇
dc.date.accessioned: 2023-08-31T09:18:24Z
dc.date.available: 2023-08-31T09:18:24Z
dc.date.issued: 2023
dc.identifier.citation: Long, X. [龙霄潇]. (2023). Learning-based 3D depth estimation and surface reconstruction from 2D images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/330274
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Image processing -- Digital techniques
dc.subject.lcsh: Three-dimensional imaging
dc.title: Learning-based 3D depth estimation and surface reconstruction from 2D images
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2023
dc.identifier.mmsid: 991044717470703414
