Appears in Collections: postgraduate thesis: Neural 3D object reconstruction
Title | Neural 3D object reconstruction |
---|---|
Authors | Liu, Yuan (劉緣) |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, Y. [劉緣]. (2024). Neural 3D object reconstruction. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Degree | Doctor of Philosophy |
Subject | Image reconstruction; Three-dimensional imaging |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/345422 |
Abstract

Object reconstruction and recognition are important tasks in various 3D applications such as AR/VR, autonomous driving, and robotics, and have been active research fields for decades. While this research has achieved great progress in recent years, it remains challenging to accurately and holistically reconstruct arbitrary objects from casually captured images. Moreover, downstream 3D tasks require not only reconstructing the geometry of an object but also localizing, rendering, or relighting it. In this thesis, we present a framework to holistically reconstruct objects from casually captured images. The framework starts from pose estimation of the captured images and enables rendering novel views of the object, reconstructing both its geometry and its material, and localizing its position and orientation in new images.
For accurate estimation of the relative camera poses between casually captured images, we exploit the motion coherence property between two images to filter out erroneous correspondences. We analyze motion coherence on a graph and integrate this graph-based analysis into a learning-based framework for correspondence pruning. Finally, we estimate relative camera poses from the retained reliable correspondences for various downstream tasks.
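As a rough sketch of the motion-coherence idea (not the thesis's actual learning-based pruner), the following hypothetical Python snippet keeps a correspondence only if its motion vector agrees with those of its neighbors on a k-nearest-neighbor graph; the function name and the `k` and `tol` parameters are illustrative assumptions.

```python
import numpy as np

def prune_by_motion_coherence(kps1, kps2, k=8, tol=0.3):
    """Keep correspondences whose motion agrees with their graph neighbors.

    kps1, kps2: (N, 2) arrays of matched keypoint coordinates in images 1/2.
    k: number of nearest neighbors in the coherence graph.
    tol: maximum allowed relative deviation from the neighborhood motion.
    """
    motion = kps2 - kps1                        # per-match motion vectors
    # Build a k-NN graph over keypoint locations in image 1.
    d = np.linalg.norm(kps1[:, None, :] - kps1[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]         # (N, k) neighbor indices
    # Motion coherence: a correct match should move like its neighbors.
    nbr_motion = motion[nbrs].mean(axis=1)      # (N, 2) neighborhood average
    dev = np.linalg.norm(motion - nbr_motion, axis=1)
    scale = np.linalg.norm(nbr_motion, axis=1) + 1e-6
    return dev / scale < tol                    # boolean inlier mask

# Usage: mask = prune_by_motion_coherence(kps1, kps2); inliers = kps1[mask]
```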
Given images with known poses, we study the problem of synthesizing novel-view images of objects. We propose a new ray-based representation, which constructs a neural radiance field on the fly for novel-view synthesis. To handle artifacts caused by self-occlusions of the object, we propose a visibility-aware rendering scheme that greatly improves the rendering quality of objects.
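For context, radiance-field methods of this kind typically composite color along each camera ray by volume rendering, and a visibility-aware scheme can further down-weight source views that cannot see a sample point. A generic form (illustrative notation, not the thesis's exact formulation):

```latex
% Volume rendering of N samples along a ray r (generic NeRF-style form)
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad T_i = \exp\Bigl(-\sum_{j<i} \sigma_j \delta_j\Bigr),

% with sample colors aggregated from source views v, weighted by an
% estimated visibility V_v of the sample point x_i in view v:
\mathbf{c}_i = \frac{\sum_{v} V_v(\mathbf{x}_i)\, \mathbf{c}_v(\mathbf{x}_i)}
                    {\sum_{v} V_v(\mathbf{x}_i)}.
```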
With the posed images of an object, we further estimate the position and orientation of this object in a new image of the same object in a new environment, a task known as six-degree-of-freedom (6DoF) object pose estimation. In contrast to previous 6DoF pose estimators, which require tedious retraining for different objects or categories, our estimator can be applied to arbitrary objects without retraining and only requires the posed images of the object.
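Concretely, a 6DoF pose comprises a 3D rotation and a 3D translation; recovering it means finding the rigid transform that maps object coordinates into the new image's camera frame:

```latex
% 6DoF pose: rotation R and translation t (3 DoF each) mapping object
% points X into the camera frame, observed through intrinsics K.
(R, \mathbf{t}) \in \mathrm{SE}(3), \qquad
\mathbf{x} \simeq K \bigl( R\,\mathbf{X} + \mathbf{t} \bigr).
```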
To reconstruct high-quality 3D meshes from posed images, we propose a neural representation-based method to reconstruct objects. This neural representation enables accurate reconstruction of highly reflective objects by building a neural Signed Distance Field (SDF) from the multiview images. To achieve this, our method combines the evaluation of the rendering equation with volume rendering of the neural SDF, which also enables us to estimate the materials of the objects.
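In outline, this couples two standard ingredients: the rendering equation for reflected radiance at a surface point, and a conversion from signed distance to opacity so the result can be composited by volume rendering (illustrative forms; the exact parameterization is method-specific):

```latex
% Rendering equation: outgoing radiance at point x in direction w_o,
% with the BRDF f_r carrying the (learned) material parameters.
L_o(\mathbf{x}, \omega_o) = \int_{\Omega}
  f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\,
  (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i,

% NeuS-style mapping of the signed distance f(x) to an opacity function,
% which lets L_o be composited by volume rendering along camera rays:
\Phi_s\bigl(f(\mathbf{x})\bigr) = \bigl(1 + e^{-s\, f(\mathbf{x})}\bigr)^{-1}.
```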
Multiview geometry reconstruction often requires dense images of the object, and these reconstruction methods fail when only one image is available. We further extend a diffusion model to generate multiview-consistent images at novel viewpoints, which enables us to reconstruct the geometry of an object even from a single image.
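For reference, such viewpoint-conditioned diffusion models are typically trained with a denoising objective; an illustrative form, conditioned on the input image y and a target camera pose π (notation assumed, not the thesis's exact loss):

```latex
% Denoising objective: x_t is the noised target view at timestep t,
% y the single input image, \pi the target camera pose.
\mathcal{L} = \mathbb{E}_{\mathbf{x}_0,\, \boldsymbol{\epsilon},\, t}
  \bigl\| \boldsymbol{\epsilon} -
  \boldsymbol{\epsilon}_\theta(\mathbf{x}_t,\, t,\, \mathbf{y},\, \pi) \bigr\|_2^2 .
```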
By integrating all the above methods, we achieve a comprehensive 3D reconstruction framework built on learning-based approaches and neural representations. This framework enables novel view synthesis, 6DoF pose estimation, and the estimation of object shape and material from casually captured or single-view images. It significantly expands the applications of image-based object reconstruction and holds promise for downstream tasks such as robotics, AR/VR, and autonomous driving. These advancements demonstrate the capacity of learning-based techniques and neural representations to advance the frontier of 3D object reconstruction.
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Yuan | - |
dc.contributor.author | 劉緣 | - |
dc.date.accessioned | 2024-08-26T08:59:41Z | - |
dc.date.available | 2024-08-26T08:59:41Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Liu, Y. [劉緣]. (2024). Neural 3D object reconstruction. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/345422 | - |
dc.description.abstract | (see Abstract above) | -
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Image reconstruction | - |
dc.subject.lcsh | Three-dimensional imaging | - |
dc.title | Neural 3D object reconstruction | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044843667103414 | - |