Appears in Collections: postgraduate thesis: Light field image processing and analysis based on deep learning
Field | Value
---|---
Title | Light field image processing and analysis based on deep learning
Authors | Zhang, Shansi (張善思)
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhang, S. [張善思]. (2024). Light field image processing and analysis based on deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Light field (LF) imaging is an advanced imaging technology that captures not only the intensities but also the directions of incoming light rays, which allows for the acquisition of geometric information about the scene and enables the extraction of multiple views. This characteristic offers significant potential for various vision applications, including post-capture refocusing, depth estimation, 3D scene reconstruction, and salient object detection. However, the high dimensionality of LF data also poses challenges for its processing and analysis. This dissertation explores LF image processing and analysis using deep learning-based methods, with a specific focus on effectively and efficiently leveraging multi-view information to achieve superior performance in LF image restoration and scene understanding tasks. Regarding LF image restoration, we first propose a deep Retinex framework with spatial-angular attention modules and spatial-angular interaction modules to restore low-light LF images. To further enhance both efficiency and performance, we then propose a low-light restoration transformer (LRT), which incorporates spatial transformer blocks to extract multi-scale spatial features within each view, angular transformer blocks to integrate information from all the views, and multiple heads to address specific intermediate tasks. Next, instead of concentrating on task-specific solutions, we introduce a unified LF image restoration framework based on latent diffusion and multi-view attention, which is applicable to various LF restoration tasks, including LF image deraining and low-light LF image enhancement. Extensive experimental results validate the effectiveness of our proposed methods in generating high-quality LF images from severely degraded inputs. Regarding scene understanding from LF images, we address two tasks: disparity estimation and semantic segmentation. Rather than employing fully supervised methods that require a substantial amount of pixel-wise annotations, our objective is to develop unsupervised and semi-supervised methods that are applicable to scenarios where obtaining extensive annotations is challenging or costly. More specifically, we first propose an unsupervised disparity estimation framework for LF images, which estimates disparity maps from multiple input view combinations by performing multi-view feature matching in a coarse-to-fine manner and then merges these disparity maps through an effective disparity fusion strategy. An occlusion prediction network is introduced to alleviate the impact of occlusions during training. Next, we propose an unsupervised disparity estimation network for LF videos, which comprises a matching branch and a refinement branch, and incorporates a multi-frame feature fusion module and a cost aggregator with cross-depth self-attention. Additionally, we introduce a left-right consistency strategy to predict occlusion regions. Finally, we develop a semi-supervised LF semantic segmentation method that harnesses LF disparity information while requiring only a small subset of labeled data for training. The disparity information is utilized to generate more reliable pseudo-labels along with corresponding weight maps, and serves as a structure reference for the predicted probability maps of unlabeled data. Moreover, we propose a contrastive learning scheme at both pixel and object levels to further improve the segmentation performance. Extensive experimental results demonstrate the efficacy of our unsupervised and semi-supervised methods for LF scene understanding. |
Degree | Doctor of Philosophy |
Subject | Imaging systems Image processing - Digital techniques Deep learning (Machine learning) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/351019 |
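The left-right consistency strategy that the abstract mentions for predicting occlusion regions can be illustrated with a minimal sketch. This is a generic version of the classic check, not the thesis's network: the function name, the array shapes, and the 1-pixel threshold are illustrative assumptions. A pixel whose disparity disagrees with the disparity found at its matched location in the other view is flagged as likely occluded.

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, threshold=1.0):
    """Flag pixels whose left/right disparities disagree as occluded.

    disp_left, disp_right: (H, W) disparity maps for a horizontal view pair.
    Returns a boolean (H, W) mask, True where the pixel is likely occluded.
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)   # x-coordinate grid
    ys = np.arange(h)[:, None].repeat(w, axis=1)   # y-coordinate grid
    # For each left-view pixel, find its match in the right view and
    # read the disparity stored there (coordinates clipped to the image).
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_right_sampled = disp_right[ys, x_right]
    # Consistent pixels carry (nearly) equal disparities in both views;
    # a large gap indicates the point is visible in only one view.
    return np.abs(disp_left - disp_right_sampled) > threshold
```

In an unsupervised setting such a mask is typically used to exclude occluded pixels from photometric warping losses, which is the role the abstract's occlusion prediction plays during training.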
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Shansi | - |
dc.contributor.author | 張善思 | - |
dc.date.accessioned | 2024-11-08T07:10:45Z | - |
dc.date.available | 2024-11-08T07:10:45Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Zhang, S. [張善思]. (2024). Light field image processing and analysis based on deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/351019 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Imaging systems | - |
dc.subject.lcsh | Image processing - Digital techniques | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.title | Light field image processing and analysis based on deep learning | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044869877403414 | - |
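The Retinex model underlying the low-light restoration work described in the abstract decomposes an image into reflectance and illumination, I = R × L, and brightens the scene by adjusting L. The thesis does this with learned deep networks; the sketch below is only a minimal classical stand-in, where the local-mean illumination estimate and the gamma value are illustrative assumptions.

```python
import numpy as np

def retinex_enhance(img, gamma=0.45, eps=1e-6):
    """Single-image Retinex sketch: estimate illumination with a 3x3 local
    mean, brighten it with a gamma curve, and recombine with reflectance.

    img: (H, W) grayscale image with values in [0, 1].
    """
    # Crude illumination estimate: 3x3 local mean via edge padding.
    padded = np.pad(img, 1, mode='edge')
    illum = sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                for dy in range(3) for dx in range(3)) / 9.0
    reflectance = img / (illum + eps)      # I = R * L  =>  R = I / L
    illum_adj = np.power(illum, gamma)     # gamma < 1 lifts dark regions
    return np.clip(reflectance * illum_adj, 0.0, 1.0)
```

Deep Retinex methods keep this same decomposition but replace both the illumination estimator and the adjustment curve with trained networks, which also lets them suppress the noise that naive division amplifies.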