
Postgraduate thesis: Light field image processing and analysis based on deep learning

Title: Light field image processing and analysis based on deep learning
Authors: Zhang, Shansi (張善思)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhang, S. [張善思]. (2024). Light field image processing and analysis based on deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Light field (LF) imaging is an advanced imaging technology that captures not only the intensities but also the directions of incoming light rays, which allows for the acquisition of geometric information about the scene and enables the extraction of multiple views. This characteristic of LF imaging offers significant potential for various vision applications, including post-capture refocusing, depth estimation, 3D scene reconstruction, and salient object detection. However, the high dimensionality of LF data also poses challenges for its processing and analysis. This dissertation explores LF image processing and analysis using deep learning-based methods, with a specific focus on effectively and efficiently leveraging the multi-view information to achieve superior performance in LF image restoration and scene understanding tasks.

Regarding LF image restoration, we first propose a deep Retinex framework with spatial-angular attention modules and spatial-angular interaction modules to restore low-light LF images. To further enhance both efficiency and performance, we then propose a low-light restoration transformer (LRT), which incorporates spatial transformer blocks to extract multi-scale spatial features within each view, angular transformer blocks to integrate information from all the views, and multiple heads to address specific intermediate tasks. Next, instead of concentrating on task-specific solutions, we introduce a unified LF image restoration framework based on latent diffusion and multi-view attention, which is applicable to various LF restoration tasks, including LF image deraining and low-light LF image enhancement. Extensive experimental results validate the effectiveness of our proposed methods in generating high-quality LF images from severely degraded inputs.

Regarding scene understanding from LF images, we target two tasks: disparity estimation and semantic segmentation. Rather than employing fully supervised methods that require a substantial amount of pixel-wise annotations, our objective is to develop unsupervised and semi-supervised methods that are applicable to scenarios where obtaining extensive annotations is challenging or costly. More specifically, we first propose an unsupervised disparity estimation framework for LF images, which estimates disparity maps from multiple input view combinations by performing multi-view feature matching in a coarse-to-fine manner and then merges these disparity maps through an effective disparity fusion strategy. An occlusion prediction network is introduced to alleviate the impact of occlusions during training. Next, we propose an unsupervised disparity estimation network for LF videos, which comprises a matching branch and a refinement branch, and incorporates a multi-frame feature fusion module and a cost aggregator with cross-depth self-attention. Additionally, we introduce a left-right consistency strategy to predict occlusion regions. Finally, we develop a semi-supervised LF semantic segmentation method that harnesses LF disparity information while requiring only a small subset of labeled data for training. The disparity information is utilized to generate more reliable pseudo-labels along with corresponding weight maps, and serves as a structure reference for the predicted probability maps of unlabeled data. Moreover, we propose a contrastive learning scheme at both pixel and object levels to further improve the segmentation performance. Extensive experimental results demonstrate the efficacy of our unsupervised and semi-supervised methods for LF scene understanding.
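For readers unfamiliar with the 4D light-field structure the abstract refers to, the sketch below illustrates in plain NumPy how a light field parameterised by two angular and two spatial coordinates yields sub-aperture views and supports post-capture refocusing via the classic shift-and-add scheme. The array layout (U, V, H, W, 3), the function names, and the integer-shift approximation are illustrative assumptions on my part; this is a generic textbook illustration, not the restoration pipeline proposed in the thesis.

```python
# Illustrative sketch only: a generic 4D light-field layout and classic
# shift-and-add refocusing, NOT the methods proposed in the thesis.
# The (U, V, H, W, 3) layout and all names are assumptions.
import numpy as np

def sub_aperture_view(lf, u, v):
    """Extract the sub-aperture (perspective) view at angular position (u, v).

    lf: light field of shape (U, V, H, W, 3) -- angular x spatial x colour.
    """
    return lf[u, v]

def refocus(lf, slope):
    """Post-capture refocusing: shift each view according to its angular
    offset from the centre view, then average (shift-and-add)."""
    U, V, H, W, C = lf.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((H, W, C), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            dy = int(round(slope * (u - cu)))
            dx = int(round(slope * (v - cv)))
            # np.roll gives a simple wrap-around integer shift; practical
            # pipelines use sub-pixel interpolation instead.
            out += np.roll(lf[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)
```

Varying the `slope` parameter changes which depth plane appears in focus, which is what "post-capture refocusing" means in the abstract.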
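The abstract also mentions unsupervised disparity estimation and a left-right consistency strategy for occlusion prediction. The sketch below shows only the generic form of these two ideas: warping a neighbouring view with a candidate disparity map to obtain a photometric self-supervision signal, and flagging pixels whose left and right disparities disagree. The horizontal-baseline assumption, nearest-neighbour sampling, and all function names are my own simplifications, not the networks described in the thesis.

```python
# Hedged sketch of two generic ideas referenced in the abstract:
# (i) photometric self-supervision by disparity-based view warping, and
# (ii) a left-right consistency check for likely occlusions.
# Horizontal baseline and nearest-neighbour sampling are simplifying assumptions.
import numpy as np

def warp_horizontal(view, disparity):
    """Warp a view along the horizontal baseline by a per-pixel disparity."""
    H, W = disparity.shape
    xs = np.arange(W)[None, :] - disparity            # source x-coordinates
    xs = np.clip(np.round(xs).astype(int), 0, W - 1)  # nearest-neighbour sampling
    rows = np.arange(H)[:, None]
    return view[rows, xs]

def photometric_error(center, neighbor, disparity):
    """Per-pixel L1 error between the centre view and the neighbour view
    warped towards it -- the usual unsupervised training signal."""
    return np.abs(center - warp_horizontal(neighbor, disparity)).mean(axis=-1)

def occlusion_mask(disp_left, disp_right, thresh=1.0):
    """Left-right consistency: pixels whose left disparity disagrees with the
    disparity sampled from the right-view map are marked as likely occluded."""
    disp_right_warped = warp_horizontal(disp_right[..., None], disp_left)[..., 0]
    return np.abs(disp_left - disp_right_warped) > thresh
```

In an unsupervised setting, minimising the photometric error drives the disparity estimate, while the occlusion mask down-weights pixels where that error is unreliable.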
Degree: Doctor of Philosophy
Subject: Imaging systems
Image processing - Digital techniques
Deep learning (Machine learning)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/351019

 

DC Field: Value
dc.contributor.author: Zhang, Shansi
dc.contributor.author: 張善思
dc.date.accessioned: 2024-11-08T07:10:45Z
dc.date.available: 2024-11-08T07:10:45Z
dc.date.issued: 2024
dc.identifier.citation: Zhang, S. [張善思]. (2024). Light field image processing and analysis based on deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/351019
dc.description.abstract: (abstract as given above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Imaging systems
dc.subject.lcsh: Image processing - Digital techniques
dc.subject.lcsh: Deep learning (Machine learning)
dc.title: Light field image processing and analysis based on deep learning
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044869877403414
