Learning data consistency for image geometry and text under indirect supervision

Chen, Nenglun; 陳能侖

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

Title	Learning data consistency for image geometry and text under indirect supervision
Authors	Chen, Nenglun 陳能侖
Advisors	Luo, P; Wang, WP
Issue Date	2022
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Chen, N. [陳能侖]. (2022). Learning data consistency for image geometry and text under indirect supervision. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Acquiring high-quality labeled data for training deep neural networks is usually expensive and time-consuming. It is therefore important for neural networks to be able to learn from data without direct supervision. Such learning arises in various task configurations, including self-supervised, weakly supervised, and semi-supervised learning. Although the types of supervision used in these tasks vary widely, learning data consistency is usually an essential problem when direct supervision is not available. Because direct supervision is lacking, researchers in this field usually leverage task-specific priors during training; these priors can be either manually designed or learned from a collection of related data. Moreover, the priors used for learning data consistency can differ considerably between data modalities, and learning data consistency jointly across multiple modalities is also valuable. This thesis focuses on learning semantic consistency from a large collection of data and on making the output of neural networks semantically consistent across different data samples or modalities. Three problems are studied: learning 3D shape consistency, learning image-text consistency for image captioning, and learning image representations with geometric set consistency. In the first part, a self-supervised method is proposed to learn structure points for 3D shapes represented as point clouds. The learned structure points are consistent across different shapes with similar geometric structures. The method is simple and effective: the only constraint used to train the neural network is the chamfer distance loss, which enforces the produced structure points to be uniformly distributed on the input point clouds. Extensive experiments on several downstream tasks, including 3D semantic correspondence, example-based label transfer, and PCA-based shape embedding, demonstrate the effectiveness of the proposed method. For image-text consistency in image captioning, a simple yet effective neural network architecture is proposed. The architecture is built upon a Long Short-Term Memory (LSTM) based framework, and a distributed attention mechanism is designed so that, when producing words, the network attends to regions that have consistent semantics but different spatial positions. In this way, the partial grounding issue is effectively alleviated. Qualitative and quantitative experiments verify the superiority of the proposed method. Finally, a novel contrastive learning algorithm is presented for self-supervised learning of 2D image representations. 3D geometric set consistency priors are used as strong cues to constrain the learned 2D image representations to be consistent within image views, and the InfoNCE loss is adapted accordingly to enforce set-level consistency. The learned image representations are general and can be used to improve the performance of several 2D image-based indoor scene understanding tasks. Extensive experiments demonstrate the superior performance of our method compared with state-of-the-art methods.
Degree	Doctor of Philosophy
Subject	Computer vision; Image processing
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/323175
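
The abstract names two losses, the chamfer distance and InfoNCE, without giving formulas. The NumPy sketch below shows the common textbook forms of both, purely as an illustration; it is not taken from the thesis. The function names, the squared-distance variant of the chamfer distance, the cosine similarity, and the default temperature are all assumptions, and the set-level adaptation of InfoNCE described in the abstract is not reproduced here.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric chamfer distance between point sets a (N, D) and b (M, D).

    Assumption: squared Euclidean distances, averaged in each direction.
    """
    # Pairwise squared distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Mean distance from each point to its nearest neighbour in the other set.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def info_nce(queries: np.ndarray, keys: np.ndarray, temperature: float = 0.1) -> float:
    """Standard InfoNCE loss: row i of `keys` is the positive for row i of `queries`.

    Assumption: cosine similarity, with all other rows in the batch as negatives.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature  # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal entries as the correct classes.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(256, 3))   # hypothetical input point cloud
    structure_points = cloud[:16]       # hypothetical predicted structure points
    print("chamfer:", chamfer_distance(structure_points, cloud))

    features = rng.normal(size=(8, 32))  # hypothetical paired embeddings
    print("InfoNCE:", info_nce(features, features + 0.01 * rng.normal(size=(8, 32))))
```

In this sketch the chamfer distance is zero when the two point sets coincide, and the InfoNCE loss decreases as each query becomes more similar to its own key than to the other keys in the batch.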

DC Field	Value	Language
dc.contributor.advisor	Luo, P	-
dc.contributor.advisor	Wang, WP	-
dc.contributor.author	Chen, Nenglun	-
dc.contributor.author	陳能侖	-
dc.date.accessioned	2022-11-23T10:25:16Z	-
dc.date.available	2022-11-23T10:25:16Z	-
dc.date.issued	2022	-
dc.identifier.citation	Chen, N. [陳能侖]. (2022). Learning data consistency for image geometry and text under indirect supervision. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/323175	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Computer vision	-
dc.subject.lcsh	Image processing	-
dc.title	Learning data consistency for image geometry and text under indirect supervision	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2022	-
dc.identifier.mmsid	991044609100103414	-