postgraduate thesis: Learning data consistency for image geometry and text under indirect supervision

Title: Learning data consistency for image geometry and text under indirect supervision
Authors: Chen, Nenglun (陳能侖)
Advisor(s): Luo, P; Wang, WP
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chen, N. [陳能侖]. (2022). Learning data consistency for image geometry and text under indirect supervision. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Acquiring high-quality labeled data for training deep neural networks is usually expensive and time-consuming. It is therefore important for neural networks to be able to learn from data without direct supervision. Several task configurations exist, including self-supervised, weakly supervised, and semi-supervised learning. Although the types of supervision vary widely across tasks, learning data consistency is usually an essential problem when direct supervision is not available. Because direct supervision is lacking, researchers in this field usually leverage task-specific priors during training; these priors can be either manually designed or learned from a collection of related data. Moreover, the priors used for learning data consistency can differ considerably between data modalities, and learning data consistency jointly for multiple modalities is also valuable. This thesis focuses on learning semantic consistency from a large collection of data and making the outputs of neural networks semantically consistent across different data samples or modalities. Three problems are studied: learning 3D shape consistency, learning image-text consistency for image captioning, and learning image representations with geometric set consistency. In the first part, a self-supervised method is proposed to learn structure points for 3D shapes represented as point clouds. The learned structure points are consistent across different shapes with similar geometric structures. The method is simple and effective: the only constraint used to train the network is the Chamfer distance loss, which constrains the produced structure points to be uniformly distributed on the input point clouds. Extensive experiments on several downstream tasks, including 3D semantic correspondence, example-based label transfer, and PCA-based shape embedding, demonstrate the effectiveness of the proposed method. For image-text consistency in image captioning, a simple yet effective neural network architecture is proposed. The architecture is built on a Long Short-Term Memory (LSTM) based framework, and a distributed attention mechanism is designed so that, when producing words, the network attends to regions that have consistent semantics but different spatial positions. In this way, the partial grounding issue can be effectively alleviated. Qualitative and quantitative experiments verify the superiority of the proposed method. Finally, a novel contrastive-learning-based algorithm is presented for self-supervised learning of 2D image representations. 3D geometric set consistency priors are used as strong cues to constrain the learned 2D image representations to be consistent within image views, and the InfoNCE loss is adapted accordingly to enforce set-level consistency. The learned image representations are general and can be used to improve the performance of several 2D image-based indoor scene understanding tasks. Extensive experiments demonstrate the superior performance of our method compared with state-of-the-art methods.
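The two losses named in the abstract can be illustrated with a minimal sketch. The code below is not taken from the thesis. It shows the textbook symmetric Chamfer distance between two point sets and the standard single-positive InfoNCE loss, assuming squared Euclidean distances, cosine similarity, and a default temperature of 0.1. The function names and parameters are hypothetical, and the set-level adaptation of InfoNCE described in the thesis is not reproduced here.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of tuples).

    Assumption: squared Euclidean distance, averaged over each set.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest
        # neighbour in the destination set.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def info_nce(query, keys, positive_index, temperature=0.1):
    """Standard InfoNCE loss for one query against a list of key vectors.

    Assumption: cosine similarity and a single positive key; every other
    key is treated as a negative.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = normalize(query)
    logits = [
        sum(x * y for x, y in zip(q, normalize(k))) / temperature
        for k in keys
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of the positive key under the softmax.
    return log_denominator - logits[positive_index]


if __name__ == "__main__":
    shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    structure_points = [(0.1, 0.0, 0.0), (0.9, 0.1, 0.0)]
    print("Chamfer distance:", chamfer_distance(structure_points, shape))

    query = [1.0, 0.0]
    keys = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print("InfoNCE loss:", info_nce(query, keys, positive_index=0))
```

In this sketch the Chamfer term is small when every point in one set has a close neighbour in the other, and the InfoNCE term is small when the query is more similar to its positive key than to the remaining keys.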
Degree: Doctor of Philosophy
Subject: Computer vision; Image processing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/323175

 

DC Field | Value | Language
dc.contributor.advisor | Luo, P | -
dc.contributor.advisor | Wang, WP | -
dc.contributor.author | Chen, Nenglun | -
dc.contributor.author | 陳能侖 | -
dc.date.accessioned | 2022-11-23T10:25:16Z | -
dc.date.available | 2022-11-23T10:25:16Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Chen, N. [陳能侖]. (2022). Learning data consistency for image geometry and text under indirect supervision. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/323175 | -
dc.description.abstract | Acquiring high-quality labeled data for training deep neural networks is usually expensive and time-consuming. It is therefore important for neural networks to be able to learn from data without direct supervision. Several task configurations exist, including self-supervised, weakly supervised, and semi-supervised learning. Although the types of supervision vary widely across tasks, learning data consistency is usually an essential problem when direct supervision is not available. Because direct supervision is lacking, researchers in this field usually leverage task-specific priors during training; these priors can be either manually designed or learned from a collection of related data. Moreover, the priors used for learning data consistency can differ considerably between data modalities, and learning data consistency jointly for multiple modalities is also valuable. This thesis focuses on learning semantic consistency from a large collection of data and making the outputs of neural networks semantically consistent across different data samples or modalities. Three problems are studied: learning 3D shape consistency, learning image-text consistency for image captioning, and learning image representations with geometric set consistency. In the first part, a self-supervised method is proposed to learn structure points for 3D shapes represented as point clouds. The learned structure points are consistent across different shapes with similar geometric structures. The method is simple and effective: the only constraint used to train the network is the Chamfer distance loss, which constrains the produced structure points to be uniformly distributed on the input point clouds. Extensive experiments on several downstream tasks, including 3D semantic correspondence, example-based label transfer, and PCA-based shape embedding, demonstrate the effectiveness of the proposed method. For image-text consistency in image captioning, a simple yet effective neural network architecture is proposed. The architecture is built on a Long Short-Term Memory (LSTM) based framework, and a distributed attention mechanism is designed so that, when producing words, the network attends to regions that have consistent semantics but different spatial positions. In this way, the partial grounding issue can be effectively alleviated. Qualitative and quantitative experiments verify the superiority of the proposed method. Finally, a novel contrastive-learning-based algorithm is presented for self-supervised learning of 2D image representations. 3D geometric set consistency priors are used as strong cues to constrain the learned 2D image representations to be consistent within image views, and the InfoNCE loss is adapted accordingly to enforce set-level consistency. The learned image representations are general and can be used to improve the performance of several 2D image-based indoor scene understanding tasks. Extensive experiments demonstrate the superior performance of our method compared with state-of-the-art methods. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Computer vision | -
dc.subject.lcsh | Image processing | -
dc.title | Learning data consistency for image geometry and text under indirect supervision | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044609100103414 | -