
Postgraduate thesis: Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning

Title: Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning
Authors: Ge, Weifeng (戈维峰)
Advisor(s): Yu, Y
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Ge, W. [戈维峰]. (2019). Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Huge amounts of labeled data have led to a series of breakthroughs in image understanding, such as object/scene recognition, object detection, and semantic segmentation. However, for most real-world problems, it is expensive to obtain high-quality training data sets. Learning from imperfect data relies heavily on knowledge distillation, transfer, and enhancement. Building on transfer learning, metric learning, and weakly supervised learning, this thesis proposes novel algorithms for three problems: classifying images with insufficient training data, learning feature embeddings for image distance computation, and recognizing pixels from image-level annotations.

Given an image classification task with insufficient training data, selective joint fine-tuning (SJFT) selects samples from a source data set that is rich in annotations in order to learn the convolutional kernels efficiently. SJFT aims to improve the generalization ability of the learned deep features by solving two classification tasks in one deep network: the source learning task, which contains a huge amount of labeled data, and the target learning task, which lacks training data. Low-level characteristics, namely histogram features generated by Gabor filters or convolutional filters, are used to select valuable samples from the large-scale labeled source data set. Experimental results indicate that the source-target joint training strategy learns more discriminative features on Caltech 256, MIT Indoor 67, and fine-grained classification problems (Oxford Flowers 102 and Stanford Dogs 120).

Deep metric learning, or similarity learning, learns a distance function over images with convolutional neural networks. The large sampling space and the risk of local optima make it challenging to design deep metric learning loss functions. In this work, we propose a novel hierarchical triplet loss (HTL) that automatically collects informative training triplets via an adaptively learned hierarchical class structure which encodes global context in an elegant manner. The proposed HTL outperforms the standard triplet loss substantially, by 1%-18%, and achieves new state-of-the-art performance on In-Shop Clothes Retrieval, Caltech-UCSD Birds 200-2011, Cars 196, and Stanford Online Products.

To address the limitations of deep neural networks in recognizing pixels from image-level labels, we propose a novel weakly supervised curriculum learning pipeline, called multi-evidence filtering and fusion (MEFF), for multi-label object recognition, detection, and semantic segmentation. MEFF follows a divide-and-conquer strategy and solves the weakly supervised learning task in three stages: the image level stage, the object level stage, and the pixel level stage. With image-level labels, we first perform multi-label object recognition. Then both metric learning and density-based clustering are incorporated to filter detected object instances. To obtain a relatively clean pixel-wise probability map for every class and every training image, we propose a novel algorithm for fusing image-level and object-level attention maps with an object detection heat map. Experiments show that our weakly supervised pipeline achieves state-of-the-art results on MS-COCO, PASCAL VOC 2007, and PASCAL VOC 2012.
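As an illustration of the source-sample selection step that the SJFT paragraph above describes (picking source images whose low-level filter-response histograms resemble the target images), the sketch below implements one plausible version of that step. It is a minimal sketch only: the Gabor filter bank, 32-bin histograms, Euclidean distance, and the neighbour count k are assumptions for illustration, not the thesis implementation.

```python
# Hypothetical sketch of SJFT-style source sample selection.
# Assumptions: images are 2-D float NumPy arrays; filter bank, histogram
# binning, distance metric, and k are illustrative choices, not the thesis's.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0):
    """Real part of a Gabor filter with orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def response_histogram(img, kernels, bins=32):
    """Concatenate per-filter histograms of absolute filter responses."""
    feats = []
    for k in kernels:
        resp = np.abs(fftconvolve(img, k, mode="same"))
        hist, _ = np.histogram(resp, bins=bins,
                               range=(0.0, resp.max() + 1e-8), density=True)
        feats.append(hist)
    return np.concatenate(feats)

def select_source_samples(target_imgs, source_imgs, k=10):
    """For each target image, keep the k source images whose low-level
    histogram features are closest in Euclidean distance; return the union."""
    kernels = [gabor_kernel(theta=t)
               for t in np.linspace(0, np.pi, 8, endpoint=False)]
    t_feats = np.stack([response_histogram(im, kernels) for im in target_imgs])
    s_feats = np.stack([response_histogram(im, kernels) for im in source_imgs])
    dists = np.linalg.norm(t_feats[:, None, :] - s_feats[None, :, :], axis=-1)
    return np.unique(np.argsort(dists, axis=1)[:, :k])  # indices into source_imgs
```

The selected source subset would then be trained jointly with the target data in one network with shared convolutional layers and two classifier heads, which is the source-target joint training the abstract refers to.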
Degree: Doctor of Philosophy
Subject: Image analysis - Data processing; Artificial intelligence
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/281581

 

Dublin Core metadata
dc.contributor.advisor: Yu, Y
dc.contributor.author: Ge, Weifeng
dc.contributor.author: 戈维峰
dc.date.accessioned: 2020-03-18T11:32:57Z
dc.date.available: 2020-03-18T11:32:57Z
dc.date.issued: 2019
dc.identifier.citation: Ge, W. [戈维峰]. (2019). Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/281581
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Image analysis - Data processing
dc.subject.lcsh: Artificial intelligence
dc.title: Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.5353/th_991044214995203414
dc.date.hkucongregation: 2020
dc.identifier.mmsid: 991044214995203414
