
Postgraduate thesis: Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning

Title: Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning
Authors: Ge, Weifeng (戈维峰)
Advisor(s): Yu, Y
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Ge, W. [戈维峰]. (2019). Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Huge amounts of labeled data have led to a series of breakthroughs in image understanding, such as object/scene recognition, object detection, and semantic segmentation. However, for most real-world problems, it is expensive to obtain high-quality training data sets. Learning from imperfect data relies heavily on knowledge distillation, transfer, and enhancement. Building on transfer learning, metric learning, and weakly supervised learning, this thesis proposes novel algorithms for three problems: classifying images with insufficient training data, learning feature embeddings for image distance computation, and recognizing pixels from image-level annotations.

Given an image classification task with insufficient training data, selective joint fine-tuning (SJFT) selects samples from a source data set that is rich in annotations in order to learn the convolutional kernels efficiently. SJFT aims to improve the generalization ability of the learned deep features by solving two classification tasks in one deep network: the source learning task, which contains a huge amount of labeled data, and the target learning task, which lacks training data. Low-level characteristics, namely histogram features generated by Gabor filters or convolutional filters, are used to select valuable samples from the large-scale labeled source data set. Experimental results indicate that the source-target joint training strategy learns more discriminative features on Caltech 256, MIT Indoor 67, and fine-grained classification problems (Oxford Flowers 102 and Stanford Dogs 120).

Deep metric learning, or similarity learning, learns a distance function over images with convolutional neural networks. The large sampling space and the risk of local optima make it challenging to design deep metric learning loss functions. In this work, we propose a novel hierarchical triplet loss (HTL) that automatically collects informative training triplets via an adaptively learned hierarchical class structure which encodes global context in an elegant manner. The proposed HTL outperforms the standard triplet loss substantially, by 1%-18%, and achieves new state-of-the-art performance on In-Shop Clothes Retrieval, Caltech-UCSD Birds 200-2011, Cars 196, and Stanford Online Products.

To address the limitations of deep neural networks in recognizing pixels from image-level labels, we propose a novel weakly supervised curriculum learning pipeline, called multi-evidence filtering and fusion (MEFF), for multi-label object recognition, detection, and semantic segmentation. MEFF follows a divide-and-conquer strategy and solves the weakly supervised learning task in three stages: the image level stage, the object level stage, and the pixel level stage. With image-level labels, we first perform multi-label object recognition. Then both metric learning and density-based clustering are incorporated to filter detected object instances. To obtain a relatively clean pixel-wise probability map for every class and every training image, we propose a novel algorithm for fusing image-level and object-level attention maps with an object detection heat map. Experiments show that our weakly supervised pipeline achieves state-of-the-art results on MS-COCO, PASCAL VOC 2007, and PASCAL VOC 2012.
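As an illustration of the source-sample selection step that the SJFT paragraph above describes (picking source images whose low-level filter-response histograms resemble the target images), the sketch below implements one plausible version of that step. It is a minimal sketch only: the Gabor filter bank, 32-bin histograms, Euclidean distance, and the neighbour count k are assumptions for illustration, not the thesis implementation.

```python
# Hypothetical sketch of SJFT-style source sample selection.
# Assumptions: images are 2-D float NumPy arrays; filter bank, histogram
# binning, distance metric, and k are illustrative choices, not the thesis's.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0):
    """Real part of a Gabor filter with orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def response_histogram(img, kernels, bins=32):
    """Concatenate per-filter histograms of absolute filter responses."""
    feats = []
    for k in kernels:
        resp = np.abs(fftconvolve(img, k, mode="same"))
        hist, _ = np.histogram(resp, bins=bins,
                               range=(0.0, resp.max() + 1e-8), density=True)
        feats.append(hist)
    return np.concatenate(feats)

def select_source_samples(target_imgs, source_imgs, k=10):
    """For each target image, keep the k source images whose low-level
    histogram features are closest in Euclidean distance; return the union."""
    kernels = [gabor_kernel(theta=t)
               for t in np.linspace(0, np.pi, 8, endpoint=False)]
    t_feats = np.stack([response_histogram(im, kernels) for im in target_imgs])
    s_feats = np.stack([response_histogram(im, kernels) for im in source_imgs])
    dists = np.linalg.norm(t_feats[:, None, :] - s_feats[None, :, :], axis=-1)
    return np.unique(np.argsort(dists, axis=1)[:, :k])  # indices into source_imgs
```

The selected source subset would then be trained jointly with the target data in one network with shared convolutional layers and two classifier heads, which is the source-target joint training the abstract refers to.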
Degree: Doctor of Philosophy
Subject: Image analysis - Data processing; Artificial intelligence
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/281581

 

Dublin Core metadata
dc.contributor.advisor: Yu, Y
dc.contributor.author: Ge, Weifeng
dc.contributor.author: 戈维峰
dc.date.accessioned: 2020-03-18T11:32:57Z
dc.date.available: 2020-03-18T11:32:57Z
dc.date.issued: 2019
dc.identifier.citation: Ge, W. [戈维峰]. (2019). Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/281581
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Image analysis - Data processing
dc.subject.lcsh: Artificial intelligence
dc.title: Image understanding from imperfect data via transfer learning, metric learning, and weakly supervised learning
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.5353/th_991044214995203414
dc.date.hkucongregation: 2020
dc.identifier.mmsid: 991044214995203414
