Conference Paper: Mix-and-match tuning for self-supervised semantic segmentation

Title: Mix-and-match tuning for self-supervised semantic segmentation
Authors: Zhan, Xiaohang; Liu, Ziwei; Luo, Ping; Tang, Xiaoou; Loy, Chen Change
Issue Date: 2018
Publisher: Association for the Advancement of Artificial Intelligence. The conference proceedings' web site is located at https://www.aaai.org/ocs/index.php/AAAI/AAAI18/index
Citation: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, p. 7534-7541
Abstract: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Deep convolutional networks for semantic image segmentation typically require large-scale labeled data, e.g., ImageNet and MS COCO, for network pre-training. To reduce annotation efforts, self-supervised semantic segmentation has recently been proposed to pre-train a network without any human-provided labels. The key to this new form of learning is to design a proxy task (e.g., image colorization) from which a discriminative loss can be formulated on unlabeled data. Many proxy tasks, however, lack the critical supervision signals that could induce a discriminative representation for the target image segmentation task. Thus the performance of self-supervision still falls far short of supervised pre-training. In this study, we overcome this limitation by incorporating a 'mix-and-match' (M&M) tuning stage into the self-supervision pipeline. The proposed approach is readily pluggable into many self-supervision methods and uses no more annotated samples than the original process. Yet it can boost the performance of the target image segmentation task beyond that of its fully supervised pre-trained counterpart. The improvement comes from better harnessing the limited pixel-wise annotations in the target dataset. Specifically, we first introduce the 'mix' stage, which sparsely samples and mixes patches from the target set to reflect the rich and diverse local patch statistics of target images. A 'match' stage then forms a class-wise connected graph, from which a strong triplet-based discriminative loss is derived for fine-tuning the network. Our paradigm follows the standard practice of existing self-supervised studies, and no extra data or labels are required. With the proposed M&M approach, for the first time, a self-supervision method achieves comparable or even better performance than its ImageNet pre-trained counterpart on both the PASCAL VOC2012 and Cityscapes datasets.
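The two stages described in the abstract lend themselves to a short illustration. Below is a minimal, hypothetical Python sketch (assuming PyTorch), not the authors' implementation: mix_patches stands in for the 'mix' stage by sparsely sampling labeled patches across target images, and triplet_match_loss stands in for the 'match' stage with a margin-based triplet loss. The centre-pixel labeling rule and the random triplet selection are illustrative assumptions; the paper instead builds a class-wise connected graph to choose triplets.

# Hypothetical sketch of the 'mix' and 'match' stages (PyTorch assumed;
# function names and the patch-labeling rule are illustrative, not from the paper).
import torch
import torch.nn.functional as F

def mix_patches(images, masks, patch_size=64, patches_per_image=4):
    # 'Mix' stage sketch: sparsely sample patches across target images and
    # pool them, labeling each patch by the class of its centre pixel.
    patches, labels = [], []
    n, _, h, w = images.shape
    for i in range(n):
        for _ in range(patches_per_image):
            y = torch.randint(0, h - patch_size + 1, (1,)).item()
            x = torch.randint(0, w - patch_size + 1, (1,)).item()
            patches.append(images[i, :, y:y + patch_size, x:x + patch_size])
            labels.append(masks[i, y + patch_size // 2, x + patch_size // 2])
    return torch.stack(patches), torch.stack(labels)

def triplet_match_loss(embeddings, labels, margin=1.0):
    # 'Match' stage sketch: pull same-class patch embeddings together and
    # push different-class ones apart with a triplet margin loss.
    loss, count = embeddings.new_zeros(()), 0
    for a in range(len(labels)):
        pos = (labels == labels[a]).nonzero().flatten()
        neg = (labels != labels[a]).nonzero().flatten()
        pos = pos[pos != a]
        if len(pos) == 0 or len(neg) == 0:
            continue
        p = pos[torch.randint(len(pos), (1,))].item()
        q = neg[torch.randint(len(neg), (1,))].item()
        d_ap = F.pairwise_distance(embeddings[a:a + 1], embeddings[p:p + 1])
        d_an = F.pairwise_distance(embeddings[a:a + 1], embeddings[q:q + 1])
        loss = loss + F.relu(d_ap - d_an + margin).squeeze()
        count += 1
    return loss / max(count, 1)

In a fine-tuning loop, the embeddings would come from the network being pre-trained (e.g., pooled features of each patch), and this loss would be minimized to adapt the self-supervised representation to the target segmentation task.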
Persistent Identifier: http://hdl.handle.net/10722/273670


DC Field: Value
dc.contributor.author: Zhan, Xiaohang
dc.contributor.author: Liu, Ziwei
dc.contributor.author: Luo, Ping
dc.contributor.author: Tang, Xiaoou
dc.contributor.author: Loy, Chen Change
dc.date.accessioned: 2019-08-12T09:56:19Z
dc.date.available: 2019-08-12T09:56:19Z
dc.date.issued: 2018
dc.identifier.citation: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, p. 7534-7541
dc.identifier.uri: http://hdl.handle.net/10722/273670
dc.language: eng
dc.publisher: Association for the Advancement of Artificial Intelligence. The conference proceedings' web site is located at https://www.aaai.org/ocs/index.php/AAAI/AAAI18/index
dc.relation.ispartof: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
dc.title: Mix-and-match tuning for self-supervised semantic segmentation
dc.type: Conference_Paper
dc.description.nature: link_to_OA_fulltext
dc.identifier.scopus: eid_2-s2.0-85060496730
dc.identifier.spage: 7534
dc.identifier.epage: 7541
