File Download
There are no files associated with this item.
Links for fulltext (may require subscription):
- Publisher Website: 10.1109/CVPR52688.2022.01174
- Scopus: eid_2-s2.0-85134872958
Citations:
- Scopus: 0
Appears in Collections: Conference Paper
Conference Paper: DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Field | Value |
---|---|
Title | DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers |
Authors | Chen, Xianing; Cao, Qiong; Zhong, Yujie; Zhang, Jing; Gao, Shenghua; Tao, Dacheng |
Keywords | Deep learning architectures and techniques; Optimization methods |
Issue Date | 2022 |
Citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 12042-12052 |
Abstract | Transformers are successfully applied to computer vision due to their powerful modeling capacity with self-attention. However, the excellent performance of transformers heavily depends on enormous training images. Thus, a data-efficient transformer solution is urgently needed. In this work, we propose an early knowledge distillation framework, which is termed as DearKD, to improve the data efficiency required by transformers. Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation. Further, our DearKD can be readily applied to the extreme data-free case where no real images are available. In this case, we propose a boundary-preserving intra-divergence loss based on DeepInversion to further close the performance gap against the full-data counterpart. Extensive experiments on ImageNet, partial ImageNet, data-free setting and other downstream tasks prove the superiority of DearKD over its baselines and state-of-the-art methods. |
Persistent Identifier | http://hdl.handle.net/10722/345266 |
ISSN | 1063-6919 (2023 SCImago Journal Rankings: 10.331) |
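The abstract describes a two-stage schedule: early transformer layers are first distilled against early CNN features, after which the transformer trains on the task alone. The sketch below shows one way such a schedule could be wired up in PyTorch. It is an illustration only; the names (EarlyHintLoss, train_step), the assumption that the student exposes early patch tokens and the teacher exposes early-stage feature maps, and the unweighted loss sum are all assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of a two-stage early-distillation schedule (assumptions, not
# the authors' code): stage 1 adds an early-layer hint loss, stage 2 drops it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EarlyHintLoss(nn.Module):
    """Align early transformer patch tokens with an early CNN feature map."""

    def __init__(self, cnn_channels: int, vit_dim: int):
        super().__init__()
        # Project CNN channels into the transformer token dimension.
        self.proj = nn.Linear(cnn_channels, vit_dim)

    def forward(self, vit_tokens: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        # vit_tokens: (B, N, D) patch tokens from an early transformer block
        # cnn_feat:   (B, C, H, W) activations from an early CNN stage
        cnn_tokens = self.proj(cnn_feat.flatten(2).transpose(1, 2))  # (B, H*W, D)
        # Assumes the patch grid and the CNN grid have matching resolution (N == H*W).
        return F.mse_loss(vit_tokens, cnn_tokens)


def train_step(student_vit, teacher_cnn, hint_loss, images, labels, stage: int):
    """Stage 1: task loss + early-layer distillation; stage 2: task loss only."""
    logits, early_tokens = student_vit(images)        # assumed to also return early tokens
    loss = F.cross_entropy(logits, labels)
    if stage == 1:
        with torch.no_grad():
            early_feat = teacher_cnn(images)          # assumed: forward over early CNN stages only
        loss = loss + hint_loss(early_tokens, early_feat)
    return loss
```

In this sketch the hint term simply disappears in stage 2; how the actual framework switches between or weights the two stages is not stated in this record.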
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chen, Xianing | - |
dc.contributor.author | Cao, Qiong | - |
dc.contributor.author | Zhong, Yujie | - |
dc.contributor.author | Zhang, Jing | - |
dc.contributor.author | Gao, Shenghua | - |
dc.contributor.author | Tao, Dacheng | - |
dc.date.accessioned | 2024-08-15T09:26:16Z | - |
dc.date.available | 2024-08-15T09:26:16Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 12042-12052 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345266 | - |
dc.description.abstract | Transformers are successfully applied to computer vision due to their powerful modeling capacity with self-attention. However, the excellent performance of transformers heavily depends on enormous training images. Thus, a data-efficient transformer solution is urgently needed. In this work, we propose an early knowledge distillation framework, which is termed as DearKD, to improve the data efficiency required by transformers. Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation. Further, our DearKD can be readily applied to the extreme data-free case where no real images are available. In this case, we propose a boundary-preserving intra-divergence loss based on DeepInversion to further close the performance gap against the full-data counterpart. Extensive experiments on ImageNet, partial ImageNet, data-free setting and other downstream tasks prove the superiority of DearKD over its baselines and state-of-the-art methods. | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.subject | Deep learning architectures and techniques | - |
dc.subject | Optimization methods | - |
dc.title | DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CVPR52688.2022.01174 | - |
dc.identifier.scopus | eid_2-s2.0-85134872958 | - |
dc.identifier.volume | 2022-June | - |
dc.identifier.spage | 12042 | - |
dc.identifier.epage | 12052 | - |
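For the data-free setting, the abstract mentions a boundary-preserving intra-divergence loss built on DeepInversion but does not spell out its form. The following is only one plausible reading: spread the teacher-space features of same-class synthesized images apart (intra-divergence) while keeping each image correctly classified by the teacher (boundary-preserving). The pairwise-distance term, the cross-entropy boundary term, and the weighting are assumptions, not the paper's definition.

```python
# Hedged sketch of an intra-divergence style objective for data-free image
# synthesis (an interpretation of the abstract, not the paper's formulation).
import torch
import torch.nn.functional as F


def intra_divergence(embeddings: torch.Tensor) -> torch.Tensor:
    """Push same-class synthesized samples apart in the teacher's feature space
    (minimizing the negative mean pairwise distance increases diversity)."""
    z = F.normalize(embeddings, dim=1)                        # (B, D) unit-norm features
    dist = torch.cdist(z, z)                                  # (B, B) pairwise distances
    mask = ~torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    return -dist[mask].mean()


def boundary_term(teacher_logits: torch.Tensor, target_class: int) -> torch.Tensor:
    """Keep every synthesized sample on the correct side of the teacher's
    decision boundary for its target class."""
    targets = torch.full((teacher_logits.size(0),), target_class,
                         dtype=torch.long, device=teacher_logits.device)
    return F.cross_entropy(teacher_logits, targets)


# Schematic synthesis objective, added on top of the usual DeepInversion terms
# (e.g. BatchNorm-statistic matching), with lam a tunable weight (assumption):
#   loss = boundary_term(logits, c) + lam * intra_divergence(features)
```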