Conference Paper: On the importance of network architecture in training very deep neural networks

Title: On the importance of network architecture in training very deep neural networks
Authors: Chi, Zhizhen; Li, Hongyang; Wang, Jingjing; Lu, Huchuan
Issue Date: 2016
Citation: ICSPCC 2016 - IEEE International Conference on Signal Processing, Communications and Computing, Conference Proceedings, 2016, article no. 7753635
Abstract: Very deep neural networks with hundreds or more layers have achieved significant success in a variety of vision tasks, spanning image classification, detection, and image captioning. However, simply stacking more convolutional layers suffers from the vanishing-gradient problem and thus cannot reduce the training loss further. The residual network [1] pushes model depth to the extreme by proposing an identity mapping plus a residual learning term, and it addresses the gradient back-propagation bottleneck well. In this paper, we investigate the residual module in depth by analyzing the ordering of the operations inside each block and modifying them one by one to achieve lower test error on the CIFAR-10 dataset. One key observation is that removing the original ReLU activation facilitates gradient propagation along the identity mapping path. Moreover, inspired by the ResNet block, we propose a random-jump scheme that skips some residual connections during training, i.e., lower-level features can jump to any subsequent layer, bypassing its transformations and passing directly to the higher level. Such an upgrade to the network structure not only saves training time but also yields better performance.
Persistent Identifier: http://hdl.handle.net/10722/351372
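
The abstract describes two structural ideas: keeping the identity mapping path free of ReLU so gradients flow through the shortcut unchanged, and a random-jump scheme that occasionally skips a block's residual transform during training so lower-level features pass straight to higher layers. The record itself contains no code; the following PyTorch sketch only illustrates how such a block might look under stated assumptions (the module name, layer sizes, and the skip_prob parameter are hypothetical, and this is not the authors' implementation).

import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomJumpResidualBlock(nn.Module):
    """Illustrative residual block sketched from the abstract's two ideas:
    (1) the identity path carries x through unchanged, with no ReLU applied
        on it or after the addition, so gradients back-propagate intact;
    (2) during training the residual branch is skipped at random with
        probability skip_prob, letting lower-level features "jump" past
        this block's transformations (the random-jump scheme).
    """

    def __init__(self, channels: int, skip_prob: float = 0.2):
        super().__init__()
        self.skip_prob = skip_prob  # hypothetical hyper-parameter
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Randomly skip the residual transform at training time only.
        if self.training and torch.rand(1).item() < self.skip_prob:
            return x  # features jump directly to the next layer

        # Residual branch: BN -> ReLU -> conv, applied twice.
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))

        # Identity path: plain addition with no trailing ReLU, so the
        # shortcut remains a pure identity mapping.
        return x + out

if __name__ == "__main__":
    block = RandomJumpResidualBlock(channels=16)
    block.train()
    y = block(torch.randn(2, 16, 32, 32))  # e.g. CIFAR-10-sized feature maps
    print(y.shape)  # torch.Size([2, 16, 32, 32])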


DC Field: Value
dc.contributor.author: Chi, Zhizhen
dc.contributor.author: Li, Hongyang
dc.contributor.author: Wang, Jingjing
dc.contributor.author: Lu, Huchuan
dc.date.accessioned: 2024-11-20T03:55:53Z
dc.date.available: 2024-11-20T03:55:53Z
dc.date.issued: 2016
dc.identifier.citation: ICSPCC 2016 - IEEE International Conference on Signal Processing, Communications and Computing, Conference Proceedings, 2016, article no. 7753635
dc.identifier.uri: http://hdl.handle.net/10722/351372
dc.language: eng
dc.relation.ispartof: ICSPCC 2016 - IEEE International Conference on Signal Processing, Communications and Computing, Conference Proceedings
dc.title: On the importance of network architecture in training very deep neural networks
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/ICSPCC.2016.7753635
dc.identifier.scopus: eid_2-s2.0-85006915140
dc.identifier.spage: article no. 7753635
dc.identifier.epage: article no. 7753635
