Conference Paper: Do we really need more training data for object localization

File download: there are no files associated with this item.

Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/ICIP.2017.8296386
- Scopus: eid_2-s2.0-85045305760

Citations:
- Scopus: 0

Appears in Collections: Conference Paper
Field | Value
---|---
Title | Do we really need more training data for object localization
Authors | Li, Hongyang; Liu, Yu; Zhang, Xin; An, Zhecheng; Wang, Jingjing; Chen, Yibo; Tong, Jihong
Keywords | Computer vision; Deep learning; Image recognition; Object localization
Issue Date | 2017
Citation | Proceedings - International Conference on Image Processing, ICIP, 2017, v. 2017-September, p. 775-779
Abstract | The key to training a good neural network lies in both model capacity and large-scale training data. As more datasets become available, one may wonder whether the success of deep learning stems from data augmentation alone. In this paper, we propose a new dataset, the Extended ImageNet Classification (EIC) dataset, built on the original ILSVRC CLS 2012 set, to investigate whether more training data is the crucial factor. We address the problem of object localization: given an image, boxes (also called anchors) are generated to localize multiple instances. Unlike previous work that places all anchors at the last layer, we distribute boxes of different sizes across different resolutions in the network, since small anchors are more easily identified at the larger spatial resolutions of the shallow layers. Inspired by hourglass networks, we apply a conv-deconv architecture to generate object proposals. The motivation is to fully leverage high-level semantics and to use their up-sampled versions to guide local details in the low-level feature maps. Experimental results demonstrate the effectiveness of this design. Based on the newly proposed dataset, we find that more data can improve average recall, but that a more balanced data distribution across categories yields better results even with fewer training samples.
Persistent Identifier | http://hdl.handle.net/10722/351380
ISSN | 1522-4880
SCImago Journal Rankings (2020) | 0.315
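The abstract's central design choice, assigning small anchors to the high-resolution shallow feature maps and large anchors to the coarse deep maps, can be sketched as plain anchor generation. This is a minimal illustration, not the paper's implementation; the image size, strides, and anchor sizes below are hypothetical values chosen for the example.

```python
import numpy as np

def make_anchors(image_size, strides, anchor_sizes):
    """Place one square anchor per feature-map cell, pairing the
    smallest anchor size with the highest-resolution (shallowest) map."""
    all_anchors = []
    for stride, size in zip(strides, anchor_sizes):
        n = image_size // stride          # feature map has n x n cells
        centers = (np.arange(n) + 0.5) * stride  # cell centers in image coords
        cx, cy = np.meshgrid(centers, centers)
        half = size / 2.0
        # boxes as (x1, y1, x2, y2)
        boxes = np.stack([cx - half, cy - half, cx + half, cy + half], axis=-1)
        all_anchors.append(boxes.reshape(-1, 4))
    return all_anchors

anchors = make_anchors(256, strides=[4, 8, 16], anchor_sizes=[32, 64, 128])
# The shallow, stride-4 map carries the most (and smallest) anchors.
print([a.shape[0] for a in anchors])  # [4096, 1024, 256]
```

Because the number of cells falls quadratically with stride, the shallow layers naturally supply the dense coverage that small objects need, which is the intuition the abstract gives for splitting anchor sizes across resolutions.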
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Hongyang | - |
dc.contributor.author | Liu, Yu | - |
dc.contributor.author | Zhang, Xin | - |
dc.contributor.author | An, Zhecheng | - |
dc.contributor.author | Wang, Jingjing | - |
dc.contributor.author | Chen, Yibo | - |
dc.contributor.author | Tong, Jihong | - |
dc.date.accessioned | 2024-11-20T03:55:56Z | - |
dc.date.available | 2024-11-20T03:55:56Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Proceedings - International Conference on Image Processing, ICIP, 2017, v. 2017-September, p. 775-779 | - |
dc.identifier.issn | 1522-4880 | - |
dc.identifier.uri | http://hdl.handle.net/10722/351380 | - |
dc.description.abstract | The key to training a good neural network lies in both model capacity and large-scale training data. As more datasets become available, one may wonder whether the success of deep learning stems from data augmentation alone. In this paper, we propose a new dataset, the Extended ImageNet Classification (EIC) dataset, built on the original ILSVRC CLS 2012 set, to investigate whether more training data is the crucial factor. We address the problem of object localization: given an image, boxes (also called anchors) are generated to localize multiple instances. Unlike previous work that places all anchors at the last layer, we distribute boxes of different sizes across different resolutions in the network, since small anchors are more easily identified at the larger spatial resolutions of the shallow layers. Inspired by hourglass networks, we apply a conv-deconv architecture to generate object proposals. The motivation is to fully leverage high-level semantics and to use their up-sampled versions to guide local details in the low-level feature maps. Experimental results demonstrate the effectiveness of this design. Based on the newly proposed dataset, we find that more data can improve average recall, but that a more balanced data distribution across categories yields better results even with fewer training samples. | -
dc.language | eng | - |
dc.relation.ispartof | Proceedings - International Conference on Image Processing, ICIP | - |
dc.subject | Computer vision | - |
dc.subject | Deep learning | - |
dc.subject | Image recognition | - |
dc.subject | Object localization | - |
dc.title | Do we really need more training data for object localization | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/ICIP.2017.8296386 | - |
dc.identifier.scopus | eid_2-s2.0-85045305760 | - |
dc.identifier.volume | 2017-September | - |
dc.identifier.spage | 775 | - |
dc.identifier.epage | 779 | - |