Conference Paper: Pedestrian parsing via deep decompositional network

Title: Pedestrian parsing via deep decompositional network
Authors: Luo, Ping; Wang, Xiaogang; Tang, Xiaoou
Keywords: pedestrian parsing; deep learning
Issue Date: 2013
Citation: Proceedings of the IEEE International Conference on Computer Vision, 2013, p. 2648-2655
Abstract: We propose a new Deep Decompositional Network (DDN) for parsing pedestrian images into semantic regions, such as hair, head, body, arms, and legs, where the pedestrians can be heavily occluded. Unlike existing methods based on template matching or Bayesian inference, our approach directly maps low-level visual features to the label maps of body parts with the DDN, which accurately estimates complex pose variations and is robust to occlusions and background clutter. The DDN jointly estimates occluded regions and segments body parts by stacking three types of hidden layers: occlusion estimation layers, completion layers, and decomposition layers. The occlusion estimation layers estimate a binary mask indicating which parts of a pedestrian are invisible. The completion layers synthesize low-level features of the invisible parts from the original features and the occlusion mask. The decomposition layers directly transform the synthesized visual features into label maps. We devise a new strategy to pre-train these hidden layers and then fine-tune the entire network using stochastic gradient descent. Experimental results show that our approach achieves better segmentation accuracy than state-of-the-art methods on pedestrian images with or without occlusions. Another important contribution of this paper is a large-scale benchmark human parsing dataset of 3,673 annotated samples collected from 171 surveillance videos, 20 times larger than existing public datasets. © 2013 IEEE.
Persistent Identifier: http://hdl.handle.net/10722/273661
ISI Accession Number ID: WOS:000351830500331
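
For readers who want a concrete picture of the architecture described in the abstract above, the following is a minimal, hypothetical PyTorch sketch of the three stacked stages (occlusion estimation, completion, decomposition into per-part label maps). It is not the authors' implementation: the feature dimension, layer widths, activations, number of parts, and label-map size are all placeholder assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code) of the three stacked stages the
# abstract describes. All sizes and activations are placeholder assumptions.
import torch
import torch.nn as nn


class DDNSketch(nn.Module):
    def __init__(self, feat_dim=1024, num_parts=7, map_size=(80, 40)):
        super().__init__()
        self.num_parts = num_parts
        self.map_size = map_size
        # Occlusion estimation layers: predict a soft visibility mask over the
        # low-level feature vector (1 = visible, 0 = occluded).
        self.occlusion = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim), nn.Sigmoid(),
        )
        # Completion layers: synthesize features for the occluded parts from
        # the original features concatenated with the estimated mask.
        self.completion = nn.Sequential(
            nn.Linear(feat_dim * 2, 1024), nn.Tanh(),
            nn.Linear(1024, feat_dim), nn.Tanh(),
        )
        # Decomposition layers: transform the completed features into one
        # label map per body part.
        self.decomposition = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.Tanh(),
            nn.Linear(1024, num_parts * map_size[0] * map_size[1]), nn.Sigmoid(),
        )

    def forward(self, features):
        mask = self.occlusion(features)
        completed = self.completion(torch.cat([features, mask], dim=1))
        maps = self.decomposition(completed)
        return maps.view(features.size(0), self.num_parts, *self.map_size)


if __name__ == "__main__":
    x = torch.randn(8, 1024)      # a batch of low-level feature vectors
    print(DDNSketch()(x).shape)   # torch.Size([8, 7, 80, 40])
```

Running the script prints one label map per assumed body part for each input feature vector; in the paper the stages are additionally pre-trained before end-to-end fine-tuning with stochastic gradient descent.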

 

DC Field | Value | Language
dc.contributor.author | Luo, Ping | -
dc.contributor.author | Wang, Xiaogang | -
dc.contributor.author | Tang, Xiaoou | -
dc.date.accessioned | 2019-08-12T09:56:18Z | -
dc.date.available | 2019-08-12T09:56:18Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Proceedings of the IEEE International Conference on Computer Vision, 2013, p. 2648-2655 | -
dc.identifier.uri | http://hdl.handle.net/10722/273661 | -
dc.language | eng | -
dc.relation.ispartof | Proceedings of the IEEE International Conference on Computer Vision | -
dc.subject | pedestrian parsing | -
dc.subject | deep learning | -
dc.title | Pedestrian parsing via deep decompositional network | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/ICCV.2013.329 | -
dc.identifier.scopus | eid_2-s2.0-84898770979 | -
dc.identifier.spage | 2648 | -
dc.identifier.epage | 2655 | -
dc.identifier.isi | WOS:000351830500331 | -
