Progressively parsing interactional objects for fine grained action detection

Ni, Bingbing; Yang, Xiaokang; Gao, Shenghua

Publisher Website: 10.1109/CVPR.2016.116
Scopus: eid_2-s2.0-84986331364

Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Progressively parsing interactional objects for fine grained action detection

Title	Progressively parsing interactional objects for fine grained action detection
Authors	Ni, Bingbing; Yang, Xiaokang; Gao, Shenghua
Issue Date	2016
Citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, v. 2016-December, p. 1020-1028. DOI: http://dx.doi.org/10.1109/CVPR.2016.116
Abstract	Fine grained video action analysis often requires reliable detection and tracking of various interacting objects and human body parts, denoted as Interactional Object Parsing. However, most of the previous methods based on either independent or joint object detection might suffer from high model complexity and challenging image content, e.g., illumination/pose/appearance/scale variation, motion, and occlusion. In this work, we propose an end-to-end system based on a recurrent neural network to perform frame by frame interactional object parsing, which can alleviate the difficulty in an incremental/progressive manner. Our key innovation is that, instead of jointly outputting all object detections at once, for each frame we use a set of long short-term memory (LSTM) nodes to incrementally refine the detections. After passing through each LSTM node, more object detections are consolidated and thus more contextual information can be utilized to localize more difficult objects. The object parsing results are further utilized to form an object-specific action representation for fine grained action detection. Extensive experiments on two benchmark fine grained activity datasets demonstrate that our proposed algorithm achieves better interacting object detection performance, which in turn boosts the action recognition performance over the state-of-the-art.
Persistent Identifier	http://hdl.handle.net/10722/345220
ISSN	1063-6919 (2023 SCImago Journal Rankings: 10.331)

DC Field	Value	Language
dc.contributor.author	Ni, Bingbing	-
dc.contributor.author	Yang, Xiaokang	-
dc.contributor.author	Gao, Shenghua	-
dc.date.accessioned	2024-08-15T09:25:59Z	-
dc.date.available	2024-08-15T09:25:59Z	-
dc.date.issued	2016	-
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, v. 2016-December, p. 1020-1028	-
dc.identifier.issn	1063-6919	-
dc.identifier.uri	http://hdl.handle.net/10722/345220	-
dc.description.abstract	Fine grained video action analysis often requires reliable detection and tracking of various interacting objects and human body parts, denoted as Interactional Object Parsing. However, most of the previous methods based on either independent or joint object detection might suffer from high model complexity and challenging image content, e.g., illumination/pose/appearance/scale variation, motion, and occlusion. In this work, we propose an end-to-end system based on a recurrent neural network to perform frame by frame interactional object parsing, which can alleviate the difficulty in an incremental/progressive manner. Our key innovation is that, instead of jointly outputting all object detections at once, for each frame we use a set of long short-term memory (LSTM) nodes to incrementally refine the detections. After passing through each LSTM node, more object detections are consolidated and thus more contextual information can be utilized to localize more difficult objects. The object parsing results are further utilized to form an object-specific action representation for fine grained action detection. Extensive experiments on two benchmark fine grained activity datasets demonstrate that our proposed algorithm achieves better interacting object detection performance, which in turn boosts the action recognition performance over the state-of-the-art.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	-
dc.title	Progressively parsing interactional objects for fine grained action detection	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/CVPR.2016.116	-
dc.identifier.scopus	eid_2-s2.0-84986331364	-
dc.identifier.volume	2016-December	-
dc.identifier.spage	1020	-
dc.identifier.epage	1028	-
