Article: Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization

Title: Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization
Authors: Su, Rui; Xu, Dong; Zhou, Luping; Ouyang, Wanli
Keywords: Action localization; spatio-temporal action localization; two-stream cooperation
Issue Date: 2021
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, v. 43, n. 12, p. 4477-4490
Abstract: Spatio-temporal action localization comprises three tasks: spatial localization, action classification, and temporal localization. In this work, we propose a new progressive cross-stream cooperation (PCSC) framework that improves all three tasks. The basic idea is to use both the region proposals (resp., temporal segment proposals) and the features from one stream (i.e., the Flow/RGB stream) to help the other stream (i.e., the RGB/Flow stream) iteratively generate better bounding boxes in the spatial domain (resp., temporal segments in the temporal domain). In this way, not only can the actions be localized more accurately in both space and time, but the action classes can also be predicted more precisely. Specifically, we first combine the latest region proposals (for spatial detection) or segment proposals (for temporal localization) from both streams to form a larger set of labelled training samples, which helps learn better action detection or segment detection models. Second, to learn better representations, we propose a new message passing approach that passes information from one stream to the other, which also leads to better action detection and segment detection models. By first applying our PCSC framework for spatial localization at the frame level and then applying our temporal PCSC framework for temporal localization at the tube level, the action localization results are progressively improved at both the frame level and the video level. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our approach for spatio-temporal action localization in realistic scenarios.
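The proposal-combination step described in the abstract (pooling the latest proposals from both streams, then letting each stream refine the shared set) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the list-of-box representation, and the `refine_*` callbacks are all hypothetical.

```python
def combine_proposals(rgb_props, flow_props):
    """Union of proposals from the RGB and Flow streams, duplicates removed.

    Each proposal is a box [x1, y1, x2, y2] (or a temporal segment [t1, t2]).
    """
    seen = set()
    combined = []
    for box in rgb_props + flow_props:
        key = tuple(box)
        if key not in seen:
            seen.add(key)
            combined.append(box)
    return combined


def pcsc_round(rgb_props, flow_props, refine_rgb, refine_flow):
    """One cooperation round: each stream refines the shared proposal set.

    refine_rgb / refine_flow stand in for the per-stream detectors, which in
    the paper are retrained on the combined (larger) set of labelled samples.
    """
    shared = combine_proposals(rgb_props, flow_props)
    return refine_rgb(shared), refine_flow(shared)
```

Iterating `pcsc_round` would correspond to the "progressive" aspect: each round starts from the union of both streams' latest outputs, so improvements in one stream feed the other.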
Persistent Identifier: http://hdl.handle.net/10722/322061
ISSN: 0162-8828
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158
ISI Accession Number ID: WOS:000714203900024

 

DC Field | Value | Language
dc.contributor.author | Su, Rui | -
dc.contributor.author | Xu, Dong | -
dc.contributor.author | Zhou, Luping | -
dc.contributor.author | Ouyang, Wanli | -
dc.date.accessioned | 2022-11-03T02:23:20Z | -
dc.date.available | 2022-11-03T02:23:20Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, v. 43, n. 12, p. 4477-4490 | -
dc.identifier.issn | 0162-8828 | -
dc.identifier.uri | http://hdl.handle.net/10722/322061 | -
dc.description.abstract | Spatio-temporal action localization comprises three tasks: spatial localization, action classification, and temporal localization. In this work, we propose a new progressive cross-stream cooperation (PCSC) framework that improves all three tasks. The basic idea is to use both the region proposals (resp., temporal segment proposals) and the features from one stream (i.e., the Flow/RGB stream) to help the other stream (i.e., the RGB/Flow stream) iteratively generate better bounding boxes in the spatial domain (resp., temporal segments in the temporal domain). In this way, not only can the actions be localized more accurately in both space and time, but the action classes can also be predicted more precisely. Specifically, we first combine the latest region proposals (for spatial detection) or segment proposals (for temporal localization) from both streams to form a larger set of labelled training samples, which helps learn better action detection or segment detection models. Second, to learn better representations, we propose a new message passing approach that passes information from one stream to the other, which also leads to better action detection and segment detection models. By first applying our PCSC framework for spatial localization at the frame level and then applying our temporal PCSC framework for temporal localization at the tube level, the action localization results are progressively improved at both the frame level and the video level. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our approach for spatio-temporal action localization in realistic scenarios. | -
dc.language | eng | -
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | -
dc.subject | Action localization | -
dc.subject | spatio-temporal action localization | -
dc.subject | two-stream cooperation | -
dc.title | Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/TPAMI.2020.2997860 | -
dc.identifier.pmid | 32750775 | -
dc.identifier.scopus | eid_2-s2.0-85118604122 | -
dc.identifier.volume | 43 | -
dc.identifier.issue | 12 | -
dc.identifier.spage | 4477 | -
dc.identifier.epage | 4490 | -
dc.identifier.eissn | 1939-3539 | -
dc.identifier.isi | WOS:000714203900024 | -

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats