Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-Resolution Information in Temporal Domain

Su, Rui; Xu, Dong; Zhou, Luping; Ouyang, Wanli

Links for fulltext (may require subscription):

Publisher Website: 10.1109/TIP.2021.3089355
Scopus: eid_2-s2.0-85111735059
PMID: 34166188
WOS: WOS:000679941200004

Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-Resolution Information in Temporal Domain
Authors	Su, Rui; Xu, Dong; Zhou, Luping; Ouyang, Wanli
Keywords	temporal multi-resolution information; two stream fusion; Weakly supervised temporal action localization
Issue Date	2021
Citation	IEEE Transactions on Image Processing, 2021, v. 30, p. 6659-6672. DOI: http://dx.doi.org/10.1109/TIP.2021.3089355
Abstract	Weakly supervised temporal action localization is a challenging task because only video-level annotations are available during training. To address this problem, we propose a two-stage approach that generates high-quality frame-level pseudo labels by fully exploiting multi-resolution information in the temporal domain and the complementary information between the appearance (i.e., RGB) and motion (i.e., optical flow) streams. In the first stage, we propose an Initial Label Generation (ILG) module to generate reliable initial frame-level pseudo labels. Specifically, in this newly proposed module, we exploit temporal multi-resolution consistency and cross-stream consistency to generate high-quality class activation sequences (CASs), a set of sequences in which each sequence measures how likely each video frame is to belong to one specific action class. In the second stage, we propose a Progressive Temporal Label Refinement (PTLR) framework to iteratively refine the pseudo labels, in which a set of selected frames with highly confident pseudo labels is used to progressively train two networks that better predict the action class scores at each frame. Specifically, in our newly proposed PTLR framework, two networks called Network-OTS and Network-RTS, which generate CASs for the original temporal scale and the reduced temporal scales respectively, are used as two streams (i.e., the OTS stream and the RTS stream) to refine the pseudo labels in turn. In this way, multi-resolution information in the temporal domain is exchanged at the pseudo-label level, and each network/stream can be improved by exploiting the refined pseudo labels from the other network/stream. Comprehensive experiments on two benchmark datasets, THUMOS14 and ActivityNet v1.3, demonstrate the effectiveness of our newly proposed method for weakly supervised temporal action localization.
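The abstract describes the first (ILG) stage only at a high level. As a rough illustration of the general idea, and not the authors' ILG module, the sketch below fuses each stream's class activation sequence with average-pooled, reduced-temporal-scale copies of itself, averages the RGB and optical-flow streams, and keeps a frame's pseudo label only when its top class is confident and both streams agree on it. The pooling factors, the plain averaging, the 0.5 threshold, and all function names (`pool_and_upsample`, `fuse_resolutions`, `initial_pseudo_labels`) are hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence

# Hypothetical CAS representation: T frames x C classes, scores in [0, 1].
CAS = List[List[float]]


def pool_and_upsample(cas: CAS, factor: int) -> CAS:
    """Average-pool the CAS over non-overlapping windows of `factor` frames
    (a reduced temporal scale), then repeat each pooled row so the result
    has the original length T again."""
    num_classes = len(cas[0])
    out: CAS = []
    for start in range(0, len(cas), factor):
        window = cas[start:start + factor]  # last window may be shorter
        pooled = [sum(row[c] for row in window) / len(window)
                  for c in range(num_classes)]
        out.extend([pooled[:] for _ in window])
    return out


def fuse_resolutions(cas: CAS, factors: Sequence[int]) -> CAS:
    """Average the original-scale CAS with its reduced-scale versions
    (an assumed, deliberately simple form of multi-resolution consistency)."""
    versions = [cas] + [pool_and_upsample(cas, f) for f in factors]
    num_frames, num_classes = len(cas), len(cas[0])
    return [[sum(v[t][c] for v in versions) / len(versions)
             for c in range(num_classes)]
            for t in range(num_frames)]


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda c: row[c])


def initial_pseudo_labels(rgb: CAS, flow: CAS,
                          factors: Sequence[int] = (2, 4),
                          threshold: float = 0.5) -> List[Optional[int]]:
    """Return a class index per frame, or None for frames left unlabeled.
    A frame is labeled only if the RGB and flow streams agree on the top
    class (assumed cross-stream consistency test) and the two-stream
    averaged score of that class reaches `threshold`."""
    rgb_fused = fuse_resolutions(rgb, factors)
    flow_fused = fuse_resolutions(flow, factors)
    labels: List[Optional[int]] = []
    for r_row, f_row in zip(rgb_fused, flow_fused):
        fused = [(a + b) / 2 for a, b in zip(r_row, f_row)]
        top = argmax(fused)
        agree = argmax(r_row) == argmax(f_row) == top
        labels.append(top if agree and fused[top] >= threshold else None)
    return labels


# Toy 4-frame, 2-class example (made-up scores, for illustration only).
rgb = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]]
flow = [[0.8, 0.1], [0.9, 0.3], [0.3, 0.8], [0.6, 0.4]]
print(initial_pseudo_labels(rgb, flow, factors=(2,)))  # [0, 0, 1, None]
```

In the toy example the last frame is left unlabeled because the flow stream favors class 0 while the RGB stream favors class 1. Plain averaging is only the simplest way to combine scales and streams; the second (PTLR) stage, in which Network-OTS and Network-RTS are retrained on such labels in turn, is not modeled here.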
Persistent Identifier	http://hdl.handle.net/10722/321949
ISSN	1057-7149 (2023 Impact Factor: 10.8; 2023 SCImago Journal Rankings: 3.556)
ISI Accession Number ID	WOS:000679941200004

DC Field	Value	Language
dc.contributor.author	Su, Rui	-
dc.contributor.author	Xu, Dong	-
dc.contributor.author	Zhou, Luping	-
dc.contributor.author	Ouyang, Wanli	-
dc.date.accessioned	2022-11-03T02:22:34Z	-
dc.date.available	2022-11-03T02:22:34Z	-
dc.date.issued	2021	-
dc.identifier.citation	IEEE Transactions on Image Processing, 2021, v. 30, p. 6659-6672	-
dc.identifier.issn	1057-7149	-
dc.identifier.uri	http://hdl.handle.net/10722/321949	-
dc.description.abstract	Weakly supervised temporal action localization is a challenging task because only video-level annotations are available during training. To address this problem, we propose a two-stage approach that generates high-quality frame-level pseudo labels by fully exploiting multi-resolution information in the temporal domain and the complementary information between the appearance (i.e., RGB) and motion (i.e., optical flow) streams. In the first stage, we propose an Initial Label Generation (ILG) module to generate reliable initial frame-level pseudo labels. Specifically, in this newly proposed module, we exploit temporal multi-resolution consistency and cross-stream consistency to generate high-quality class activation sequences (CASs), a set of sequences in which each sequence measures how likely each video frame is to belong to one specific action class. In the second stage, we propose a Progressive Temporal Label Refinement (PTLR) framework to iteratively refine the pseudo labels, in which a set of selected frames with highly confident pseudo labels is used to progressively train two networks that better predict the action class scores at each frame. Specifically, in our newly proposed PTLR framework, two networks called Network-OTS and Network-RTS, which generate CASs for the original temporal scale and the reduced temporal scales respectively, are used as two streams (i.e., the OTS stream and the RTS stream) to refine the pseudo labels in turn. In this way, multi-resolution information in the temporal domain is exchanged at the pseudo-label level, and each network/stream can be improved by exploiting the refined pseudo labels from the other network/stream. Comprehensive experiments on two benchmark datasets, THUMOS14 and ActivityNet v1.3, demonstrate the effectiveness of our newly proposed method for weakly supervised temporal action localization.	-
dc.language	eng	-
dc.relation.ispartof	IEEE Transactions on Image Processing	-
dc.subject	temporal multi-resolution information	-
dc.subject	two stream fusion	-
dc.subject	Weakly supervised temporal action localization	-
dc.title	Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-Resolution Information in Temporal Domain	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/TIP.2021.3089355	-
dc.identifier.pmid	34166188	-
dc.identifier.scopus	eid_2-s2.0-85111735059	-
dc.identifier.volume	30	-
dc.identifier.spage	6659	-
dc.identifier.epage	6672	-
dc.identifier.eissn	1941-0042	-
dc.identifier.isi	WOS:000679941200004	-