Conference Paper: TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Title: TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Authors: Hu, Huazhang; Dong, Sixun; Zhao, Yiqun; Lian, Dongze; Li, Zhengxin; Gao, Shenghua
Keywords: Action and event recognition; Datasets and evaluation; Face and gestures; Others; Pose estimation and tracking
Issue Date: 2022
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 18991-19000
Abstract: Counting repetitive actions is a common need in human activities such as physical exercise. Existing methods focus on repetitive action counting in short videos and struggle with the longer videos that arise in more realistic scenarios. In the data-driven era, this degradation of generalization capability is mainly attributed to the lack of long-video datasets. To fill this gap, we introduce a new large-scale repetitive action counting dataset covering a wide variety of video lengths, along with more realistic situations in which action interruptions or action inconsistencies occur. We also provide fine-grained annotation of the action cycles rather than only a single numerical count per video. The resulting dataset contains 1,451 videos with about 20,000 annotations and is correspondingly more challenging. For repetitive action counting in these more realistic scenarios, we further propose encoding multi-scale temporal correlation with transformers, which accounts for both performance and efficiency. Furthermore, with the help of the fine-grained cycle annotations, we propose a density map regression-based method to predict the action period, which yields better performance with good interpretability. Our method outperforms state-of-the-art methods on all datasets and also achieves better performance on an unseen dataset without fine-tuning. The dataset and code are available at https://svip-lab.github.io/dataset/RepCount_dataset.html. (Illustrative sketches of the density-map counting idea and of building a density target from cycle annotations appear below.)
Persistent Identifier: http://hdl.handle.net/10722/345287
ISSN: 1063-6919
2023 SCImago Journal Rankings: 10.331
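
The abstract describes counting repetitions by regressing a per-frame density map whose sum gives the count. The minimal Python/PyTorch sketch below illustrates that general idea only; the class name, layer sizes, and the plain transformer encoder used here are assumptions for illustration, not the authors' released TransRAC implementation (which additionally encodes multi-scale temporal correlation).

# Minimal, illustrative sketch of density-map-based repetition counting.
# Names and hyper-parameters are hypothetical; this is not the TransRAC code.
import torch
import torch.nn as nn

class DensityCounter(nn.Module):
    def __init__(self, feat_dim=512, n_heads=4, n_layers=2):
        super().__init__()
        # Temporal self-attention over per-frame features (a stand-in for the
        # multi-scale temporal-correlation encoding described in the abstract).
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Per-frame density head: one non-negative value per frame.
        self.head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Softplus())

    def forward(self, frame_feats):           # (batch, time, feat_dim)
        x = self.encoder(frame_feats)         # temporal correlation
        density = self.head(x).squeeze(-1)    # (batch, time) density map
        count = density.sum(dim=1)            # repetition count per video
        return density, count

# Usage: 64 frames of 512-d features for one video.
feats = torch.randn(1, 64, 512)
density, count = DensityCounter()(feats)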

 

DC Field | Value | Language
dc.contributor.author | Hu, Huazhang | -
dc.contributor.author | Dong, Sixun | -
dc.contributor.author | Zhao, Yiqun | -
dc.contributor.author | Lian, Dongze | -
dc.contributor.author | Li, Zhengxin | -
dc.contributor.author | Gao, Shenghua | -
dc.date.accessioned | 2024-08-15T09:26:24Z | -
dc.date.available | 2024-08-15T09:26:24Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 18991-19000 | -
dc.identifier.issn | 1063-6919 | -
dc.identifier.uri | http://hdl.handle.net/10722/345287 | -
dc.description.abstract | Counting repetitive actions is a common need in human activities such as physical exercise. Existing methods focus on repetitive action counting in short videos and struggle with the longer videos that arise in more realistic scenarios. In the data-driven era, this degradation of generalization capability is mainly attributed to the lack of long-video datasets. To fill this gap, we introduce a new large-scale repetitive action counting dataset covering a wide variety of video lengths, along with more realistic situations in which action interruptions or action inconsistencies occur. We also provide fine-grained annotation of the action cycles rather than only a single numerical count per video. The resulting dataset contains 1,451 videos with about 20,000 annotations and is correspondingly more challenging. For repetitive action counting in these more realistic scenarios, we further propose encoding multi-scale temporal correlation with transformers, which accounts for both performance and efficiency. Furthermore, with the help of the fine-grained cycle annotations, we propose a density map regression-based method to predict the action period, which yields better performance with good interpretability. Our method outperforms state-of-the-art methods on all datasets and also achieves better performance on an unseen dataset without fine-tuning. The dataset and code are available at https://svip-lab.github.io/dataset/RepCount_dataset.html. | -
dc.language | eng | -
dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | -
dc.subject | Action and event recognition | -
dc.subject | Datasets and evaluation | -
dc.subject | Face and gestures | -
dc.subject | Others | -
dc.subject | Pose estimation and tracking | -
dc.title | TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/CVPR52688.2022.01843 | -
dc.identifier.scopus | eid_2-s2.0-85141796445 | -
dc.identifier.volume | 2022-June | -
dc.identifier.spage | 18991 | -
dc.identifier.epage | 19000 | -
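
The abstract also mentions fine-grained annotation of action cycles used to supervise the density map. The sketch below shows one common way such annotations could be converted into a per-frame density target whose sum equals the repetition count (one Gaussian per annotated cycle, normalized to integrate to one); this construction is an assumption for illustration and may differ from the authors' actual target-generation procedure.

# Hypothetical sketch: turn per-cycle (start_frame, end_frame) annotations into
# a per-frame density target whose sum equals the repetition count.
import numpy as np

def cycles_to_density(cycles, num_frames, sigma=2.0):
    """cycles: list of (start_frame, end_frame) pairs, one per repetition."""
    t = np.arange(num_frames, dtype=np.float64)
    density = np.zeros(num_frames, dtype=np.float64)
    for start, end in cycles:
        mid = 0.5 * (start + end)
        g = np.exp(-0.5 * ((t - mid) / sigma) ** 2)
        density += g / g.sum()   # each cycle contributes exactly 1 to the sum
    return density

# Example: three annotated repetitions in a 100-frame clip.
target = cycles_to_density([(5, 20), (25, 45), (50, 80)], num_frames=100)
print(target.sum())              # ~3.0, the ground-truth count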
