GeometryMotion-Transformer: An End-to-End Framework for 3D Action Recognition

Liu, Jiaheng; Guo, Jinyang; Xu, Dong

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/TMM.2022.3198011
Scopus: eid_2-s2.0-85135970280
WOS: WOS:001098831500001

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: GeometryMotion-Transformer: An End-to-End Framework for 3D Action Recognition

Title	GeometryMotion-Transformer: An End-to-End Framework for 3D Action Recognition
Authors	Liu, Jiaheng; Guo, Jinyang; Xu, Dong
Keywords	3D action recognition; Feature extraction; Finite element analysis; Geometry; point cloud; Point cloud compression; Task analysis; Three-dimensional displays; transformer; Transformers
Issue Date	2022
Citation	IEEE Transactions on Multimedia, 2022. DOI: http://dx.doi.org/10.1109/TMM.2022.3198011
Abstract	In this work, we propose a new end-to-end optimized two-stream framework called GeometryMotion-Transformer (GMT) for 3D action recognition. We first observe that existing 3D action recognition approaches do not extract motion representations from point cloud sequences well. Specifically, when extracting motion representations, existing approaches do not explicitly consider one-to-one correspondence among frames. In addition, existing methods extract only single-scale motion representations, which cannot adequately model the complex motion patterns of moving objects in point cloud sequences. To address these issues, we first propose a feature extraction module (FEM) that generates one-to-one correspondence among frames without voxelization and explicitly extracts both geometry and multi-scale motion representations from raw point clouds. Moreover, we observe that existing two-stream 3D action recognition approaches simply concatenate or add the geometry and motion features, which does not fully exploit the relationship between the two-stream features. To this end, we also propose an improved transformer-based feature fusion module (FFM) to effectively fuse the two-stream features. Based on the proposed FEM and FFM, we build our GMT for 3D action recognition. Extensive experimental results on four benchmark datasets demonstrate the effectiveness of our backbone GMT.
Persistent Identifier	http://hdl.handle.net/10722/322003
ISSN	1520-9210 (2023 Impact Factor: 8.4; 2023 SCImago Journal Rankings: 2.260)
ISI Accession Number ID	WOS:001098831500001
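
The abstract mentions frame-to-frame point correspondence without voxelization and multi-scale motion representations, but this record does not describe how the FEM computes them. The sketch below is only an illustration of the general idea, not the authors' method: it assumes nearest-neighbour matching on raw coordinates as the correspondence rule and treats "scale" as the frame stride. The function names (`nearest_index`, `motion_offsets`, `multi_scale_motion`) and the toy frames are hypothetical.

```python
import math


def nearest_index(p, frame):
    """Index of the point in `frame` closest to `p` (Euclidean distance)."""
    return min(range(len(frame)), key=lambda j: math.dist(p, frame[j]))


def motion_offsets(frame_a, frame_b):
    """Displacement from each point in frame_a to its nearest neighbour in frame_b.

    Assumption: nearest-neighbour matching stands in for the correspondence
    step; every point in frame_a gets exactly one match in frame_b.
    """
    offsets = []
    for p in frame_a:
        q = frame_b[nearest_index(p, frame_b)]
        offsets.append(tuple(qc - pc for pc, qc in zip(p, q)))
    return offsets


def multi_scale_motion(frames, strides=(1, 2)):
    """Offsets between frames t and t + s for every stride s (assumed notion of scale)."""
    return {
        s: [motion_offsets(frames[t], frames[t + s]) for t in range(len(frames) - s)]
        for s in strides
    }


# Toy sequence: two points drifting along the x axis over three frames.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
motion = multi_scale_motion(frames)
```

Here `motion[1]` holds the offsets between consecutive frames and `motion[2]` those between frames two steps apart. The matching works directly on raw coordinates, so no voxel grid is needed, but a brute-force search like this is quadratic in the number of points and would need a spatial index for real point clouds.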

DC Field	Value	Language
dc.contributor.author	Liu, Jiaheng	-
dc.contributor.author	Guo, Jinyang	-
dc.contributor.author	Xu, Dong	-
dc.date.accessioned	2022-11-03T02:22:56Z	-
dc.date.available	2022-11-03T02:22:56Z	-
dc.date.issued	2022	-
dc.identifier.citation	IEEE Transactions on Multimedia, 2022	-
dc.identifier.issn	1520-9210	-
dc.identifier.uri	http://hdl.handle.net/10722/322003	-
dc.description.abstract	In this work, we propose a new end-to-end optimized two-stream framework called GeometryMotion-Transformer (GMT) for 3D action recognition. We first observe that existing 3D action recognition approaches do not extract motion representations from point cloud sequences well. Specifically, when extracting motion representations, existing approaches do not explicitly consider one-to-one correspondence among frames. In addition, existing methods extract only single-scale motion representations, which cannot adequately model the complex motion patterns of moving objects in point cloud sequences. To address these issues, we first propose a feature extraction module (FEM) that generates one-to-one correspondence among frames without voxelization and explicitly extracts both geometry and multi-scale motion representations from raw point clouds. Moreover, we observe that existing two-stream 3D action recognition approaches simply concatenate or add the geometry and motion features, which does not fully exploit the relationship between the two-stream features. To this end, we also propose an improved transformer-based feature fusion module (FFM) to effectively fuse the two-stream features. Based on the proposed FEM and FFM, we build our GMT for 3D action recognition. Extensive experimental results on four benchmark datasets demonstrate the effectiveness of our backbone GMT.	-
dc.language	eng	-
dc.relation.ispartof	IEEE Transactions on Multimedia	-
dc.subject	3D action recognition	-
dc.subject	Feature extraction	-
dc.subject	Finite element analysis	-
dc.subject	Geometry	-
dc.subject	point cloud	-
dc.subject	Point cloud compression	-
dc.subject	Task analysis	-
dc.subject	Three-dimensional displays	-
dc.subject	transformer	-
dc.subject	Transformers	-
dc.title	GeometryMotion-Transformer: An End-to-End Framework for 3D Action Recognition	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/TMM.2022.3198011	-
dc.identifier.scopus	eid_2-s2.0-85135970280	-
dc.identifier.eissn	1941-0077	-
dc.identifier.isi	WOS:001098831500001	-
