Article: MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

Title: MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency
Authors: SHI, M; Aberman, K; Aristidou, A; Komura, T; Lischinski, D; Cohen-Or, D; Chen, B
Issue Date: 2021
Publisher: Association for Computing Machinery, Inc. The Journal's web site is located at http://tog.acm.org
Citation: ACM Transactions on Graphics, 2021, v. 40 n. 1, p. 1-15
Abstract: We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from a monocular video. While previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations, our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation. At the crux of our approach lies a deep neural network with embedded kinematic priors, which decomposes sequences of 2D joint positions into two separate attributes: a single, symmetric skeleton encoded by bone lengths, and a sequence of 3D joint rotations associated with global root positions and foot contact labels. These attributes are fed into an integrated forward kinematics (FK) layer that outputs 3D positions, which are compared to a ground truth. In addition, an adversarial loss is applied to the velocities of the recovered rotations to ensure that they lie on the manifold of natural joint rotations. The key advantage of our approach is that it learns to infer natural joint rotations directly from the training data rather than assuming an underlying model, or inferring them from joint positions using a data-agnostic IK solver. We show that enforcing a single consistent skeleton along with temporally coherent joint rotations constrains the solution space, leading to a more robust handling of self-occlusions and depth ambiguities.
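The abstract's central mechanism is the forward kinematics (FK) layer: given the static attribute (bone lengths) and the dynamic attributes (per-joint rotations plus a global root position), it recovers 3D joint positions by walking the kinematic tree, so the network can be supervised on positions while outputting a full kinematic skeleton. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's implementation: the three-joint chain, the `parents`/`offsets` arrays, and the use of a single z-axis angle per joint (instead of full 3D rotations such as quaternions) are all simplifying assumptions.

```python
# Hypothetical FK-layer sketch: bone lengths (static skeleton) plus
# joint rotations and a root position (dynamic attributes) -> 3D joint
# positions. The skeleton below is a toy 3-joint chain, not the paper's.
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis; a real model would use full 3D
    rotations, but one angle per joint keeps the sketch short."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(root_pos, bone_lengths, joint_angles, parents, offsets):
    """Accumulate rotations down the tree: each joint's position is its
    parent's position plus the parent's global rotation applied to the
    bone-length-scaled offset direction."""
    n = len(parents)
    positions = np.zeros((n, 3))
    global_rots = [np.eye(3)] * n
    positions[0] = root_pos
    global_rots[0] = rot_z(joint_angles[0])
    for j in range(1, n):
        p = parents[j]
        global_rots[j] = global_rots[p] @ rot_z(joint_angles[j])
        positions[j] = positions[p] + global_rots[p] @ (bone_lengths[j] * offsets[j])
    return positions

# Toy chain (root -> knee -> foot); both bones point "down" in rest pose.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, -1.0, 0.0]])
bone_lengths = np.array([0.0, 0.45, 0.40])
joints = forward_kinematics(
    root_pos=np.array([0.0, 1.0, 0.0]),
    bone_lengths=bone_lengths,
    joint_angles=np.array([0.0, 0.0, 0.0]),
    parents=parents,
    offsets=offsets,
)
```

Because bone lengths enter only through this deterministic layer, every output pose shares the single, consistent skeleton the abstract emphasizes; the loss on 3D positions then back-propagates through the FK layer to both the skeleton and rotation branches.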
Persistent Identifier: http://hdl.handle.net/10722/304083
ISSN: 0730-0301
2023 Impact Factor: 7.8
2023 SCImago Journal Rankings: 7.766
ISI Accession Number ID: WOS:000604780700001

 

DC Field: Value
dc.contributor.author: SHI, M
dc.contributor.author: Aberman, K
dc.contributor.author: Aristidou, A
dc.contributor.author: Komura, T
dc.contributor.author: Lischinski, D
dc.contributor.author: Cohen-Or, D
dc.contributor.author: Chen, B
dc.date.accessioned: 2021-09-23T08:55:00Z
dc.date.available: 2021-09-23T08:55:00Z
dc.date.issued: 2021
dc.identifier.citation: ACM Transactions on Graphics, 2021, v. 40 n. 1, p. 1-15
dc.identifier.issn: 0730-0301
dc.identifier.uri: http://hdl.handle.net/10722/304083
dc.description.abstract: We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from a monocular video. While previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations, our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation. At the crux of our approach lies a deep neural network with embedded kinematic priors, which decomposes sequences of 2D joint positions into two separate attributes: a single, symmetric skeleton encoded by bone lengths, and a sequence of 3D joint rotations associated with global root positions and foot contact labels. These attributes are fed into an integrated forward kinematics (FK) layer that outputs 3D positions, which are compared to a ground truth. In addition, an adversarial loss is applied to the velocities of the recovered rotations to ensure that they lie on the manifold of natural joint rotations. The key advantage of our approach is that it learns to infer natural joint rotations directly from the training data rather than assuming an underlying model, or inferring them from joint positions using a data-agnostic IK solver. We show that enforcing a single consistent skeleton along with temporally coherent joint rotations constrains the solution space, leading to a more robust handling of self-occlusions and depth ambiguities.
dc.language: eng
dc.publisher: Association for Computing Machinery, Inc. The Journal's web site is located at http://tog.acm.org
dc.relation.ispartof: ACM Transactions on Graphics
dc.rights: ACM Transactions on Graphics. Copyright © Association for Computing Machinery, Inc.
dc.rights: ©ACM, YYYY. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in PUBLICATION, {VOL#, ISS#, (DATE)} http://doi.acm.org/10.1145/nnnnnn.nnnnnn
dc.title: MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency
dc.type: Article
dc.identifier.email: Komura, T: taku@cs.hku.hk
dc.identifier.authority: Komura, T=rp02741
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/3407659
dc.identifier.hkuros: 325510
dc.identifier.volume: 40
dc.identifier.issue: 1
dc.identifier.spage: 1
dc.identifier.epage: 15
dc.identifier.isi: WOS:000604780700001
dc.publisher.place: United States
