Article: BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer
Title | BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer |
---|---|
Authors | Pang, Kunkun; Qin, Dafei; Fan, Yingruo; Habekost, Julian; Shiratori, Takaaki; Yamagishi, Junichi; Komura, Taku |
Keywords | deep learning; motion generation; transformer |
Issue Date | 26-Jul-2023 |
Publisher | Association for Computing Machinery (ACM) |
Citation | ACM Transactions on Graphics, 2023, v. 42, n. 4 |
Abstract | Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framework for automatic 3D body gesture synthesis from speech. To learn the stochastic nature of the body gesture during speech, we propose a variational transformer to effectively model a probabilistic distribution over gestures, which can produce diverse gestures during inference. Furthermore, we introduce a mode positional embedding layer to capture the different motion speeds in different speaking modes. To cope with the scarcity of data, we design an intra-modal pre-training scheme that can learn the complex mapping between the speech and the 3D gesture from a limited amount of data. Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset. The results show that our system can produce more realistic, appropriate, and diverse body gestures compared to existing state-of-the-art approaches. |
Persistent Identifier | http://hdl.handle.net/10722/331610 |
ISSN | 0730-0301 (2023 Impact Factor: 7.8; 2023 SCImago Journal Rankings: 7.766) |
ISI Accession Number ID | WOS:001044671300009 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Pang, Kunkun | - |
dc.contributor.author | Qin, Dafei | - |
dc.contributor.author | Fan, Yingruo | - |
dc.contributor.author | Habekost, Julian | - |
dc.contributor.author | Shiratori, Takaaki | - |
dc.contributor.author | Yamagishi, Junichi | - |
dc.contributor.author | Komura, Taku | - |
dc.date.accessioned | 2023-09-21T06:57:21Z | - |
dc.date.available | 2023-09-21T06:57:21Z | - |
dc.date.issued | 2023-07-26 | - |
dc.identifier.citation | ACM Transactions on Graphics, 2023, v. 42, n. 4 | - |
dc.identifier.issn | 0730-0301 | - |
dc.identifier.uri | http://hdl.handle.net/10722/331610 | - |
dc.description.abstract | Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framework for automatic 3D body gesture synthesis from speech. To learn the stochastic nature of the body gesture during speech, we propose a variational transformer to effectively model a probabilistic distribution over gestures, which can produce diverse gestures during inference. Furthermore, we introduce a mode positional embedding layer to capture the different motion speeds in different speaking modes. To cope with the scarcity of data, we design an intra-modal pre-training scheme that can learn the complex mapping between the speech and the 3D gesture from a limited amount of data. Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset. The results show that our system can produce more realistic, appropriate, and diverse body gestures compared to existing state-of-the-art approaches. | -
dc.language | eng | - |
dc.publisher | Association for Computing Machinery (ACM) | - |
dc.relation.ispartof | ACM Transactions on Graphics | - |
dc.subject | deep learning | - |
dc.subject | motion generation | - |
dc.subject | transformer | - |
dc.title | BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer | - |
dc.type | Article | - |
dc.identifier.doi | 10.1145/3592456 | - |
dc.identifier.scopus | eid_2-s2.0-85166345445 | - |
dc.identifier.volume | 42 | - |
dc.identifier.issue | 4 | - |
dc.identifier.eissn | 1557-7368 | - |
dc.identifier.isi | WOS:001044671300009 | - |
dc.identifier.issnl | 0730-0301 | - |
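
The abstract describes two architectural ideas: a variational transformer that models a probabilistic distribution over gestures conditioned on speech, and a mode positional embedding layer that accounts for different motion speeds in different speaking modes. The sketch below is a minimal, hypothetical PyTorch illustration of how those two pieces could be wired together; all module names, feature dimensions, and layer counts are assumptions for illustration and do not reproduce the authors' implementation or released code.

```python
# Illustrative sketch only: assumed PyTorch wiring of a variational gesture
# transformer with a per-mode positional embedding. Names and sizes are hypothetical.
import torch
import torch.nn as nn

class ModePositionalEmbedding(nn.Module):
    """Learned positional embedding selected per speaking mode, so each mode
    can encode a different motion speed (assumed design, per the abstract)."""
    def __init__(self, num_modes: int, max_len: int, dim: int):
        super().__init__()
        self.table = nn.Parameter(torch.zeros(num_modes, max_len, dim))

    def forward(self, x: torch.Tensor, mode: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); mode: (batch,) integer speaking-mode labels
        pos = self.table[mode, : x.size(1)]           # (batch, time, dim)
        return x + pos

class VariationalGestureTransformer(nn.Module):
    """Minimal variational transformer: encoded speech conditions a Gaussian
    latent; a transformer decoder maps the sampled latent to gesture frames."""
    def __init__(self, speech_dim=128, pose_dim=165, dim=256, num_modes=2, max_len=240):
        super().__init__()
        self.speech_proj = nn.Linear(speech_dim, dim)
        self.mode_pos = ModePositionalEmbedding(num_modes, max_len, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.to_mu = nn.Linear(dim, dim)
        self.to_logvar = nn.Linear(dim, dim)
        dec_layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        self.query = nn.Parameter(torch.zeros(1, max_len, dim))
        self.to_pose = nn.Linear(dim, pose_dim)

    def forward(self, speech: torch.Tensor, mode: torch.Tensor):
        # speech: (batch, time, speech_dim); mode: (batch,)
        h = self.mode_pos(self.speech_proj(speech), mode)
        memory = self.encoder(h)
        mu, logvar = self.to_mu(memory), self.to_logvar(memory)
        # Reparameterisation trick: sampling here yields diverse gestures at inference.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        q = self.query[:, : speech.size(1)].expand(speech.size(0), -1, -1)
        out = self.decoder(q, z)                       # decode gestures from the latent
        return self.to_pose(out), mu, logvar           # poses plus terms for a KL loss

# Usage example: random features stand in for real speech input.
model = VariationalGestureTransformer()
speech = torch.randn(2, 120, 128)     # 2 clips, 120 frames of speech features
mode = torch.tensor([0, 1])           # per-clip speaking-mode labels
poses, mu, logvar = model(speech, mode)
print(poses.shape)                    # torch.Size([2, 120, 165])
```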