Article: BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer

Title: BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer
Authors: Pang, Kunkun; Qin, Dafei; Fan, Yingruo; Habekost, Julian; Shiratori, Takaaki; Yamagishi, Junichi; Komura, Taku
Keywords: deep learning; motion generation; transformer
Issue Date: 26-Jul-2023
Publisher: Association for Computing Machinery (ACM)
Citation: ACM Transactions on Graphics, 2023, v. 42, n. 4
Abstract

Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framework for automatic 3D body gesture synthesis from speech. To learn the stochastic nature of the body gesture during speech, we propose a variational transformer to effectively model a probabilistic distribution over gestures, which can produce diverse gestures during inference. Furthermore, we introduce a mode positional embedding layer to capture the different motion speeds in different speaking modes. To cope with the scarcity of data, we design an intra-modal pre-training scheme that can learn the complex mapping between the speech and the 3D gesture from a limited amount of data. Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset. The results show that our system can produce more realistic, appropriate, and diverse body gestures compared to existing state-of-the-art approaches.
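The diversity described above comes from modelling a probabilistic distribution over gestures rather than a single deterministic mapping. A minimal sketch of the core sampling idea behind variational models, the reparameterization trick, is below; the function name, latent dimension, and encoder outputs are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).

    Drawing a fresh eps on each call is what lets a variational
    decoder produce diverse gestures for the same speech input.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical encoder outputs for one speech segment (16-dim latent).
mu = np.zeros(16)
log_var = np.zeros(16)  # sigma = exp(0.5 * 0) = 1

# Two draws from the same posterior differ, which a decoder would
# turn into two distinct but plausible gesture sequences.
z1 = sample_latent(mu, log_var, rng)
z2 = sample_latent(mu, log_var, rng)
```

At inference time, repeating the draw with new noise yields new latent codes, so the same speech clip can be decoded into multiple plausible gesture sequences.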


Persistent Identifier: http://hdl.handle.net/10722/331610
ISSN: 0730-0301
2021 Impact Factor: 7.403
2020 SCImago Journal Rankings: 2.153

DC Field | Value | Language
dc.contributor.author | Pang, Kunkun | -
dc.contributor.author | Qin, Dafei | -
dc.contributor.author | Fan, Yingruo | -
dc.contributor.author | Habekost, Julian | -
dc.contributor.author | Shiratori, Takaaki | -
dc.contributor.author | Yamagishi, Junichi | -
dc.contributor.author | Komura, Taku | -
dc.date.accessioned | 2023-09-21T06:57:21Z | -
dc.date.available | 2023-09-21T06:57:21Z | -
dc.date.issued | 2023-07-26 | -
dc.identifier.citation | ACM Transactions on Graphics, 2023, v. 42, n. 4 | -
dc.identifier.issn | 0730-0301 | -
dc.identifier.uri | http://hdl.handle.net/10722/331610 | -
dc.description.abstract | Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framework for automatic 3D body gesture synthesis from speech. To learn the stochastic nature of the body gesture during speech, we propose a variational transformer to effectively model a probabilistic distribution over gestures, which can produce diverse gestures during inference. Furthermore, we introduce a mode positional embedding layer to capture the different motion speeds in different speaking modes. To cope with the scarcity of data, we design an intra-modal pre-training scheme that can learn the complex mapping between the speech and the 3D gesture from a limited amount of data. Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset. The results show that our system can produce more realistic, appropriate, and diverse body gestures compared to existing state-of-the-art approaches. | -
dc.language | eng | -
dc.publisher | Association for Computing Machinery (ACM) | -
dc.relation.ispartof | ACM Transactions on Graphics | -
dc.subject | deep learning | -
dc.subject | motion generation | -
dc.subject | transformer | -
dc.title | BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer | -
dc.type | Article | -
dc.identifier.doi | 10.1145/3592456 | -
dc.identifier.scopus | eid_2-s2.0-85166345445 | -
dc.identifier.volume | 42 | -
dc.identifier.issue | 4 | -
dc.identifier.eissn | 1557-7368 | -
dc.identifier.issnl | 0730-0301 | -
