Conference Paper: SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation.
| Title | SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation. |
|---|---|
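| Authors | Wang, Wenjia; Pan, Liang; Dou, Zhiyang; Mei, Jidong; Liao, Zhouyingcheng; Lou, Yuke; Wu, Yifan; Lei, Yang; Wang, Jingbo; Komura, Taku |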
| Issue Date | 19-Oct-2025 |
| Abstract | Simulating stylized human-scene interactions (HSI) in physical environments is a challenging yet fascinating task. Prior works emphasize long-term execution but fall short in achieving both diverse style and physical plausibility. To tackle this challenge, we introduce a novel hierarchical framework named SIMS that seamlessly bridges high-level script-driven intent with a low-level control policy, enabling more expressive and diverse human-scene interactions. Specifically, we employ Large Language Models with Retrieval-Augmented Generation (RAG) to generate coherent and diverse long-form scripts, providing a rich foundation for motion planning. A versatile multi-condition physics-based control policy is also developed, which leverages text embeddings from the generated scripts to encode stylistic cues, simultaneously perceiving environmental geometries and accomplishing task goals. By integrating the retrieval-augmented script generation with the multi-condition controller, our approach provides a unified solution for generating stylized HSI motions. We further introduce a comprehensive planning dataset produced by RAG and a stylized motion dataset featuring diverse locomotions and interactions. Extensive experiments demonstrate SIMS’s effectiveness in executing various tasks and generalizing across different scenarios, significantly outperforming previous methods. Project page: https://wenjiawang0312.github.io/projects/sims/. |
| Persistent Identifier | http://hdl.handle.net/10722/362730 |

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wang, Wenjia | - |
| dc.contributor.author | Pan, Liang | - |
| dc.contributor.author | Dou, Zhiyang | - |
| dc.contributor.author | Mei, Jidong | - |
| dc.contributor.author | Liao, Zhouyingcheng | - |
| dc.contributor.author | Lou, Yuke | - |
| dc.contributor.author | Wu, Yifan | - |
| dc.contributor.author | Lei, Yang | - |
| dc.contributor.author | Wang, Jingbo | - |
| dc.contributor.author | Komura, Taku | - |
| dc.date.accessioned | 2025-09-27T00:35:27Z | - |
| dc.date.available | 2025-09-27T00:35:27Z | - |
| dc.date.issued | 2025-10-19 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/362730 | - |
| dc.description.abstract | Simulating stylized human-scene interactions (HSI) in physical environments is a challenging yet fascinating task. Prior works emphasize long-term execution but fall short in achieving both diverse style and physical plausibility. To tackle this challenge, we introduce a novel hierarchical framework named SIMS that seamlessly bridges high-level script-driven intent with a low-level control policy, enabling more expressive and diverse human-scene interactions. Specifically, we employ Large Language Models with Retrieval-Augmented Generation (RAG) to generate coherent and diverse long-form scripts, providing a rich foundation for motion planning. A versatile multi-condition physics-based control policy is also developed, which leverages text embeddings from the generated scripts to encode stylistic cues, simultaneously perceiving environmental geometries and accomplishing task goals. By integrating the retrieval-augmented script generation with the multi-condition controller, our approach provides a unified solution for generating stylized HSI motions. We further introduce a comprehensive planning dataset produced by RAG and a stylized motion dataset featuring diverse locomotions and interactions. Extensive experiments demonstrate SIMS’s effectiveness in executing various tasks and generalizing across different scenarios, significantly outperforming previous methods. Project page: https://wenjiawang0312.github.io/projects/sims/. | - |
| dc.language | eng | - |
| dc.relation.ispartof | International Conference on Computer Vision (ICCV) (19/10/2025-23/10/2025, Honolulu, Hawai'i) | - |
| dc.title | SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation. | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | preprint | - |
