
Conference Paper: Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning

Title: Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
Authors: Xie, Tianbao; Zhao, Siheng; Wu, Chen Henry; Liu, Yitao; Luo, Qian; Zhong, Victor; Yang, Yanchao; Yu, Tao
Issue Date: 10-May-2024
Abstract

We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in either. Through meticulous pretraining using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks. Comprehensive experiments demonstrate Lemur's superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully- and partially-observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments.


Persistent Identifier: http://hdl.handle.net/10722/354461

 

DC Field | Value | Language
dc.contributor.author | Xie, Tianbao | -
dc.contributor.author | Zhao, Siheng | -
dc.contributor.author | Wu, Chen Henry | -
dc.contributor.author | Liu, Yitao | -
dc.contributor.author | Luo, Qian | -
dc.contributor.author | Zhong, Victor | -
dc.contributor.author | Yang, Yanchao | -
dc.contributor.author | Yu, Tao | -
dc.date.accessioned | 2025-02-08T00:51:33Z | -
dc.date.available | 2025-02-08T00:51:33Z | -
dc.date.issued | 2024-05-10 | -
dc.identifier.uri | http://hdl.handle.net/10722/354461 | -
dc.description.abstract | We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in either. Through meticulous pretraining using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks. Comprehensive experiments demonstrate Lemur's superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully- and partially-observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments. | -
dc.language | eng | -
dc.relation.ispartof | International Conference on Learning Representations (ICLR), 2024 (07/05/2024-11/05/2024, Vienna, Austria) | -
dc.title | Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning | -
dc.type | Conference_Paper | -
