Article: Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

Title: Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning
Authors: Zhai, Yuexiang; Baek, Christina; Zhou, Zhengyuan; Jiao, Jiantao; Ma, Yi
Issue Date: 2022
Citation: Journal of Artificial Intelligence Research, 2022, v. 73, p. 847-896
Abstract: Empirical results on many goal-reaching reinforcement learning (RL) tasks have verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode, and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both the OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal, but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms.
Persistent Identifier: http://hdl.handle.net/10722/327783
ISSN: 1076-9757
2021 Impact Factor: 3.635
2020 SCImago Journal Rankings: 0.790
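
As a rough, hypothetical illustration of the comparison described in the abstract above (this is not the authors' code, their MiniGrid experiments, or their theoretical setting), the following minimal tabular Q-learning sketch treats a chain of one-way subgoals as a toy task and contrasts a sparse terminal reward with small intermediate rewards on each subgoal. Every environment detail, parameter value, and function name below is an assumption made purely for illustration.

# Hypothetical toy sketch -- not the authors' code, environments, or analysis.
# Tabular Q-learning on a chain of one-way subgoals: state i means "i subgoals
# completed", and the agent must complete them in order to reach the terminal goal.
import numpy as np

def q_learning(n_subgoals=8, intermediate_reward=0.0, episodes=2000,
               alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_states = n_subgoals + 1          # last state is the terminal goal
    n_actions = 2                      # 0 = idle/wrong action, 1 = complete the next subgoal
    Q = np.zeros((n_states, n_actions))
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = 0, 0
        while s < n_subgoals and steps < 500:
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            if a == 1:
                s_next = s + 1
                # reward every completed subgoal, or only the terminal goal
                r = 1.0 if s_next == n_subgoals else intermediate_reward
            else:
                s_next, r = s, 0.0
            # standard Q-learning update (terminal state bootstraps to zero)
            target = r + gamma * (0.0 if s_next == n_subgoals else Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s, steps = s_next, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

sparse = q_learning(intermediate_reward=0.0)   # reward only at the terminal goal
dense = q_learning(intermediate_reward=0.1)    # small reward at every subgoal
print("total env steps across training, sparse terminal reward:", int(np.sum(sparse)))
print("total env steps across training, intermediate rewards:  ", int(np.sum(dense)))

With intermediate_reward > 0, the Q-values along the subgoal chain receive learning signal from the first episodes onward rather than only after the terminal goal has been reached, which is the qualitative effect the abstract attributes to rewarding subgoals; the shortest-path trade-off from the OWMP setting does not appear in this single-path toy example.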

 

DC Field | Value | Language
dc.contributor.author | Zhai, Yuexiang | -
dc.contributor.author | Baek, Christina | -
dc.contributor.author | Zhou, Zhengyuan | -
dc.contributor.author | Jiao, Jiantao | -
dc.contributor.author | Ma, Yi | -
dc.date.accessioned | 2023-05-08T02:26:46Z | -
dc.date.available | 2023-05-08T02:26:46Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Journal of Artificial Intelligence Research, 2022, v. 73, p. 847-896 | -
dc.identifier.issn | 1076-9757 | -
dc.identifier.uri | http://hdl.handle.net/10722/327783 | -
dc.description.abstract | Empirical results on many goal-reaching reinforcement learning (RL) tasks have verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode, and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both the OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal, but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms. | -
dc.language | eng | -
dc.relation.ispartof | Journal of Artificial Intelligence Research | -
dc.title | Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1613/JAIR.1.13326 | -
dc.identifier.scopus | eid_2-s2.0-85128186467 | -
dc.identifier.volume | 73 | -
dc.identifier.spage | 847 | -
dc.identifier.epage | 896 | -
