Links for fulltext (may require subscription):
- Publisher Website: https://doi.org/10.1007/978-3-031-15931-2_4
- WOS: WOS:000866212300004
Citations:
- Web of Science: 0
Appears in Collections: Conference Paper

Conference Paper: Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration
Title | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration |
---|---|
Authors | Cao, R; Dong, M; Jiang, X; Bi, S; Xi, N |
Keywords | Imitation learning; Offline reinforcement learning; Learning from demonstrations |
Issue Date | 2022 |
Publisher | Springer. |
Citation | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration. In Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds). Artificial Neural Networks and Machine Learning -- ICANN 2022, v. 13530, p. 38-49 |
Abstract | Imitation Learning and Offline Reinforcement Learning, which learn from demonstration data, are the current solutions for intelligent agents to reduce the high cost and high risk of online Reinforcement Learning. However, because they lack exploration of the environment, these solutions struggle with distribution shift. Distribution shift makes offline learning prone to wrong decisions and leads to error accumulation in goal-reaching continuous control tasks. Moreover, Offline Reinforcement Learning incurs additional bias when learning from human demonstration data that does not satisfy the Markov process assumptions. To alleviate these two dilemmas, we present a Long-horizon Route-constrained (LHRC) policy for goal-reaching continuous control tasks. At each state, our method generates subgoals by long-horizon route planning and outputs actions subject to the subgoal constraints. This constrains the state space and action space of the agent and corrects trajectories using temporal information. Experiments on the D4RL benchmark show that our approach achieves higher scores compared with state-of-the-art methods and enhances performance on complex tasks. |
Persistent Identifier | http://hdl.handle.net/10722/318023 |
ISI Accession Number ID | WOS:000866212300004 |
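
The abstract sketches a two-level decision loop: a long-horizon route planner proposes subgoals toward the final goal, and a goal-conditioned policy learned offline outputs actions constrained by the current subgoal. The snippet below is a minimal, hedged sketch of that control flow only; `plan_route`, `GoalConditionedPolicy`, and the environment interface are hypothetical placeholders and do not reflect the authors' implementation.

```python
# Illustrative sketch only (assumptions, not the paper's code): one plausible
# reading of the subgoal-constrained decision loop described in the abstract.
import numpy as np


class GoalConditionedPolicy:
    """Hypothetical low-level policy that acts toward a given subgoal."""

    def act(self, state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
        # A policy trained offline would map (state, subgoal) -> action;
        # here we simply step toward the subgoal for illustration.
        return np.clip(subgoal - state, -1.0, 1.0)


def plan_route(state: np.ndarray, goal: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical long-horizon route planner returning a sequence of subgoals.

    The paper presumably learns this from demonstration data; this stand-in
    just interpolates waypoints between the current state and the goal.
    """
    return np.linspace(state, goal, num=horizon + 1)[1:]


def lhrc_rollout(env, policy, goal, horizon=10, max_steps=200):
    """Roll out a route-constrained policy: replan a subgoal route at each
    step, then act toward the first subgoal (control flow only)."""
    state = env.reset()
    for _ in range(max_steps):
        subgoals = plan_route(state, goal, horizon)  # long-horizon route
        action = policy.act(state, subgoals[0])      # constrained by subgoal
        state, _, done, _ = env.step(action)
        if done:
            break
    return state
```

On a D4RL goal-reaching task, this loop would be driven by the benchmark environment's `reset`/`step` interface; the essential point is that every action is conditioned on a planner-supplied subgoal rather than on the final goal directly.
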
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Cao, R | - |
dc.contributor.author | Dong, M | - |
dc.contributor.author | Jiang, X | - |
dc.contributor.author | Bi, S | - |
dc.contributor.author | Xi, N | - |
dc.date.accessioned | 2022-10-07T10:31:16Z | - |
dc.date.available | 2022-10-07T10:31:16Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration. In Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds). Artificial Neural Networks and Machine Learning -- ICANN 2022, v. 13530, p. 38-49 | - |
dc.identifier.uri | http://hdl.handle.net/10722/318023 | - |
dc.description.abstract | Imitation Learning and Offline Reinforcement Learning, which learn from demonstration data, are the current solutions for intelligent agents to reduce the high cost and high risk of online Reinforcement Learning. However, because they lack exploration of the environment, these solutions struggle with distribution shift. Distribution shift makes offline learning prone to wrong decisions and leads to error accumulation in goal-reaching continuous control tasks. Moreover, Offline Reinforcement Learning incurs additional bias when learning from human demonstration data that does not satisfy the Markov process assumptions. To alleviate these two dilemmas, we present a Long-horizon Route-constrained (LHRC) policy for goal-reaching continuous control tasks. At each state, our method generates subgoals by long-horizon route planning and outputs actions subject to the subgoal constraints. This constrains the state space and action space of the agent and corrects trajectories using temporal information. Experiments on the D4RL benchmark show that our approach achieves higher scores compared with state-of-the-art methods and enhances performance on complex tasks. | -
dc.language | eng | - |
dc.publisher | Springer. | - |
dc.relation.ispartof | Lecture Notes in Computer Science | -
dc.rights | This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-15931-2_4 | -
dc.subject | Imitation learning | -
dc.subject | Offline reinforcement learning | -
dc.subject | Learning from demonstrations | - |
dc.title | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Xi, N: xining@hku.hk | - |
dc.identifier.authority | Xi, N=rp02044 | - |
dc.identifier.doi | 10.1007/978-3-031-15931-2_4 | - |
dc.identifier.hkuros | 338302 | - |
dc.identifier.volume | 13530 | - |
dc.identifier.spage | 38 | - |
dc.identifier.epage | 49 | - |
dc.identifier.isi | WOS:000866212300004 | - |
dc.publisher.place | Cham, Germany | - |