Conference Paper: Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration

Title: Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration
Authors: Cao, R; Dong, M; Jiang, X; Bi, S; Xi, N
Keywords: Imitation learning; Offline reinforcement learning; Learning from demonstrations
Issue Date: 2022
Publisher: Springer
Citation: Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration. In Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds). Artificial Neural Networks and Machine Learning -- ICANN 2022, v. 13530, p. 38-49
Abstract: Imitation Learning and Offline Reinforcement Learning, which learn from demonstration data, are the current solutions for intelligent agents to reduce the high cost and high risk of online Reinforcement Learning. However, because they do not explore the environment, these solutions struggle with distribution shift. Distribution shift makes offline learning prone to wrong decisions and leads to error accumulation in goal-reaching continuous control tasks. Moreover, Offline Reinforcement Learning incurs additional bias when learning from human demonstration data that does not satisfy the Markov process assumptions. To alleviate these two problems, we present a Long-horizon Route-constrained (LHRC) policy for goal-reaching continuous control tasks. At each state, our method generates subgoals by long-horizon route planning and outputs actions based on the subgoal constraints. This constrains the agent's state space and action space and lets it correct trajectories using temporal information. Experiments on the D4RL benchmark show that our approach achieves higher scores compared with state-of-the-art methods and improves performance on complex tasks.
Persistent Identifier: http://hdl.handle.net/10722/318023
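
This record does not include the paper's implementation. As a rough illustration of the loop the abstract describes (generate subgoals at the current state, then output an action constrained by the subgoal), here is a toy sketch. Everything in it is an assumption made for illustration: the 2D point agent, the straight-line route, and the names `plan_subgoals`, `act`, `rollout`, `spacing`, and `max_step` are hypothetical. It is not the LHRC method, which learns its planner and policy from demonstration data.

```python
import math


def plan_subgoals(state, goal, spacing):
    """Hypothetical planner: evenly spaced subgoals on the straight line to the goal."""
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    dist = math.hypot(dx, dy)
    n = max(1, math.ceil(dist / spacing))  # at least one subgoal (the goal itself)
    return [(state[0] + dx * k / n, state[1] + dy * k / n) for k in range(1, n + 1)]


def act(state, subgoal, max_step):
    """Hypothetical policy: step toward the subgoal, with step length capped at max_step."""
    dx, dy = subgoal[0] - state[0], subgoal[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return (dx, dy)
    scale = max_step / dist
    return (dx * scale, dy * scale)


def rollout(start, goal, spacing=1.0, max_step=0.25, tol=1e-6, max_iters=1000):
    """Replan subgoals at every state and follow the first one until the goal is reached."""
    state = start
    path = [state]
    for _ in range(max_iters):
        if math.hypot(goal[0] - state[0], goal[1] - state[1]) <= tol:
            break
        subgoal = plan_subgoals(state, goal, spacing)[0]
        a = act(state, subgoal, max_step)
        state = (state[0] + a[0], state[1] + a[1])
        path.append(state)
    return path


if __name__ == "__main__":
    path = rollout((0.0, 0.0), (3.0, 4.0))
    print(len(path), path[-1])
```

In this toy version the subgoal limits where the agent can move next, which is the sense in which the route "constrains" the action; a learned planner and policy would replace both hand-written functions.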

 

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Cao, R | - |
| dc.contributor.author | Dong, M | - |
| dc.contributor.author | Jiang, X | - |
| dc.contributor.author | Bi, S | - |
| dc.contributor.author | Xi, N | - |
| dc.date.accessioned | 2022-10-07T10:31:16Z | - |
| dc.date.available | 2022-10-07T10:31:16Z | - |
| dc.date.issued | 2022 | - |
| dc.identifier.citation | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration. In Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds). Artificial Neural Networks and Machine Learning -- ICANN 2022, v. 13530, p. 38-49 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/318023 | - |
| dc.description.abstract | Imitation Learning and Offline Reinforcement Learning, which learn from demonstration data, are the current solutions for intelligent agents to reduce the high cost and high risk of online Reinforcement Learning. However, because they do not explore the environment, these solutions struggle with distribution shift. Distribution shift makes offline learning prone to wrong decisions and leads to error accumulation in goal-reaching continuous control tasks. Moreover, Offline Reinforcement Learning incurs additional bias when learning from human demonstration data that does not satisfy the Markov process assumptions. To alleviate these two problems, we present a Long-horizon Route-constrained (LHRC) policy for goal-reaching continuous control tasks. At each state, our method generates subgoals by long-horizon route planning and outputs actions based on the subgoal constraints. This constrains the agent's state space and action space and lets it correct trajectories using temporal information. Experiments on the D4RL benchmark show that our approach achieves higher scores compared with state-of-the-art methods and improves performance on complex tasks. | - |
| dc.language | eng | - |
| dc.publisher | Springer | - |
| dc.relation.ispartof | Lecture Notes in Computer Science | - |
| dc.rights | This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/[insert DOI] | - |
| dc.subject | Imitation learning | - |
| dc.subject | Offline reinforcement learning | - |
| dc.subject | Learning from demonstrations | - |
| dc.title | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration | - |
| dc.type | Conference_Paper | - |
| dc.identifier.email | Xi, N: xining@hku.hk | - |
| dc.identifier.authority | Xi, N=rp02044 | - |
| dc.identifier.doi | 10.1007/978-3-031-15931-2_4 | - |
| dc.identifier.hkuros | 338302 | - |
| dc.identifier.volume | 13530 | - |
| dc.identifier.spage | 38 | - |
| dc.identifier.epage | 49 | - |
| dc.publisher.place | Cham, Switzerland | - |
