Links for fulltext (may require subscription):
- Publisher Website: https://doi.org/10.1007/978-3-031-15931-2_4
- WOS: WOS:000866212300004
Citations:
- Web of Science: 0
Appears in Collections: Conference Paper

Conference Paper: Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration
Title | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration |
---|---|
Authors | Cao, R; Dong, M; Jiang, X; Bi, S; Xi, N |
Keywords | Imitation learning; Offline reinforcement learning; Learning from demonstrations |
Issue Date | 2022 |
Publisher | Springer. |
Citation | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration. In Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds). Artificial Neural Networks and Machine Learning -- ICANN 2022, v. 13530, p. 38-49 |
Abstract | Imitation Learning and Offline Reinforcement Learning, which learn from demonstration data, are the current solutions for intelligent agents to reduce the high cost and high risk of online Reinforcement Learning. However, because they lack exploration of the environment, these solutions struggle with distribution shift. Distribution shift makes offline learning prone to wrong decisions and leads to error accumulation in goal-reaching continuous control tasks. Moreover, Offline Reinforcement Learning incurs additional bias when learning from human demonstration data that does not satisfy the Markov process assumptions. To alleviate these two dilemmas, we present a Long-horizon Route-constrained (LHRC) policy for goal-reaching continuous control tasks. At each state, our method generates subgoals by long-horizon route planning and outputs actions subject to the subgoal constraints. This constrains the state space and action space of the agent and corrects trajectories using temporal information. Experiments on the D4RL benchmark show that our approach achieves higher scores compared with state-of-the-art methods and enhances performance on complex tasks. |
Persistent Identifier | http://hdl.handle.net/10722/318023 |
ISI Accession Number ID | WOS:000866212300004 |
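
The abstract sketches a two-level decision loop: a long-horizon route planner proposes subgoals toward the final goal, and a goal-conditioned policy learned offline outputs actions constrained by the current subgoal. The snippet below is a minimal, hedged sketch of that control flow only; `plan_route`, `GoalConditionedPolicy`, and the environment interface are hypothetical placeholders and do not reflect the authors' implementation.

```python
# Illustrative sketch only (assumptions, not the paper's code): one plausible
# reading of the subgoal-constrained decision loop described in the abstract.
import numpy as np


class GoalConditionedPolicy:
    """Hypothetical low-level policy that acts toward a given subgoal."""

    def act(self, state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
        # A policy trained offline would map (state, subgoal) -> action;
        # here we simply step toward the subgoal for illustration.
        return np.clip(subgoal - state, -1.0, 1.0)


def plan_route(state: np.ndarray, goal: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical long-horizon route planner returning a sequence of subgoals.

    The paper presumably learns this from demonstration data; this stand-in
    just interpolates waypoints between the current state and the goal.
    """
    return np.linspace(state, goal, num=horizon + 1)[1:]


def lhrc_rollout(env, policy, goal, horizon=10, max_steps=200):
    """Roll out a route-constrained policy: replan a subgoal route at each
    step, then act toward the first subgoal (control flow only)."""
    state = env.reset()
    for _ in range(max_steps):
        subgoals = plan_route(state, goal, horizon)  # long-horizon route
        action = policy.act(state, subgoals[0])      # constrained by subgoal
        state, _, done, _ = env.step(action)
        if done:
            break
    return state
```

On a D4RL goal-reaching task, this loop would be driven by the benchmark environment's `reset`/`step` interface; the essential point is that every action is conditioned on a planner-supplied subgoal rather than on the final goal directly.
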
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Cao, R | - |
dc.contributor.author | Dong, M | - |
dc.contributor.author | Jiang, X | - |
dc.contributor.author | Bi, S | - |
dc.contributor.author | Xi, N | - |
dc.date.accessioned | 2022-10-07T10:31:16Z | - |
dc.date.available | 2022-10-07T10:31:16Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration. In Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds). Artificial Neural Networks and Machine Learning -- ICANN 2022, v. 13530, p. 38-49 | - |
dc.identifier.uri | http://hdl.handle.net/10722/318023 | - |
dc.description.abstract | Imitation Learning and Offline Reinforcement Learning, which learn from demonstration data, are the current solutions for intelligent agents to reduce the high cost and high risk of online Reinforcement Learning. However, because they lack exploration of the environment, these solutions struggle with distribution shift. Distribution shift makes offline learning prone to wrong decisions and leads to error accumulation in goal-reaching continuous control tasks. Moreover, Offline Reinforcement Learning incurs additional bias when learning from human demonstration data that does not satisfy the Markov process assumptions. To alleviate these two dilemmas, we present a Long-horizon Route-constrained (LHRC) policy for goal-reaching continuous control tasks. At each state, our method generates subgoals by long-horizon route planning and outputs actions subject to the subgoal constraints. This constrains the state space and action space of the agent and corrects trajectories using temporal information. Experiments on the D4RL benchmark show that our approach achieves higher scores compared with state-of-the-art methods and enhances performance on complex tasks. | -
dc.language | eng | - |
dc.publisher | Springer. | - |
dc.relation.ispartof | Lecture Notes in Computer Science | -
dc.rights | This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-15931-2_4 | -
dc.subject | Imitation learning | -
dc.subject | Offline reinforcement learning | -
dc.subject | Learning from demonstrations | - |
dc.title | Long-Horizon Route-Constrained Policy for Learning Continuous Control Without Exploration | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Xi, N: xining@hku.hk | - |
dc.identifier.authority | Xi, N=rp02044 | - |
dc.identifier.doi | 10.1007/978-3-031-15931-2_4 | - |
dc.identifier.hkuros | 338302 | - |
dc.identifier.volume | 13530 | - |
dc.identifier.spage | 38 | - |
dc.identifier.epage | 49 | - |
dc.identifier.isi | WOS:000866212300004 | - |
dc.publisher.place | Cham, Germany | - |