A two-stage reinforcement learning approach for multi-UAV collision avoidance under imperfect sensing

Wang, D; Fan, T; Han, T; Pan, J


Links for full text (may require subscription)

Publisher Website: 10.1109/LRA.2020.2974648
Scopus: eid_2-s2.0-85081588190
WOS: WOS:000526521900003

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	A two-stage reinforcement learning approach for multi-UAV collision avoidance under imperfect sensing
Authors	Wang, D; Fan, T; Han, T; Pan, J
Keywords	Collision avoidance; Robot sensing systems; Learning (artificial intelligence); Training; Navigation
Issue Date	2020
Publisher	Institute of Electrical and Electronics Engineers. The journal's website is located at https://www.ieee.org/membership-catalog/productdetail/showProductDetailPage.html?product=PER481-ELE
Citation	IEEE Robotics and Automation Letters, 2020, v. 5 n. 2, p. 3098-3105. DOI: http://dx.doi.org/10.1109/LRA.2020.2974648
Abstract	Unlike autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs) have a higher-dimensional configuration space, which makes motion planning for multiple UAVs challenging. In addition, uncertainty and noise are more significant in UAV scenarios, which makes autonomous navigation of multi-UAV systems harder. In this letter, we propose a two-stage reinforcement learning (RL) approach to multi-UAV collision avoidance that does not explicitly model the uncertainty and noise in the environment. Our goal is to train a policy that plans collision-free trajectories from local, noisy observations. However, collision avoidance policies learned by RL usually suffer from high variance and low reproducibility because, unlike supervised learning, RL has no fixed training set with ground-truth labels. To address these issues, we introduce a two-stage training method for RL-based collision avoidance. In the first stage, we optimize the policy with supervised training, using a loss function that encourages the agent to follow the well-known reciprocal collision avoidance strategy. In the second stage, we refine the policy with policy gradient. We validate our policy in a variety of simulated scenarios; extensive numerical simulations demonstrate that it generates time-efficient, collision-free paths under imperfect sensing and handles noisy local observations with unknown noise levels well.
Persistent Identifier	http://hdl.handle.net/10722/285105
ISSN	2377-3766 (2021 Impact Factor: 4.321; 2020 SCImago Journal Rankings: 1.123)
ISI Accession Number ID	WOS:000526521900003
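To make the two-stage training idea in the abstract concrete, here is a toy sketch under loudly stated assumptions. It is not the authors' method: the 4-D observation, the linear Gaussian policy, the `reference_velocity` expert (a simplified half-responsibility rule, not the reciprocal collision avoidance algorithm used in the letter), the `true_reward` function, and the choice of plain REINFORCE with a batch-mean baseline as the policy-gradient method are all hypothetical. Stage 1 regresses the policy onto the expert's actions with a supervised loss; stage 2 refines the same parameters with a policy gradient on a task reward.

```python
import numpy as np

# Toy, hypothetical setup: observation = [goal_dx, goal_dy, nbr_dx, nbr_dy],
# action = 2-D velocity, policy mean = W @ obs with W of shape (2, 4).


def reference_velocity(obs):
    """Hypothetical 'reciprocal' expert: head toward the goal and take half of
    the push away from the neighbour. A stand-in, NOT the paper's expert."""
    return obs[:2] - 0.5 * obs[2:]


def true_reward(obs, action):
    """Hypothetical task reward: the ideal velocity steers away from the
    neighbour harder than the expert does, so imitation alone is suboptimal."""
    ideal = obs[:2] - 0.8 * obs[2:]
    return -float(np.sum((action - ideal) ** 2))


def stage1_supervised(W, observations, lr=0.1, epochs=200):
    """Stage 1: fit the policy mean to expert actions by gradient descent on MSE."""
    targets = np.array([reference_velocity(o) for o in observations])
    for _ in range(epochs):
        preds = observations @ W.T  # (N, 2)
        grad = 2.0 * (preds - targets).T @ observations / len(observations)
        W = W - lr * grad
    return W


def stage2_policy_gradient(W, observations, rng, sigma=0.3, lr=0.01,
                           iters=500, batch=64):
    """Stage 2: refine a Gaussian policy N(W @ obs, sigma^2 I) with REINFORCE."""
    for _ in range(iters):
        obs = observations[rng.integers(0, len(observations), size=batch)]
        means = obs @ W.T
        actions = means + sigma * rng.standard_normal(means.shape)
        rewards = np.array([true_reward(o, a) for o, a in zip(obs, actions)])
        adv = rewards - rewards.mean()  # batch-mean baseline to cut variance
        # d/dW log N(a | W obs, sigma^2 I) = (a - W obs) obs^T / sigma^2
        grad = (adv[:, None] * (actions - means)).T @ obs / (sigma ** 2 * batch)
        W = W + lr * grad  # gradient ascent on expected reward
    return W


def mean_reward(W, observations):
    """Average reward of the deterministic (mean) action over a dataset."""
    return float(np.mean([true_reward(o, o @ W.T) for o in observations]))


rng = np.random.default_rng(0)
observations = rng.standard_normal((256, 4))
W1 = stage1_supervised(np.zeros((2, 4)), observations)
W2 = stage2_policy_gradient(W1, observations, rng)
print("after stage 1:", mean_reward(W1, observations))
print("after stage 2:", mean_reward(W2, observations))
```

In this toy, the supervised stage gives a low-variance, reproducible starting point that matches the expert exactly, and the policy-gradient stage should then move the neighbour coefficient from -0.5 toward -0.8, beyond what imitation alone can reach.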

DC Field	Value	Language
dc.contributor.author	Wang, D	-
dc.contributor.author	Fan, T	-
dc.contributor.author	Han, T	-
dc.contributor.author	Pan, J	-
dc.date.accessioned	2020-08-07T09:06:49Z	-
dc.date.available	2020-08-07T09:06:49Z	-
dc.date.issued	2020	-
dc.identifier.citation	IEEE Robotics and Automation Letters, 2020, v. 5 n. 2, p. 3098-3105	-
dc.identifier.issn	2377-3766	-
dc.identifier.uri	http://hdl.handle.net/10722/285105	-
dc.language	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers. The journal's website is located at https://www.ieee.org/membership-catalog/productdetail/showProductDetailPage.html?product=PER481-ELE	-
dc.relation.ispartof	IEEE Robotics and Automation Letters	-
dc.rights	IEEE Robotics and Automation Letters. Copyright © Institute of Electrical and Electronics Engineers.	-
dc.rights	©20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	-
dc.subject	Collision avoidance	-
dc.subject	Robot sensing systems	-
dc.subject	Learning (artificial intelligence)	-
dc.subject	Training	-
dc.subject	Navigation	-
dc.title	A two-stage reinforcement learning approach for multi-UAV collision avoidance under imperfect sensing	-
dc.type	Article	-
dc.identifier.email	Pan, J: jpan@cs.hku.hk	-
dc.identifier.authority	Pan, J=rp01984	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/LRA.2020.2974648	-
dc.identifier.scopus	eid_2-s2.0-85081588190	-
dc.identifier.hkuros	312130	-
dc.identifier.volume	5	-
dc.identifier.issue	2	-
dc.identifier.spage	3098	-
dc.identifier.epage	3105	-
dc.identifier.isi	WOS:000526521900003	-
dc.publisher.place	United States	-
dc.identifier.issnl	2377-3766	-