
Postgraduate thesis: Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution

Title: Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution
Authors: Wang, Tianqi (王天奇)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, T. [王天奇]. (2024). Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Safety is the foremost priority for autonomous driving. Yet the safety evaluation of autonomous driving algorithms is often overlooked, since commonly used datasets contain mostly safe, normal driving scenarios, whereas rare events such as collision accidents matter most when evaluating driving safety. The first problem this thesis addresses is therefore safety-critical scenario generation and evaluation. Additionally, there is growing interest in end-to-end driving, which maps sensor inputs directly to driving decisions, owing to its simplicity and promising results in simulation. However, its lack of interpretability and limited generalization hinder real-world deployment. Recent studies also show that large language models (LLMs) perform remarkably well across a range of tasks thanks to their strong reasoning capabilities, which can benefit end-to-end driving systems. This motivates the second and third problems the thesis tackles: end-to-end driving with enhanced interpretability, and the integration of LLMs into end-to-end driving.

The first part of the thesis addresses safety-critical scenario generation and evaluation; existing datasets either cover only safe driving scenarios or lack detailed annotations. To fill this gap, we propose DeepAccident, a dataset collected in a realistic simulator that generates diverse accident scenarios. It includes data and annotations from multiple vehicles and one roadside infrastructure, enabling Vehicle-to-Everything (V2X) research. Building on DeepAccident, we introduce a novel task of end-to-end motion and accident prediction to directly assess driving safety, and we develop a V2X model, V2XFormer, which outperforms the single-vehicle baseline on 3D object detection, motion prediction, and accident prediction.

The second part of the thesis addresses end-to-end driving with enhanced interpretability. Existing end-to-end approaches focus mainly on final driving performance in closed-loop simulation while ignoring interpretability, which hinders their application in the real world. To enhance interpretability, we first collect DriveCoT, a dataset that records the detailed decision-making process of a privileged expert policy across challenging driving scenarios in a realistic simulator. We then propose a novel end-to-end driving method, DriveCoT-Agent, which decomposes driving into several representative tasks and integrates chain-of-thought processing to make its driving decisions interpretable.

The third part of the thesis considers end-to-end driving empowered by the integration of large language models (LLMs). Existing work has integrated LLMs into various tasks with excellent performance, and the reasoning ability of these pre-trained models can benefit end-to-end driving. First, we expand the DriveCoT dataset with dynamic text annotations that capture the expert policy's derivation of its driving decisions, mimicking human thinking. We then design and train a vision-language model, DriveCoT-LM, built on a pre-trained LLM, to dynamically identify potential impact factors for driving and analyze each factor in detail to reach the final driving decisions and their reasons. DriveCoT-LM generates detailed chain-of-thought derivations for its driving decisions and shows enhanced generalizability in unseen scenarios, bringing end-to-end driving closer to real-world deployment.
Degree: Doctor of Philosophy
Subject: Automated vehicles - Computer simulation
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/352633


DC Field | Value | Language
dc.contributor.author | Wang, Tianqi | -
dc.contributor.author | 王天奇 | -
dc.date.accessioned | 2024-12-19T09:26:52Z | -
dc.date.available | 2024-12-19T09:26:52Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Wang, T. [王天奇]. (2024). Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/352633 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Automated vehicles - Computer simulation | -
dc.title | Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044891403103414 | -
