Appears in Collections: postgraduate thesis: Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution
Title | Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution
---|---
Authors | Wang, Tianqi (王天奇)
Issue Date | 2024
Publisher | The University of Hong Kong (Pokfulam, Hong Kong)
Citation | Wang, T. [王天奇]. (2024). Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Degree | Doctor of Philosophy
Subject | Automated vehicles - Computer simulation
Dept/Program | Computer Science
Persistent Identifier | http://hdl.handle.net/10722/352633

Abstract:

Safety is the top priority for autonomous driving. However, the safety evaluation of autonomous driving algorithms is often overlooked, since the commonly used datasets consist mostly of safe, normal driving scenarios. Yet it is the rarely occurring scenarios, such as collision accidents, that matter most when evaluating driving safety. The first problem this thesis addresses is therefore safety-critical scenario generation and evaluation. Additionally, there is growing interest in end-to-end driving, which maps sensor inputs directly to driving decisions, owing to its simplicity and promising results in simulation. However, challenges such as a lack of interpretability and limited generalization hinder its real-world deployment. Recent studies also show that large language models (LLMs) perform remarkably well across a variety of tasks thanks to their strong reasoning capabilities, which can benefit end-to-end driving systems. This leads to the second and third problems the thesis tackles: end-to-end driving with enhanced interpretability, and the integration of LLMs into end-to-end driving.

The first part of the thesis addresses safety-critical scenario generation and evaluation, since existing datasets either cover only safe driving scenarios or lack detailed annotations. To fill this gap, we propose the DeepAccident dataset, collected in a realistic simulator to generate diverse accident scenarios. The dataset includes data and annotations from multiple vehicles and one roadside infrastructure unit, enabling Vehicle-to-Everything (V2X) research. Building on DeepAccident, we introduce a novel task of end-to-end motion and accident prediction to assess driving safety directly. We also develop a V2X model, V2XFormer, which outperforms the single-vehicle baseline on 3D object detection, motion prediction, and accident prediction.

The second part of the thesis addresses end-to-end driving with enhanced interpretability. Existing end-to-end driving approaches focus mainly on final driving performance in closed-loop simulation while ignoring their lack of interpretability, which hinders real-world application. To enhance interpretability, we first collect a dataset called DriveCoT, which records the detailed decision-making process of a privileged expert policy across various challenging driving scenarios in a realistic simulator. We then propose a novel end-to-end driving method named DriveCoT-Agent, which decomposes driving into several representative tasks and integrates chain-of-thought processing to make its driving decisions interpretable.

The third part of the thesis considers end-to-end driving empowered by the integration of large language models (LLMs). Existing work has integrated LLMs into various tasks with excellent results, and the reasoning ability of these pre-trained LLMs can potentially benefit end-to-end driving. First, we expand the DriveCoT dataset with text-form dynamic annotations that record how the expert policy derives its driving decisions, mimicking human thinking. We then design and train a vision-language model named DriveCoT-LM, built on a pre-trained LLM, which dynamically identifies potential impact factors for driving and analyzes each factor in detail to reach the final driving decisions and their reasons. DriveCoT-LM generates detailed chain-of-thought derivations for its driving decisions and shows enhanced generalizability in unseen scenarios, bringing end-to-end driving closer to real-world deployment.
DC Field | Value | Language |
---|---|---
dc.contributor.author | Wang, Tianqi | - |
dc.contributor.author | 王天奇 | - |
dc.date.accessioned | 2024-12-19T09:26:52Z | - |
dc.date.available | 2024-12-19T09:26:52Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Wang, T. [王天奇]. (2024). Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/352633 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Automated vehicles - Computer simulation | - |
dc.title | Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044891403103414 | - |