
Postgraduate thesis: Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution

Title: Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution
Authors: Wang, Tianqi (王天奇)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, T. [王天奇]. (2024). Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Safety is the foremost priority for autonomous driving. Yet the safety evaluation of autonomous driving algorithms is often overlooked, since commonly used datasets contain mostly safe, normal driving scenarios, whereas rare events such as collision accidents matter most when evaluating driving safety. The first problem this thesis addresses is therefore safety-critical scenario generation and evaluation. Additionally, there is growing interest in end-to-end driving, which maps sensor inputs directly to driving decisions, owing to its simplicity and promising results in simulation. However, its lack of interpretability and limited generalization hinder real-world deployment. Recent studies also show that large language models (LLMs) perform remarkably well across a range of tasks thanks to their strong reasoning capabilities, which can benefit end-to-end driving systems. This motivates the second and third problems the thesis tackles: end-to-end driving with enhanced interpretability, and the integration of LLMs into end-to-end driving.

The first part of the thesis addresses safety-critical scenario generation and evaluation; existing datasets either cover only safe driving scenarios or lack detailed annotations. To fill this gap, we propose DeepAccident, a dataset collected in a realistic simulator that generates diverse accident scenarios. It includes data and annotations from multiple vehicles and one roadside infrastructure, enabling Vehicle-to-Everything (V2X) research. Building on DeepAccident, we introduce a novel task of end-to-end motion and accident prediction to directly assess driving safety, and we develop a V2X model, V2XFormer, which outperforms the single-vehicle baseline on 3D object detection, motion prediction, and accident prediction.

The second part of the thesis addresses end-to-end driving with enhanced interpretability. Existing end-to-end approaches focus mainly on final driving performance in closed-loop simulation while ignoring interpretability, which hinders their application in the real world. To enhance interpretability, we first collect DriveCoT, a dataset that records the detailed decision-making process of a privileged expert policy across challenging driving scenarios in a realistic simulator. We then propose a novel end-to-end driving method, DriveCoT-Agent, which decomposes driving into several representative tasks and integrates chain-of-thought processing to make its driving decisions interpretable.

The third part of the thesis considers end-to-end driving empowered by the integration of large language models (LLMs). Existing work has integrated LLMs into various tasks with excellent performance, and the reasoning ability of these pre-trained models can benefit end-to-end driving. First, we expand the DriveCoT dataset with dynamic text annotations that capture the expert policy's derivation of its driving decisions, mimicking human thinking. We then design and train a vision-language model, DriveCoT-LM, built on a pre-trained LLM, to dynamically identify potential impact factors for driving and analyze each factor in detail to reach the final driving decisions and their reasons. DriveCoT-LM generates detailed chain-of-thought derivations for its driving decisions and shows enhanced generalizability in unseen scenarios, bringing end-to-end driving closer to real-world deployment.
Degree: Doctor of Philosophy
Subject: Automated vehicles - Computer simulation
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/352633


DC Field | Value | Language
dc.contributor.author | Wang, Tianqi | -
dc.contributor.author | 王天奇 | -
dc.date.accessioned | 2024-12-19T09:26:52Z | -
dc.date.available | 2024-12-19T09:26:52Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Wang, T. [王天奇]. (2024). Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/352633 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Automated vehicles - Computer simulation | -
dc.title | Enhancing safety and generalization for autonomous driving : safety-critical scenario generation and vision-based resolution | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044891403103414 | -
