Article: DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model
Title | DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model |
---|---|
Authors | Xu, Zhenhua; Zhang, Yujia; Xie, Enze; Zhao, Zhen; Guo, Yong; Wong, Kwan-Yee K; Li, Zhenguo; Zhao, Hengshuang |
Issue Date | 7-Aug-2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Robotics and Automation Letters, 2024, v. 9, n. 10, p. 8186-8193 |
Abstract | Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, the finetuning of domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V. |
Persistent Identifier | http://hdl.handle.net/10722/345944 |
ISSN | 2377-3766 (2023 Impact Factor: 4.6; 2023 SCImago Journal Rankings: 2.119) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Xu, Zhenhua | - |
dc.contributor.author | Zhang, Yujia | - |
dc.contributor.author | Xie, Enze | - |
dc.contributor.author | Zhao, Zhen | - |
dc.contributor.author | Guo, Yong | - |
dc.contributor.author | Wong, Kwan-Yee K | - |
dc.contributor.author | Li, Zhenguo | - |
dc.contributor.author | Zhao, Hengshuang | - |
dc.date.accessioned | 2024-09-04T07:06:40Z | - |
dc.date.available | 2024-09-04T07:06:40Z | - |
dc.date.issued | 2024-08-07 | - |
dc.identifier.citation | IEEE Robotics and Automation Letters, 2024, v. 9, n. 10, p. 8186-8193 | - |
dc.identifier.issn | 2377-3766 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345944 | - |
dc.description.abstract | <p>Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, the finetuning of domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V.</p> | - |
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Robotics and Automation Letters | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.title | DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/LRA.2024.3440097 | - |
dc.identifier.volume | 9 | - |
dc.identifier.issue | 10 | - |
dc.identifier.spage | 8186 | - |
dc.identifier.epage | 8193 | - |
dc.identifier.eissn | 2377-3766 | - |
dc.identifier.issnl | 2377-3766 | - |