File Download

There are no files associated with this item.

Article: DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model

Title: DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model
Authors: Xu, Zhenhua; Zhang, Yujia; Xie, Enze; Zhao, Zhen; Guo, Yong; Wong, Kwan-Yee K; Li, Zhenguo; Zhao, Hengshuang
Issue Date: 7-Aug-2024
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Robotics and Automation Letters, 2024, v. 9, n. 10, p. 8186-8193
Abstract

Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, finetuning on domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V.
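The interface described in the abstract (multi-frame video plus a textual query in, a textual answer plus low-level control signals out) can be pictured as a simple function signature. The sketch below is purely illustrative and is not the authors' implementation; the DriveAgent class, the choice of speed and turning angle as the control signals, and all field names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical illustration of the input/output contract described in the
# abstract; this is NOT the authors' code. The control-signal fields (speed,
# turning angle) are assumptions for the sake of the example.

@dataclass
class DriveResponse:
    answer: str          # textual interpretation / reasoning for the user
    speed: float         # assumed low-level control signal (m/s)
    turn_angle: float    # assumed steering/turning angle (degrees)

class DriveAgent:
    """Toy stand-in for an interpretable end-to-end driving MLLM."""

    def query(self, frames: Sequence[bytes], question: str) -> DriveResponse:
        # A real system would encode the video frames, tokenize the question,
        # and decode both an answer and control values with an MLLM. Here we
        # only return a placeholder to show the shape of the output.
        return DriveResponse(
            answer="The vehicle slows down because the light ahead is red.",
            speed=0.0,
            turn_angle=0.0,
        )

if __name__ == "__main__":
    agent = DriveAgent()
    dummy_frames: List[bytes] = [b""] * 8  # stand-in for 8 video frames
    result = agent.query(dummy_frames, "Why is the ego vehicle braking?")
    print(result.answer, result.speed, result.turn_angle)
```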


Persistent Identifier: http://hdl.handle.net/10722/345944
ISSN: 2377-3766
2023 Impact Factor: 4.6
2023 SCImago Journal Rankings: 2.119

 

DC Field: Value
dc.contributor.author: Xu, Zhenhua
dc.contributor.author: Zhang, Yujia
dc.contributor.author: Xie, Enze
dc.contributor.author: Zhao, Zhen
dc.contributor.author: Guo, Yong
dc.contributor.author: Wong, Kwan-Yee K
dc.contributor.author: Li, Zhenguo
dc.contributor.author: Zhao, Hengshuang
dc.date.accessioned: 2024-09-04T07:06:40Z
dc.date.available: 2024-09-04T07:06:40Z
dc.date.issued: 2024-08-07
dc.identifier.citation: IEEE Robotics and Automation Letters, 2024, v. 9, n. 10, p. 8186-8193
dc.identifier.issn: 2377-3766
dc.identifier.uri: http://hdl.handle.net/10722/345944
dc.description.abstract: Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, finetuning on domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Robotics and Automation Letters
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model
dc.type: Article
dc.identifier.doi: 10.1109/LRA.2024.3440097
dc.identifier.volume: 9
dc.identifier.issue: 10
dc.identifier.spage: 8186
dc.identifier.epage: 8193
dc.identifier.eissn: 2377-3766
dc.identifier.issnl: 2377-3766

Export: via OAI-PMH interface in XML formats, or to other non-XML formats.
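An OAI-PMH export of this record would be fetched with a standard GetRecord request in the oai_dc format. The sketch below is a minimal illustration under stated assumptions: the repository's OAI base URL and the exact OAI identifier for this item are hypothetical and would need to be confirmed against the repository's documentation; only the OAI-PMH verb and parameters are standard protocol features.

```python
import urllib.parse
import urllib.request

# Minimal OAI-PMH GetRecord request (illustrative sketch).
# ASSUMPTIONS: the base URL and OAI identifier below are hypothetical;
# only the verb/metadataPrefix/identifier parameters are standard OAI-PMH.
BASE_URL = "https://example-repository.edu/oai/request"  # hypothetical endpoint
OAI_ID = "oai:example-repository.edu:10722/345944"       # hypothetical identifier

params = {
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",  # Dublin Core, matching the dc.* fields above
    "identifier": OAI_ID,
}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as resp:
    xml_record = resp.read().decode("utf-8")

print(xml_record[:500])  # oai_dc XML containing the dc.* fields listed above
```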