Conference Paper: BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers

Title: BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers
Authors: Li, Zhiqi; Wang, Wenhai; Li, Hongyang; Xie, Enze; Sima, Chonghao; Lu, Tong; Qiao, Yu; Dai, Jifeng
Keywords: 3D object detection; Autonomous driving; Bird’s-Eye-View; Map segmentation; Transformer
Issue Date: 2022
Citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, v. 13669 LNCS, p. 1-18
Abstract: 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, with which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state-of-the-art 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best result and on par with LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer. (An illustrative sketch of this query-based attention scheme follows the record fields below.)
Persistent Identifier: http://hdl.handle.net/10722/351455
ISSN: 0302-9743
2023 SCImago Journal Rankings: 0.606
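
The abstract above describes a query-based pipeline: grid-shaped BEV queries, temporal self-attention that recurrently fuses the history BEV, and spatial cross-attention that gathers features from the multi-camera views. As a reading aid, here is a minimal PyTorch sketch of that data flow; it is not the authors' implementation. BEVFormer uses deformable attention guided by camera geometry to restrict each query to its regions of interest, whereas this toy layer substitutes standard nn.MultiheadAttention over all camera tokens, and every name, shape, and hyperparameter below (ToyBEVFormerLayer, bev_h, bev_w, the six-camera setup, 256-dim features) is an assumption chosen for brevity. The real code is linked in the record above.

# Illustrative sketch only: standard attention stands in for BEVFormer's
# deformable, geometry-guided attention; all names and shapes are assumptions.
import torch
import torch.nn as nn


class ToyBEVFormerLayer(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, bev_h=50, bev_w=50):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # Grid-shaped, learnable BEV queries (one per BEV cell).
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
        # Temporal self-attention: current queries attend to the history BEV.
        self.temporal_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Spatial cross-attention: queries attend to multi-camera image features.
        self.spatial_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(embed_dim, embed_dim * 2), nn.ReLU(),
                                 nn.Linear(embed_dim * 2, embed_dim))

    def forward(self, cam_feats, prev_bev=None):
        # cam_feats: (B, num_cams, num_tokens, C) flattened image features per camera.
        B = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)      # (B, H*W, C)

        # 1) Temporal self-attention: recurrently fuse the history BEV features.
        hist = prev_bev if prev_bev is not None else q
        q = q + self.temporal_attn(q, hist, hist)[0]

        # 2) Spatial cross-attention: aggregate features across all camera views
        #    (the real model restricts each query to the views/ROIs it projects into).
        kv = cam_feats.flatten(1, 2)                              # (B, cams*tokens, C)
        q = q + self.spatial_attn(q, kv, kv)[0]

        return q + self.ffn(q)                                    # updated BEV (B, H*W, C)


if __name__ == "__main__":
    layer = ToyBEVFormerLayer()
    cams = torch.randn(2, 6, 300, 256)        # 2 samples, 6 cameras, 300 tokens each
    bev_t0 = layer(cams)                      # first frame: no history BEV
    bev_t1 = layer(cams, prev_bev=bev_t0)     # next frame reuses the previous BEV
    print(bev_t1.shape)                       # torch.Size([2, 2500, 256])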

 

DC Field: Value
dc.contributor.author: Li, Zhiqi
dc.contributor.author: Wang, Wenhai
dc.contributor.author: Li, Hongyang
dc.contributor.author: Xie, Enze
dc.contributor.author: Sima, Chonghao
dc.contributor.author: Lu, Tong
dc.contributor.author: Qiao, Yu
dc.contributor.author: Dai, Jifeng
dc.date.accessioned: 2024-11-20T03:56:23Z
dc.date.available: 2024-11-20T03:56:23Z
dc.date.issued: 2022
dc.identifier.citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, v. 13669 LNCS, p. 1-18
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/10722/351455
dc.description.abstract: 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse the history BEV information. Our approach achieves the new state-of-the-art 56.9% in terms of NDS metric on the nuScenes test set, which is 9.0 points higher than previous best arts and on par with the performance of LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer.
dc.language: eng
dc.relation.ispartof: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.subject: 3D object detection
dc.subject: Autonomous driving
dc.subject: Bird’s-Eye-View
dc.subject: Map segmentation
dc.subject: Transformer
dc.title: BEVFormer: Learning Bird’s-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1007/978-3-031-20077-9_1
dc.identifier.scopus: eid_2-s2.0-85142683816
dc.identifier.volume: 13669 LNCS
dc.identifier.spage: 1
dc.identifier.epage: 18
dc.identifier.eissn: 1611-3349
