Scene as Occupancy

Tong, Wenwen; Sima, Chonghao; Wang, Tai; Chen, Li; Wu, Silei; Deng, Hanming; Gu, Yi; Lu, Lewei; Luo, Ping; Lin, Dahua; Li, Hongyang

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/ICCV51070.2023.00772
Scopus: eid_2-s2.0-85178443077
WOS: WOS:001169499000052

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- HKU Musketeers Foundation Institute of Data Science: Conference papers

Conference Paper: Scene as Occupancy

Title	Scene as Occupancy
Authors	Tong, Wenwen; Sima, Chonghao; Wang, Tai; Chen, Li; Wu, Silei; Deng, Hanming; Gu, Yi; Lu, Lewei; Luo, Ping; Lin, Dahua; Li, Hongyang
Issue Date	2023
Citation	Proceedings of the IEEE International Conference on Computer Vision, 2023, p. 8372-8381. DOI: http://dx.doi.org/10.1109/ICCV51070.2023.00772
Abstract	A human driver can easily describe a complex traffic scene using the visual system. This ability to perceive precisely is essential for the driver's planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into a structured grid map with semantic labels per cell, termed 3D Occupancy, would be desirable. Compared to bounding boxes, a key insight behind occupancy is that it can capture the fine-grained details of critical obstacles in the scene and thereby facilitate subsequent tasks. Prior and concurrent literature mainly concentrates on a single scene completion task, whereas we argue that this occupancy representation could have a broader impact. In this paper, we propose OccNet, a multi-view vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. At the core of OccNet is a general occupancy embedding that represents the 3D physical world. Such a descriptor can be applied to a wide span of driving tasks, including detection, segmentation and planning. To validate the effectiveness of this new representation and our proposed algorithm, we propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes. Empirical experiments show evident performance gains across multiple tasks; for example, motion planning sees a collision rate reduction of 15%-58%, demonstrating the superiority of our method.
Persistent Identifier	http://hdl.handle.net/10722/351487
ISSN	1550-5499 (2023 SCImago Journal Rankings: 12.263)
ISI Accession Number ID	WOS:001169499000052

DC Field	Value	Language
dc.contributor.author	Tong, Wenwen	-
dc.contributor.author	Sima, Chonghao	-
dc.contributor.author	Wang, Tai	-
dc.contributor.author	Chen, Li	-
dc.contributor.author	Wu, Silei	-
dc.contributor.author	Deng, Hanming	-
dc.contributor.author	Gu, Yi	-
dc.contributor.author	Lu, Lewei	-
dc.contributor.author	Luo, Ping	-
dc.contributor.author	Lin, Dahua	-
dc.contributor.author	Li, Hongyang	-
dc.date.accessioned	2024-11-20T03:56:39Z	-
dc.date.available	2024-11-20T03:56:39Z	-
dc.date.issued	2023	-
dc.identifier.citation	Proceedings of the IEEE International Conference on Computer Vision, 2023, p. 8372-8381	-
dc.identifier.issn	1550-5499	-
dc.identifier.uri	http://hdl.handle.net/10722/351487	-
dc.description.abstract	A human driver can easily describe a complex traffic scene using the visual system. This ability to perceive precisely is essential for the driver's planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into a structured grid map with semantic labels per cell, termed 3D Occupancy, would be desirable. Compared to bounding boxes, a key insight behind occupancy is that it can capture the fine-grained details of critical obstacles in the scene and thereby facilitate subsequent tasks. Prior and concurrent literature mainly concentrates on a single scene completion task, whereas we argue that this occupancy representation could have a broader impact. In this paper, we propose OccNet, a multi-view vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. At the core of OccNet is a general occupancy embedding that represents the 3D physical world. Such a descriptor can be applied to a wide span of driving tasks, including detection, segmentation and planning. To validate the effectiveness of this new representation and our proposed algorithm, we propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes. Empirical experiments show evident performance gains across multiple tasks; for example, motion planning sees a collision rate reduction of 15%-58%, demonstrating the superiority of our method.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the IEEE International Conference on Computer Vision	-
dc.title	Scene as Occupancy	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/ICCV51070.2023.00772	-
dc.identifier.scopus	eid_2-s2.0-85178443077	-
dc.identifier.spage	8372	-
dc.identifier.epage	8381	-
dc.identifier.isi	WOS:001169499000052	-
