Conference Paper: Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images

Title: Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Authors: Liu, H; Lin, A; Han, X; Yang, L; Yu, Y; Cui, S
Issue Date: 2021
Publisher: IEEE Computer Society
Citation: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Virtual), 19-25 June 2021
Abstract: Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in a single-view RGBD image, where the referred object is often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in the 3D scene, we propose a bottom-up approach that gradually aggregates context-aware information, effectively addressing the challenge posed by the partial geometry. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGBD image. It then conducts adaptive feature learning based on the heatmap and performs object-level matching with a second visio-linguistic fusion to finally ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGBD images extracted from the ScanRefer dataset and our newly collected SUNRefer dataset. Experiments show that our method outperforms previous methods by a large margin (11.2% and 15.6% Acc@0.5) on the two datasets.
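The abstract describes a two-stage, bottom-up pipeline: a visio-linguistic fusion that produces a coarse relevance heatmap over the scene, followed by heatmap-guided feature aggregation and a second fusion for object-level matching. The sketch below is a minimal, hypothetical PyTorch rendering of that control flow only; the module names, feature dimensions, fusion operators, and pooling scheme are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the bottom-up grounding pipeline from the abstract.
# All module names, dimensions, and operators are illustrative assumptions.
import torch
import torch.nn as nn


class BottomUpGrounder(nn.Module):
    def __init__(self, vis_dim=128, lang_dim=128):
        super().__init__()
        # Stage 1: fuse per-point visual features with the sentence embedding
        # and predict a coarse relevance heatmap over the point cloud.
        self.fuse1 = nn.Linear(vis_dim + lang_dim, vis_dim)
        self.heatmap_head = nn.Linear(vis_dim, 1)
        # Stage 2: score object proposals with a second visio-linguistic fusion.
        self.fuse2 = nn.Linear(vis_dim + lang_dim, vis_dim)
        self.match_head = nn.Linear(vis_dim, 1)

    def forward(self, point_feats, lang_feat, proposal_feats):
        # point_feats:    (B, N, vis_dim)  per-point visual features
        # lang_feat:      (B, lang_dim)    pooled referring-expression embedding
        # proposal_feats: (B, K, vis_dim)  object-proposal features
        B, N, _ = point_feats.shape
        lang_pts = lang_feat.unsqueeze(1).expand(B, N, -1)
        fused_pts = torch.relu(self.fuse1(torch.cat([point_feats, lang_pts], dim=-1)))
        heatmap = torch.sigmoid(self.heatmap_head(fused_pts)).squeeze(-1)  # (B, N)

        # Heatmap-guided aggregation: pool a context vector from the points,
        # weighted by their predicted relevance, and inject it into proposals.
        weights = heatmap.unsqueeze(-1)                               # (B, N, 1)
        context = (weights * fused_pts).sum(dim=1) / (weights.sum(dim=1) + 1e-6)

        K = proposal_feats.shape[1]
        lang_props = lang_feat.unsqueeze(1).expand(B, K, -1)
        guided = proposal_feats + context.unsqueeze(1)                # (B, K, vis_dim)
        fused_props = torch.relu(self.fuse2(torch.cat([guided, lang_props], dim=-1)))
        scores = self.match_head(fused_props).squeeze(-1)             # (B, K)
        return heatmap, scores


# Usage with random tensors standing in for real features:
model = BottomUpGrounder()
heatmap, scores = model(
    torch.randn(2, 1024, 128),  # point features
    torch.randn(2, 128),        # language embedding
    torch.randn(2, 64, 128),    # proposal features
)
predicted = scores.argmax(dim=1)  # index of the grounded proposal per sample
```

The key design point the sketch captures is the bottom-up ordering: language is fused with low-level point features before any proposal is scored, so the heatmap can guide which geometry contributes to the final object-level matching.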
Persistent Identifier: http://hdl.handle.net/10722/316362
ISI Accession Number ID: WOS:000739917306024

 

DC Field | Value | Language
dc.contributor.author | Liu, H | -
dc.contributor.author | Lin, A | -
dc.contributor.author | Han, X | -
dc.contributor.author | Yang, L | -
dc.contributor.author | Yu, Y | -
dc.contributor.author | Cui, S | -
dc.date.accessioned | 2022-09-02T06:10:09Z | -
dc.date.available | 2022-09-02T06:10:09Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Virtual), 19-25 June 2021 | -
dc.identifier.uri | http://hdl.handle.net/10722/316362 | -
dc.description.abstract | Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in a single-view RGBD image, where the referred object is often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in the 3D scene, we propose a bottom-up approach that gradually aggregates context-aware information, effectively addressing the challenge posed by the partial geometry. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGBD image. It then conducts adaptive feature learning based on the heatmap and performs object-level matching with a second visio-linguistic fusion to finally ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGBD images extracted from the ScanRefer dataset and our newly collected SUNRefer dataset. Experiments show that our method outperforms previous methods by a large margin (11.2% and 15.6% Acc@0.5) on the two datasets. | -
dc.language | eng | -
dc.publisher | IEEE Computer Society. | -
dc.rights | Copyright © IEEE Computer Society. | -
dc.title | Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images | -
dc.type | Conference_Paper | -
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | -
dc.identifier.authority | Yu, Y=rp01415 | -
dc.identifier.doi | 10.1109/CVPR46437.2021.00597 | -
dc.identifier.hkuros | 336346 | -
dc.identifier.isi | WOS:000739917306024 | -
dc.publisher.place | United States | -
