Conference Paper: Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images

Title: Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Authors: Liu, H; Lin, A; Han, X; Yang, L; Yu, Y; Cui, S
Issue Date: 2021
Publisher: IEEE Computer Society
Citation: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Virtual), 19-25 June 2021
Abstract: Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in a single-view RGBD image, where the referred object is often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in the 3D scene, we propose a bottom-up approach that gradually aggregates context-aware information, effectively addressing the challenge posed by the partial geometry. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGBD image. It then conducts adaptive feature learning based on the heatmap and performs object-level matching with a second visio-linguistic fusion to finally ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGBD images extracted from the ScanRefer dataset and our newly collected SUNRefer dataset. Experiments show that our method outperforms previous methods by a large margin (11.2% and 15.6% Acc@0.5) on the two datasets.
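The abstract describes a two-stage, bottom-up pipeline: a visio-linguistic fusion that produces a coarse relevance heatmap over the scene, followed by heatmap-guided feature aggregation and a second fusion for object-level matching. The sketch below is a minimal, hypothetical PyTorch rendering of that control flow only; the module names, feature dimensions, fusion operators, and pooling scheme are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the bottom-up grounding pipeline from the abstract.
# All module names, dimensions, and operators are illustrative assumptions.
import torch
import torch.nn as nn


class BottomUpGrounder(nn.Module):
    def __init__(self, vis_dim=128, lang_dim=128):
        super().__init__()
        # Stage 1: fuse per-point visual features with the sentence embedding
        # and predict a coarse relevance heatmap over the point cloud.
        self.fuse1 = nn.Linear(vis_dim + lang_dim, vis_dim)
        self.heatmap_head = nn.Linear(vis_dim, 1)
        # Stage 2: score object proposals with a second visio-linguistic fusion.
        self.fuse2 = nn.Linear(vis_dim + lang_dim, vis_dim)
        self.match_head = nn.Linear(vis_dim, 1)

    def forward(self, point_feats, lang_feat, proposal_feats):
        # point_feats:    (B, N, vis_dim)  per-point visual features
        # lang_feat:      (B, lang_dim)    pooled referring-expression embedding
        # proposal_feats: (B, K, vis_dim)  object-proposal features
        B, N, _ = point_feats.shape
        lang_pts = lang_feat.unsqueeze(1).expand(B, N, -1)
        fused_pts = torch.relu(self.fuse1(torch.cat([point_feats, lang_pts], dim=-1)))
        heatmap = torch.sigmoid(self.heatmap_head(fused_pts)).squeeze(-1)  # (B, N)

        # Heatmap-guided aggregation: pool a context vector from the points,
        # weighted by their predicted relevance, and inject it into proposals.
        weights = heatmap.unsqueeze(-1)                               # (B, N, 1)
        context = (weights * fused_pts).sum(dim=1) / (weights.sum(dim=1) + 1e-6)

        K = proposal_feats.shape[1]
        lang_props = lang_feat.unsqueeze(1).expand(B, K, -1)
        guided = proposal_feats + context.unsqueeze(1)                # (B, K, vis_dim)
        fused_props = torch.relu(self.fuse2(torch.cat([guided, lang_props], dim=-1)))
        scores = self.match_head(fused_props).squeeze(-1)             # (B, K)
        return heatmap, scores


# Usage with random tensors standing in for real features:
model = BottomUpGrounder()
heatmap, scores = model(
    torch.randn(2, 1024, 128),  # point features
    torch.randn(2, 128),        # language embedding
    torch.randn(2, 64, 128),    # proposal features
)
predicted = scores.argmax(dim=1)  # index of the grounded proposal per sample
```

The key design point the sketch captures is the bottom-up ordering: language is fused with low-level point features before any proposal is scored, so the heatmap can guide which geometry contributes to the final object-level matching.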
Persistent Identifier: http://hdl.handle.net/10722/316362
ISI Accession Number ID: WOS:000739917306024

 

DC Field | Value | Language
dc.contributor.author | Liu, H | -
dc.contributor.author | Lin, A | -
dc.contributor.author | Han, X | -
dc.contributor.author | Yang, L | -
dc.contributor.author | Yu, Y | -
dc.contributor.author | Cui, S | -
dc.date.accessioned | 2022-09-02T06:10:09Z | -
dc.date.available | 2022-09-02T06:10:09Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Virtual), 19-25 June 2021 | -
dc.identifier.uri | http://hdl.handle.net/10722/316362 | -
dc.description.abstract | Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in a single-view RGBD image, where the referred object is often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in the 3D scene, we propose a bottom-up approach that gradually aggregates context-aware information, effectively addressing the challenge posed by the partial geometry. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGBD image. It then conducts adaptive feature learning based on the heatmap and performs object-level matching with a second visio-linguistic fusion to finally ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGBD images extracted from the ScanRefer dataset and our newly collected SUNRefer dataset. Experiments show that our method outperforms previous methods by a large margin (11.2% and 15.6% Acc@0.5) on the two datasets. | -
dc.language | eng | -
dc.publisher | IEEE Computer Society. | -
dc.rights | Copyright © IEEE Computer Society. | -
dc.title | Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images | -
dc.type | Conference_Paper | -
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | -
dc.identifier.authority | Yu, Y=rp01415 | -
dc.identifier.doi | 10.1109/CVPR46437.2021.00597 | -
dc.identifier.hkuros | 336346 | -
dc.identifier.isi | WOS:000739917306024 | -
dc.publisher.place | United States | -
