Conference Paper: Graph-Structured Referring Expression Reasoning in The Wild

Title: Graph-Structured Referring Expression Reasoning in The Wild
Authors: Yang, S; Li, G; Yu, Y
Keywords: Cognition; Visualization; Semantics; Linguistics; Grounding
Issue Date: 2020
Publisher: IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147
Citation: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 14-19 June 2020, p. 9949-9958
Abstract: Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a consistent representation with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidence of reasoning.
Description: Session: Oral 3.1C - Vision & Language; Poster No. 26 - Paper ID 2703
CVPR 2020 took place virtually due to COVID-19
Persistent Identifier: http://hdl.handle.net/10722/284144
ISSN: 1063-6919
2023 SCImago Journal Rankings: 10.331

 

DC Field | Value | Language
dc.contributor.author | Yang, S | -
dc.contributor.author | Li, G | -
dc.contributor.author | Yu, Y | -
dc.date.accessioned | 2020-07-20T05:56:26Z | -
dc.date.available | 2020-07-20T05:56:26Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 14-19 June 2020, p. 9949-9958 | -
dc.identifier.issn | 1063-6919 | -
dc.identifier.uri | http://hdl.handle.net/10722/284144 | -
dc.description | Session: Oral 3.1C - Vision & Language; Poster No. 26 - Paper ID 2703 | -
dc.description | CVPR 2020 took place virtually due to COVID-19 | -
dc.description.abstract | Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a consistent representation with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidence of reasoning. | -
dc.language | eng | -
dc.publisher | IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147 | -
dc.relation.ispartof | IEEE Conference on Computer Vision and Pattern Recognition. Proceedings | -
dc.rights | IEEE Conference on Computer Vision and Pattern Recognition. Proceedings. Copyright © IEEE Computer Society. | -
dc.rights | ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | -
dc.subject | Cognition | -
dc.subject | Visualization | -
dc.subject | Semantics | -
dc.subject | Linguistics | -
dc.subject | Grounding | -
dc.title | Graph-Structured Referring Expression Reasoning in The Wild | -
dc.type | Conference_Paper | -
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | -
dc.identifier.authority | Yu, Y=rp01415 | -
dc.description.nature | postprint | -
dc.identifier.doi | 10.1109/CVPR42600.2020.00997 | -
dc.identifier.scopus | eid_2-s2.0-85094318254 | -
dc.identifier.hkuros | 310942 | -
dc.identifier.spage | 9949 | -
dc.identifier.epage | 9958 | -
dc.publisher.place | United States | -
