Links for fulltext (may require subscription)
- Publisher Website: 10.1109/CVPR42600.2020.00997
- Scopus: eid_2-s2.0-85094318254
Citations:
- Scopus: 0
- Appears in Collections:
Conference Paper: Graph-Structured Referring Expression Reasoning in The Wild
Title | Graph-Structured Referring Expression Reasoning in The Wild |
---|---|
Authors | Yang, S; Li, G; Yu, Y |
Keywords | Cognition; Visualization; Semantics; Linguistics; Grounding |
Issue Date | 2020 |
Publisher | IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147 |
Citation | Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 14-19 June 2020, p. 9949-9958 |
Abstract | Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a consistent representation with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidences of reasoning. |
Description | Session: Oral 3.1C - Vision & Language; Poster No. 26; Paper ID 2703. CVPR 2020 took place virtually due to COVID-19. |
Persistent Identifier | http://hdl.handle.net/10722/284144 |
ISSN | 1063-6919 (2023 SCImago Journal Rankings: 10.331) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, S | - |
dc.contributor.author | Li, G | - |
dc.contributor.author | Yu, Y | - |
dc.date.accessioned | 2020-07-20T05:56:26Z | - |
dc.date.available | 2020-07-20T05:56:26Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 14-19 June 2020, p. 9949-9958 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/10722/284144 | - |
dc.description | Session: Oral 3.1C — Vision & Language ; Poster No. 26 - Paper ID 2703 | - |
dc.description | CVPR 2020 took place virtually due to COVID-19 | - |
dc.description.abstract | Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a consistent representation with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidences of reasoning. | - |
dc.language | eng | - |
dc.publisher | IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147 | - |
dc.relation.ispartof | IEEE Conference on Computer Vision and Pattern Recognition. Proceedings | - |
dc.rights | IEEE Conference on Computer Vision and Pattern Recognition. Proceedings. Copyright © IEEE Computer Society. | - |
dc.rights | ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | - |
dc.subject | Cognition | - |
dc.subject | Visualization | - |
dc.subject | Semantics | - |
dc.subject | Linguistics | - |
dc.subject | Grounding | - |
dc.title | Graph-Structured Referring Expression Reasoning in The Wild | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | - |
dc.identifier.authority | Yu, Y=rp01415 | - |
dc.description.nature | postprint | - |
dc.identifier.doi | 10.1109/CVPR42600.2020.00997 | - |
dc.identifier.scopus | eid_2-s2.0-85094318254 | - |
dc.identifier.hkuros | 310942 | - |
dc.identifier.spage | 9949 | - |
dc.identifier.epage | 9958 | - |
dc.publisher.place | United States | - |