Graph-Structured Referring Expression Reasoning in The Wild

Yang, S; Li, G; Yu, Y

Publisher Website: 10.1109/CVPR42600.2020.00997
Scopus: eid_2-s2.0-85094318254
WOS: WOS:001309199902082
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Graph-Structured Referring Expression Reasoning in The Wild

Title	Graph-Structured Referring Expression Reasoning in The Wild
Authors	Yang, S; Li, G; Yu, Y
Keywords	Cognition; Visualization; Semantics; Linguistics; Grounding
Issue Date	2020
Publisher	IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147
Citation	Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 14-19 June 2020, p. 9949-9958. DOI: http://dx.doi.org/10.1109/CVPR42600.2020.00997
Abstract	Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a consistent representation with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidences of reasoning.
Description	Session: Oral 3.1C - Vision & Language; Poster No. 26 - Paper ID 2703. CVPR 2020 took place virtually due to COVID-19.
Persistent Identifier	http://hdl.handle.net/10722/284144
ISSN	1063-6919
2023 SCImago Journal Rankings	10.331
ISI Accession Number ID	WOS:001309199902082
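
The abstract describes grounding as reasoning jointly over an image semantic graph and a language scene graph parsed from the expression. The sketch below is a purely symbolic toy illustration of that idea, not the SGMN model itself: SGMN uses learned neural modules, whereas this example uses exact label and predicate matching. The class names, the `score` and `ground` functions, and the example scene are all hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImageGraph:
    labels: dict      # object id -> category name
    relations: set    # (subject id, predicate, object id) triples


@dataclass
class LanguageGraph:
    nouns: dict       # node id -> noun from the expression
    edges: list       # (subject node, predicate, object node) triples
    referent: str     # node id of the object the expression refers to


def score(image, lang, node, obj, visiting=frozenset()):
    """Count the constraints rooted at `node` that `obj` satisfies (0 if any fails)."""
    if image.labels[obj] != lang.nouns[node]:
        return 0
    total = 1  # the noun itself matched
    for subj, pred, other in lang.edges:
        if subj != node or other in visiting:
            continue
        # Best-scoring image object linked to `obj` by the same predicate.
        best = max(
            (score(image, lang, other, cand, visiting | {node})
             for (s, p, cand) in image.relations
             if s == obj and p == pred),
            default=0,
        )
        if best == 0:
            return 0  # a required relation has no support in the image
        total += best
    return total


def ground(image, lang):
    """Return the best-matching object id and the score of every object."""
    scores = {obj: score(image, lang, lang.referent, obj) for obj in image.labels}
    return max(scores, key=scores.get), scores


# Toy scene: two cups on two tables; only one table is next to a lamp.
image = ImageGraph(
    labels={"o1": "cup", "o2": "cup", "o3": "table", "o4": "table", "o5": "lamp"},
    relations={("o1", "on", "o3"), ("o2", "on", "o4"), ("o4", "next to", "o5")},
)
# Hand-built language graph for "the cup on the table next to the lamp".
lang = LanguageGraph(
    nouns={"n0": "cup", "n1": "table", "n2": "lamp"},
    edges=[("n0", "on", "n1"), ("n1", "next to", "n2")],
    referent="n0",
)
best, scores = ground(image, lang)
print(best, scores)
```

In this toy scene the two cups cannot be told apart by category alone; only following the chain of relations in the language graph (cup, table, lamp) singles out one of them, which is the kind of structured reasoning layout the abstract refers to.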

DC Field	Value	Language
dc.contributor.author	Yang, S	-
dc.contributor.author	Li, G	-
dc.contributor.author	Yu, Y	-
dc.date.accessioned	2020-07-20T05:56:26Z	-
dc.date.available	2020-07-20T05:56:26Z	-
dc.date.issued	2020	-
dc.identifier.citation	Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 14-19 June 2020, p. 9949-9958	-
dc.identifier.issn	1063-6919	-
dc.identifier.uri	http://hdl.handle.net/10722/284144	-
dc.description	Session: Oral 3.1C — Vision & Language ; Poster No. 26 - Paper ID 2703	-
dc.description	CVPR 2020 took place virtually due to COVID-19	-
dc.description.abstract	Grounding referring expressions aims to locate in an image an object referred to by a natural language expression. The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression. In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. In particular, we model the image as a structured semantic graph, and parse the expression into a language scene graph. The language scene graph not only decodes the linguistic structure of the expression, but also has a consistent representation with the image semantic graph. In addition to exploring structured solutions to grounding referring expressions, we also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning. We automatically generate referring expressions over the scene graphs of images using diverse expression templates and functional programs. This dataset is equipped with real-world visual contents as well as semantically rich expressions with different reasoning layouts. Experimental results show that our SGMN not only significantly outperforms existing state-of-the-art algorithms on the new Ref-Reasoning dataset, but also surpasses state-of-the-art structured methods on commonly used benchmark datasets. It can also provide interpretable visual evidences of reasoning.	-
dc.language	eng	-
dc.publisher	IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147	-
dc.relation.ispartof	IEEE Conference on Computer Vision and Pattern Recognition. Proceedings	-
dc.rights	IEEE Conference on Computer Vision and Pattern Recognition. Proceedings. Copyright © IEEE Computer Society.	-
dc.rights	©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	-
dc.subject	Cognition	-
dc.subject	Visualization	-
dc.subject	Semantics	-
dc.subject	Linguistics	-
dc.subject	Grounding	-
dc.title	Graph-Structured Referring Expression Reasoning in The Wild	-
dc.type	Conference_Paper	-
dc.identifier.email	Yu, Y: yzyu@cs.hku.hk	-
dc.identifier.authority	Yu, Y=rp01415	-
dc.description.nature	postprint	-
dc.identifier.doi	10.1109/CVPR42600.2020.00997	-
dc.identifier.scopus	eid_2-s2.0-85094318254	-
dc.identifier.hkuros	310942	-
dc.identifier.spage	9949	-
dc.identifier.epage	9958	-
dc.identifier.isi	WOS:001309199902082	-
dc.publisher.place	United States	-
