Conference Paper: Improving referring expression grounding with cross-modal attention-guided erasing

Title: Improving referring expression grounding with cross-modal attention-guided erasing
Authors: Liu, Xihui; Wang, Zihao; Shao, Jing; Wang, Xiaogang; Li, Hongsheng
Keywords: Categorization; Recognition: Detection; Retrieval; Vision + Language
Issue Date: 2019
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019, v. 2019-June, p. 1950-1959
Abstract: Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions. Although the attention mechanism has been successfully applied for cross-modal alignments, previous attention models focus on only the most dominant features of both modalities, and neglect the fact that there could be multiple comprehensive textual-visual correspondences between images and referring expressions. To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either textual or visual domains to generate difficult training samples online, and to drive the model to discover complementary textual-visual correspondences. Extensive experiments demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance on three referring expression grounding datasets.
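
The erasing idea described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' released implementation; it is a minimal illustration, assuming generic (batch, n, d) feature tensors and a hypothetical helper name `attention_guided_erase`, of how attention weights could be used to mask out the most-attended word or image region and so produce a harder training sample online.

```python
import torch

def attention_guided_erase(features, attention, erase_prob=0.5):
    """Illustrative sketch of attention-guided erasing (not the authors' code).

    features:  (batch, n, d) word embeddings or region features
    attention: (batch, n)    attention weights over the n tokens/regions
    Returns a copy of `features` where, for a random subset of samples,
    the single most-attended token/region is zeroed out, producing a
    harder training sample that forces the model to rely on the
    remaining, complementary evidence.
    """
    erased = features.clone()
    # Index of the most dominant token/region for each sample.
    top_idx = attention.argmax(dim=1)                      # (batch,)
    # Erase only with some probability so the original samples are still seen.
    do_erase = torch.rand(features.size(0)) < erase_prob   # (batch,)
    for b in torch.nonzero(do_erase, as_tuple=False).flatten():
        erased[b, top_idx[b]] = 0.0                        # zero out the dominant cue
    return erased

# Hypothetical usage during training: erase on either the textual or the
# visual side, then recompute grounding scores on the harder sample.
words = torch.randn(4, 12, 300)                 # (batch, words, embedding dim)
w_attn = torch.softmax(torch.randn(4, 12), dim=1)
hard_words = attention_guided_erase(words, w_attn)
```

Erasing either side (words of the expression or regions of the image) follows the same pattern; the erased samples are mixed into training so the model cannot rely solely on the single most dominant textual-visual correspondence.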
Persistent Identifier: http://hdl.handle.net/10722/316529
ISSN: 1063-6919
2020 SCImago Journal Rankings: 4.658
ISI Accession Number ID: WOS:000529484002012

DC Field: Value
dc.contributor.author: Liu, Xihui
dc.contributor.author: Wang, Zihao
dc.contributor.author: Shao, Jing
dc.contributor.author: Wang, Xiaogang
dc.contributor.author: Li, Hongsheng
dc.date.accessioned: 2022-09-14T11:40:41Z
dc.date.available: 2022-09-14T11:40:41Z
dc.date.issued: 2019
dc.identifier.citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019, v. 2019-June, p. 1950-1959
dc.identifier.issn: 1063-6919
dc.identifier.uri: http://hdl.handle.net/10722/316529
dc.description.abstract: Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions. Although the attention mechanism has been successfully applied for cross-modal alignments, previous attention models focus on only the most dominant features of both modalities, and neglect the fact that there could be multiple comprehensive textual-visual correspondences between images and referring expressions. To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either textual or visual domains to generate difficult training samples online, and to drive the model to discover complementary textual-visual correspondences. Extensive experiments demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance on three referring expression grounding datasets.
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.subject: Categorization
dc.subject: Recognition: Detection
dc.subject: Retrieval
dc.subject: Vision + Language
dc.title: Improving referring expression grounding with cross-modal attention-guided erasing
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/CVPR.2019.00205
dc.identifier.scopus: eid_2-s2.0-85074842634
dc.identifier.volume: 2019-June
dc.identifier.spage: 1950
dc.identifier.epage: 1959
dc.identifier.isi: WOS:000529484002012