
Conference Paper: CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

Title: CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Authors: Ma, Chuofan; Jiang, Yi; Wen, Xin; Yuan, Zehuan; Qi, Xiaojuan
Issue Date: 10-Dec-2023
Abstract

Deriving reliable region-word alignment from image-text pairs is critical to learning object-level vision-language representations for open-vocabulary object detection. Existing methods typically rely on pre-trained or self-trained vision-language models for alignment, which are prone to limitations in localization accuracy or generalization capability. In this paper, we propose CoDet, a novel approach that overcomes the reliance on a pre-aligned vision-language space by reformulating region-word alignment as a co-occurring object discovery problem. Intuitively, by grouping images that mention a shared concept in their captions, objects corresponding to the shared concept should exhibit high co-occurrence within the group. CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept. Extensive experiments demonstrate that CoDet achieves superior performance and compelling scalability in open-vocabulary detection, e.g., by scaling up the visual backbone, CoDet achieves 37.0 AP^m_novel and 44.7 AP^m on OV-LVIS, surpassing the previous SoTA by 4.2 AP^m_novel and 9.8 AP^m. Code is available at https://github.com/CVMI-Lab/CoDet.
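
The grouping-and-discovery idea described in the abstract can be made concrete with a short sketch. The following is a hypothetical illustration only, not the authors' implementation: the function name discover_cooccurring_regions, the tensor shapes, and the best-match scoring rule are all assumptions (the actual method is in the repository linked above). It scores each candidate region in an image by how well it matches regions in the other images of a group whose captions share a concept word, then picks the top region per image as that concept's pseudo label.

import torch
import torch.nn.functional as F

def discover_cooccurring_regions(region_feats):
    # region_feats: list of (N_i, D) tensors of region proposal features,
    # one per image in a group whose captions all mention the same concept.
    feats = [F.normalize(f, dim=-1) for f in region_feats]  # unit norm -> cosine similarity
    picks = []
    for i, fi in enumerate(feats):
        score = torch.zeros(fi.shape[0])
        for j, fj in enumerate(feats):
            if j == i:
                continue
            # Best cosine match of each region in image i against image j;
            # a region of the shared object should find a good match in
            # every other image of the group (high co-occurrence).
            score += (fi @ fj.T).max(dim=1).values
        picks.append(int(score.argmax()))
    # One region index per image; these regions would then be aligned with
    # the shared concept's word embedding during training.
    return picks

# Example: three images, each with a few proposal features of dimension 256.
group = [torch.randn(5, 256), torch.randn(8, 256), torch.randn(6, 256)]
print(discover_cooccurring_regions(group))
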


Persistent Identifier: http://hdl.handle.net/10722/340352

 

DC Field: Value
dc.contributor.author: Ma, Chuofan
dc.contributor.author: Jiang, Yi
dc.contributor.author: Wen, Xin
dc.contributor.author: Yuan, Zehuan
dc.contributor.author: Qi, Xiaojuan
dc.date.accessioned: 2024-03-11T10:43:31Z
dc.date.available: 2024-03-11T10:43:31Z
dc.date.issued: 2023-12-10
dc.identifier.uri: http://hdl.handle.net/10722/340352
dc.description.abstract: Deriving reliable region-word alignment from image-text pairs is critical to learning object-level vision-language representations for open-vocabulary object detection. Existing methods typically rely on pre-trained or self-trained vision-language models for alignment, which are prone to limitations in localization accuracy or generalization capability. In this paper, we propose CoDet, a novel approach that overcomes the reliance on a pre-aligned vision-language space by reformulating region-word alignment as a co-occurring object discovery problem. Intuitively, by grouping images that mention a shared concept in their captions, objects corresponding to the shared concept should exhibit high co-occurrence within the group. CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept. Extensive experiments demonstrate that CoDet achieves superior performance and compelling scalability in open-vocabulary detection, e.g., by scaling up the visual backbone, CoDet achieves 37.0 AP^m_novel and 44.7 AP^m on OV-LVIS, surpassing the previous SoTA by 4.2 AP^m_novel and 9.8 AP^m. Code is available at https://github.com/CVMI-Lab/CoDet.
dc.language: eng
dc.relation.ispartof: Neural Information Processing Systems 2023 (10/12/2023-16/12/2023, New Orleans)
dc.title: CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
dc.type: Conference_Paper
