Polygon-free: Unconstrained Scene Text Detection with Box Annotations

Wu, W; Xie, E; Zhang, R; Wang, W; Luo, P; Zhou, H

Publisher Website: 10.48550/arXiv.2011.13307

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Polygon-free: Unconstrained Scene Text Detection with Box Annotations

Title	Polygon-free: Unconstrained Scene Text Detection with Box Annotations
Authors	Wu, W; Xie, E; Zhang, R; Wang, W; Luo, P; Zhou, H
Issue Date	2022
Publisher	IEEE.
Citation	29th IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16-19 October, 2022. In Proceedings of IEEE International Conference on Image Processing (ICIP) 2022. DOI: http://dx.doi.org/10.48550/arXiv.2011.13307
Abstract	Although a polygon represents text more accurately than an upright bounding box, polygon annotations are extremely expensive and difficult to produce. Unlike existing works that employ fully supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33], DB [16]) are trained with only upright bounding box annotations. Our core idea is to transfer knowledge from synthetic data to real data to enhance the supervision information of upright bounding boxes. This is made possible with a simple segmentation network, namely the Skeleton Attention Segmentation Network (SASN), which includes three vital components (i.e., channel attention, spatial attention, and a skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygon-free system can be combined with general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves over 80% of labeling costs. We hope that PF can provide a new perspective for text detection that reduces labeling costs.
Persistent Identifier	http://hdl.handle.net/10722/315807
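
The abstract names a soft cross-entropy loss but this record does not define it. As a rough illustration only, the sketch below assumes it means a per-pixel binary cross-entropy computed against soft targets in [0, 1] rather than hard 0/1 labels. The function name `soft_cross_entropy`, its arguments, and the flat-list input are hypothetical and are not taken from the paper.

```python
import math


def soft_cross_entropy(probs, soft_targets, eps=1e-12):
    """Mean binary cross-entropy against soft targets in [0, 1].

    Hypothetical helper for illustration; the paper's exact loss
    is not given in this record and may differ.
    """
    if len(probs) != len(soft_targets):
        raise ValueError("probs and soft_targets must have the same length")
    if not probs:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for p, t in zip(probs, soft_targets):
        # Clamp the prediction so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


# Predicted text probabilities for three pixels, with soft targets that
# express uncertainty near a box border (values are made up).
loss = soft_cross_entropy([0.9, 0.6, 0.1], [1.0, 0.5, 0.0])
print(round(loss, 4))
```

With a hard target the expression reduces to ordinary binary cross-entropy, so a soft target can be read as a per-pixel weighting between the "text" and "background" terms.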

DC Field	Value	Language
dc.contributor.author	Wu, W	-
dc.contributor.author	Xie, E	-
dc.contributor.author	Zhang, R	-
dc.contributor.author	Wang, W	-
dc.contributor.author	Luo, P	-
dc.contributor.author	Zhou, H	-
dc.date.accessioned	2022-08-19T09:04:48Z	-
dc.date.available	2022-08-19T09:04:48Z	-
dc.date.issued	2022	-
dc.identifier.citation	29th IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16-19 October, 2022. In Proceedings of IEEE International Conference on Image Processing (ICIP) 2022	-
dc.identifier.uri	http://hdl.handle.net/10722/315807	-
dc.description.abstract	Although a polygon represents text more accurately than an upright bounding box, polygon annotations are extremely expensive and difficult to produce. Unlike existing works that employ fully supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33], DB [16]) are trained with only upright bounding box annotations. Our core idea is to transfer knowledge from synthetic data to real data to enhance the supervision information of upright bounding boxes. This is made possible with a simple segmentation network, namely the Skeleton Attention Segmentation Network (SASN), which includes three vital components (i.e., channel attention, spatial attention, and a skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygon-free system can be combined with general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves over 80% of labeling costs. We hope that PF can provide a new perspective for text detection that reduces labeling costs.	-
dc.language	eng	-
dc.publisher	IEEE.	-
dc.relation.ispartof	Proceedings of IEEE International Conference on Image Processing (ICIP) 2022	-
dc.rights	Proceedings of IEEE International Conference on Image Processing (ICIP). Copyright © IEEE.	-
dc.title	Polygon-free: Unconstrained Scene Text Detection with Box Annotations	-
dc.type	Conference_Paper	-
dc.identifier.email	Luo, P: pluo@hku.hk	-
dc.identifier.authority	Luo, P=rp02575	-
dc.identifier.doi	10.48550/arXiv.2011.13307	-
dc.identifier.hkuros	335610	-
dc.publisher.place	United States	-
