Conference Paper: End-to-End Video Text Spotting with Transformer

Title: End-to-End Video Text Spotting with Transformer
Authors: Wu, W; Cai, Y; Shen, C; Zhang, D; Fu, Y; Zhou, H; Luo, P
Issue Date: 2022
Publisher: Ortra Ltd.
Citation: European Conference on Computer Vision (Hybrid), Tel Aviv, Israel, October 23-27, 2022. In Proceedings of the European Conference on Computer Vision (ECCV), 2022
Abstract: Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing the localized text, and tracking text streams with post-processing to generate the final results. These methods typically follow the tracking-by-match paradigm and develop sophisticated pipelines. In this paper, rooted in Transformer sequence modeling, we propose a simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR has two main advantages: 1) Unlike the explicit matching paradigm between adjacent frames, TransDETR tracks and recognizes each text instance implicitly through a dedicated query, termed the text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework, which simultaneously addresses the three sub-tasks (i.e., text detection, tracking, and recognition). Extensive experiments on four video text datasets (i.e., ICDAR2013 Video, ICDAR2015 Video, Minetto, and YouTube Video Text) demonstrate that TransDETR achieves state-of-the-art performance, with up to around 8.0% improvement on video text spotting tasks.
Persistent Identifier: http://hdl.handle.net/10722/315806
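
The abstract above describes tracking and recognition driven by per-instance text queries that are carried across frames of a clip. The following PyTorch sketch only illustrates that query-propagation idea under assumed dimensions, module names, and prediction heads (TextQueryTracker, box_head, rec_head, and all sizes are hypothetical); it is not the authors' released TransDETR implementation.

# Illustrative sketch (assumed, not the paper's code): a fixed set of text
# queries attends to per-frame features with a Transformer decoder, and the
# updated queries are fed to the next frame, so each query implicitly follows
# one text instance through the clip.
import torch
import torch.nn as nn

class TextQueryTracker(nn.Module):
    def __init__(self, d_model=256, num_queries=100, num_chars=97, max_len=25):
        super().__init__()
        self.text_queries = nn.Embedding(num_queries, d_model)   # one query per tracked text instance
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.box_head = nn.Linear(d_model, 4)                     # (cx, cy, w, h) per query
        self.rec_head = nn.Linear(d_model, max_len * num_chars)   # character logits per query
        self.max_len, self.num_chars = max_len, num_chars

    def forward(self, frame_features):
        # frame_features: list of T tensors, each (B, HW, d_model), e.g. from a CNN/Transformer encoder
        B = frame_features[0].size(0)
        queries = self.text_queries.weight.unsqueeze(0).expand(B, -1, -1)
        outputs = []
        for feats in frame_features:
            queries = self.decoder(queries, feats)                # update queries with the current frame
            boxes = self.box_head(queries).sigmoid()
            chars = self.rec_head(queries).view(B, -1, self.max_len, self.num_chars)
            outputs.append({"boxes": boxes, "chars": chars})
        return outputs                                            # per-frame predictions, aligned by query index

# Toy usage: an 8-frame clip with random encoder features.
feats = [torch.randn(2, 196, 256) for _ in range(8)]
out = TextQueryTracker()(feats)
print(len(out), out[0]["boxes"].shape, out[0]["chars"].shape)

Because the same query index is reused from frame to frame, associating detections across time reduces to reading off the per-query outputs, which is the implicit tracking the abstract refers to.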

 

DC Field: Value
dc.contributor.author: Wu, W
dc.contributor.author: Cai, Y
dc.contributor.author: Shen, C
dc.contributor.author: Zhang, D
dc.contributor.author: Fu, Y
dc.contributor.author: Zhou, H
dc.contributor.author: Luo, P
dc.date.accessioned: 2022-08-19T09:04:47Z
dc.date.available: 2022-08-19T09:04:47Z
dc.date.issued: 2022
dc.identifier.citation: European Conference on Computer Vision (Hybrid), Tel Aviv, Israel, October 23-27, 2022. In Proceedings of the European Conference on Computer Vision (ECCV), 2022
dc.identifier.uri: http://hdl.handle.net/10722/315806
dc.description.abstract: Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing the localized text, and tracking text streams with post-processing to generate the final results. These methods typically follow the tracking-by-match paradigm and develop sophisticated pipelines. In this paper, rooted in Transformer sequence modeling, we propose a simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR has two main advantages: 1) Unlike the explicit matching paradigm between adjacent frames, TransDETR tracks and recognizes each text instance implicitly through a dedicated query, termed the text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework, which simultaneously addresses the three sub-tasks (i.e., text detection, tracking, and recognition). Extensive experiments on four video text datasets (i.e., ICDAR2013 Video, ICDAR2015 Video, Minetto, and YouTube Video Text) demonstrate that TransDETR achieves state-of-the-art performance, with up to around 8.0% improvement on video text spotting tasks.
dc.language: eng
dc.publisher: Ortra Ltd.
dc.relation.ispartof: Proceedings of the European Conference on Computer Vision (ECCV), 2022
dc.title: End-to-End Video Text Spotting with Transformer
dc.type: Conference_Paper
dc.identifier.email: Luo, P: pluo@hku.hk
dc.identifier.authority: Luo, P=rp02575
dc.identifier.doi: 10.48550/arXiv.2203.10539
dc.identifier.hkuros: 335609
dc.publisher.place: Israel
