Appears in Collections: postgraduate thesis: Robust text recognition in natural images
Title | Robust text recognition in natural images |
---|---|
Authors | Liu, Wei [劉偉] |
Advisors | Wong, KKY |
Issue Date | 2019 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, W. [劉偉]. (2019). Robust text recognition in natural images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | This thesis addresses the problem of scene text recognition, which refers to recognising words that appear in various kinds of natural images. It has received much attention as many real-world applications can benefit from the rich semantic information embedded in natural text images. However, recognising text in natural images is not a trivial task due to many challenges. In this thesis, to achieve the goal of robust text recognition, we mainly focus on handling two of these challenges: words with geometrical distortions and characters with varying scales in natural text images.
In the first part of this thesis, we present a novel SpaTial Attention Residue Network (STAR-Net) for recognising distorted scene text. To handle geometrical distortions of text images from the whole-word perspective, our STAR-Net takes advantage of a global spatial transformer network, which can automatically locate and transform the entire distorted word region into an undistorted one (a minimal illustrative sketch follows the table below). Residue convolutional blocks are also exploited in our STAR-Net to build a very deep feature encoder for extracting discriminative features from the rectified word region. Experimental results demonstrate that our STAR-Net can successfully recognise distorted text in natural images and achieves better performance than previous methods on several public benchmarks.
Instead of focusing on the distortion of the entire word, the second part of this thesis presents a character-aware neural network (Char-Net), which tackles the distortion problem by detecting and rectifying individual characters in distorted text images. In order to recurrently attend to each character region in the text image, we employ a novel recurrent RoIwarp layer in our Char-Net. A simple spatial transformer network then takes the attended character region as input and removes its local distortion (see the second sketch below). This approach of using a simple local transformation to remove the distortions of individual characters not only improves efficiency, but can also handle types of distortion that are hard, if not impossible, to model with a single global transformation.
In the third part of this thesis, we address the scale problem for scene text recognition. In order to extract scale-invariant features from characters of different scales, we specifically design a novel scale-aware feature encoder (see the third sketch below). Compared with the traditional single-CNN encoder, our scale-aware feature encoder explicitly handles the scale problem, which enables the recogniser to put more effort into handling other challenges. Moreover, our proposed encoder can transfer the learning of feature encoding across different character scales. This is particularly important when the training dataset has a very unbalanced distribution of character scales, as training with such a dataset makes the encoder biased towards extracting features from the predominant scale. Finally, we present a scale-aware Char-Net that combines the scale-aware feature encoder with our Char-Net to simultaneously handle characters with varying scales and words with severe distortions in natural images. |
Degree | Doctor of Philosophy |
Subject | Pattern recognition systems |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/273755 |
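The global spatial transformer described in the STAR-Net paragraph of the abstract can be illustrated with a short sketch. This is not the thesis implementation: the layer sizes, the use of a plain affine transform, and the identity initialisation are assumptions made for illustration (PyTorch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalSpatialTransformer(nn.Module):
    """Warps a whole distorted word image into a canonical, undistorted view."""
    def __init__(self):
        super().__init__()
        # Small localisation CNN regressing 6 affine parameters (illustrative sizes).
        self.localisation = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6))
        # Start at the identity transform so early training leaves images intact.
        self.localisation[-1].weight.data.zero_()
        self.localisation[-1].bias.data.copy_(
            torch.tensor([1.0, 0, 0, 0, 1.0, 0]))

    def forward(self, x):
        theta = self.localisation(x).view(-1, 2, 3)
        # Sample the input on the predicted grid => rectified word image.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

class ResidueBlock(nn.Module):
    """A residual ('residue') convolutional block for a very deep encoder."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Identity shortcut lets gradients flow through many stacked blocks.
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

if __name__ == "__main__":
    words = torch.randn(4, 1, 32, 100)             # grey-scale word crops
    rectified = GlobalSpatialTransformer()(words)  # same size, rectified
    print(rectified.shape)                         # torch.Size([4, 1, 32, 100])
```

Initialising the localisation head to the identity transform is the standard trick for spatial transformer networks: it keeps the warp from destroying the image before the localisation network has learned anything useful.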
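The Char-Net paragraph describes recurrently attending to one character at a time via a recurrent RoIwarp layer, then removing each character's local distortion with a simple spatial transformer. The sketch below approximates that loop with a GRU state that predicts an affine crop followed by a local rectification; the recurrence wiring, dimensions, and heads are all assumptions, not the thesis design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharNetSketch(nn.Module):
    def __init__(self, num_classes=37, hidden=256, max_chars=8):
        super().__init__()
        self.max_chars = max_chars
        # Coarse encoder summarising the whole word image.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((2, 8)), nn.Flatten())   # -> 64 * 2 * 8 = 1024
        self.rnn = nn.GRUCell(1024, hidden)
        self.roi_head = nn.Linear(hidden, 6)    # affine 'RoIwarp' params: where to look
        self.local_stn = nn.Linear(hidden, 6)   # local rectification params
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        # Both transform heads start at the identity, as in standard STNs.
        for head in (self.roi_head, self.local_stn):
            head.weight.data.zero_()
            head.bias.data.copy_(torch.tensor([1.0, 0, 0, 0, 1.0, 0]))

    def warp(self, image, theta, out_hw=(32, 32)):
        n = image.size(0)
        grid = F.affine_grid(theta.view(n, 2, 3),
                             (n, image.size(1), *out_hw), align_corners=False)
        return F.grid_sample(image, grid, align_corners=False)

    def forward(self, image):
        n = image.size(0)
        h = torch.zeros(n, self.rnn.hidden_size, device=image.device)
        feat = self.encoder(image)
        logits = []
        for _ in range(self.max_chars):
            h = self.rnn(feat, h)                      # update attention state
            char = self.warp(image, self.roi_head(h))  # attend: crop one character
            char = self.warp(char, self.local_stn(h))  # remove its local distortion
            logits.append(self.classifier(char))
        return torch.stack(logits, dim=1)              # (N, max_chars, num_classes)
```

In this toy loop the RNN sees the same word-level summary at every step; a fuller model would feed the previously attended character's features back into the recurrence so that attention actually advances along the word.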
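The scale-aware feature encoder paragraph argues for sharing feature learning across character scales so that a scale-unbalanced training set does not bias the encoder towards the predominant scale. One minimal way to realise that idea, assuming a shared-weight backbone run over an image pyramid with element-wise max fusion (the thesis design may differ), is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareEncoder(nn.Module):
    def __init__(self, scales=(0.5, 1.0, 2.0), channels=64):
        super().__init__()
        self.scales = scales
        # A single backbone: sharing its weights across scales transfers the
        # learning of feature encoding from well-represented scales to rare ones.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, x):
        maps = []
        for s in self.scales:
            xs = F.interpolate(x, scale_factor=s, mode='bilinear',
                               align_corners=False)
            f = self.backbone(xs)
            # Bring every scale back to the reference resolution before fusing.
            maps.append(F.interpolate(f, size=x.shape[-2:], mode='bilinear',
                                      align_corners=False))
        # Element-wise max keeps, at each position, the response of whichever
        # scale best matched the local character size.
        return torch.stack(maps, dim=0).max(dim=0).values

if __name__ == "__main__":
    words = torch.randn(2, 1, 32, 100)
    print(ScaleAwareEncoder()(words).shape)   # torch.Size([2, 64, 32, 100])
```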
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Wong, KKY | - |
dc.contributor.author | Liu, Wei | - |
dc.contributor.author | 劉偉 | - |
dc.date.accessioned | 2019-08-14T03:29:46Z | - |
dc.date.available | 2019-08-14T03:29:46Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Liu, W. [劉偉]. (2019). Robust text recognition in natural images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/273755 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Pattern recognition systems | - |
dc.title | Robust text recognition in natural images | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044128172303414 | - |
dc.date.hkucongregation | 2019 | - |
dc.identifier.mmsid | 991044128172303414 | - |