Appears in Collections: postgraduate thesis: Robust text recognition in natural images
Title | Robust text recognition in natural images |
---|---|
Authors | Liu, Wei [劉偉] |
Advisors | Wong, KKY |
Issue Date | 2019 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, W. [劉偉]. (2019). Robust text recognition in natural images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | This thesis addresses the problem of scene text recognition, which refers to recognising words that appear in various kinds of natural images. It has received much attention as many real-world applications can benefit from the rich semantic information embedded in natural text images. However, recognising text in natural images is not a trivial task due to many challenges. In this thesis, to achieve the goal of robust text recognition, we mainly focus on handling two of these challenges: words with geometrical distortions and characters with varying scales in natural text images.
In the first part of this thesis, we present a novel SpaTial Attention Residue Network (STAR-Net) for recognising distorted scene text. To handle geometrical distortions of text images from the whole-word perspective, our STAR-Net takes advantage of a global spatial transformer network, which can automatically locate and transform the entire distorted word region into an undistorted one (a minimal illustrative sketch follows the table below). Residue convolutional blocks are also exploited in our STAR-Net to build a very deep feature encoder for extracting discriminative features from the rectified word region. Experimental results demonstrate that our STAR-Net can successfully recognise distorted text in natural images and achieves better performance than previous methods on several public benchmarks.
Instead of focusing on the distortion of the entire word, the second part of this thesis presents a character-aware neural network (Char-Net), which tackles the distortion problem by detecting and rectifying individual characters in distorted text images. In order to recurrently attend to each character region in the text image, we employ a novel recurrent RoIwarp layer in our Char-Net. A simple spatial transformer network then takes the attended character region as input and removes its local distortion (see the second sketch below). This approach of using a simple local transformation to remove the distortions of individual characters not only improves efficiency, but can also handle types of distortion that are hard, if not impossible, to model with a single global transformation.
In the third part of this thesis, we address the scale problem for scene text recognition. In order to extract scale-invariant features from characters of different scales, we specifically design a novel scale-aware feature encoder (see the third sketch below). Compared with the traditional single-CNN encoder, our scale-aware feature encoder explicitly handles the scale problem, which enables the recogniser to put more effort into handling other challenges. Moreover, our proposed encoder can transfer the learning of feature encoding across different character scales. This is particularly important when the training dataset has a very unbalanced distribution of character scales, as training with such a dataset makes the encoder biased towards extracting features from the predominant scale. Finally, we present a scale-aware Char-Net that combines the scale-aware feature encoder with our Char-Net to simultaneously handle characters with varying scales and words with severe distortions in natural images. |
Degree | Doctor of Philosophy |
Subject | Pattern recognition systems |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/273755 |
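The global spatial transformer described in the STAR-Net paragraph of the abstract can be illustrated with a short sketch. This is not the thesis implementation: the layer sizes, the use of a plain affine transform, and the identity initialisation are assumptions made for illustration (PyTorch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalSpatialTransformer(nn.Module):
    """Warps a whole distorted word image into a canonical, undistorted view."""
    def __init__(self):
        super().__init__()
        # Small localisation CNN regressing 6 affine parameters (illustrative sizes).
        self.localisation = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6))
        # Start at the identity transform so early training leaves images intact.
        self.localisation[-1].weight.data.zero_()
        self.localisation[-1].bias.data.copy_(
            torch.tensor([1.0, 0, 0, 0, 1.0, 0]))

    def forward(self, x):
        theta = self.localisation(x).view(-1, 2, 3)
        # Sample the input on the predicted grid => rectified word image.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

class ResidueBlock(nn.Module):
    """A residual ('residue') convolutional block for a very deep encoder."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Identity shortcut lets gradients flow through many stacked blocks.
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

if __name__ == "__main__":
    words = torch.randn(4, 1, 32, 100)             # grey-scale word crops
    rectified = GlobalSpatialTransformer()(words)  # same size, rectified
    print(rectified.shape)                         # torch.Size([4, 1, 32, 100])
```

Initialising the localisation head to the identity transform is the standard trick for spatial transformer networks: it keeps the warp from destroying the image before the localisation network has learned anything useful.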
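The Char-Net paragraph describes recurrently attending to one character at a time via a recurrent RoIwarp layer, then removing each character's local distortion with a simple spatial transformer. The sketch below approximates that loop with a GRU state that predicts an affine crop followed by a local rectification; the recurrence wiring, dimensions, and heads are all assumptions, not the thesis design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharNetSketch(nn.Module):
    def __init__(self, num_classes=37, hidden=256, max_chars=8):
        super().__init__()
        self.max_chars = max_chars
        # Coarse encoder summarising the whole word image.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((2, 8)), nn.Flatten())   # -> 64 * 2 * 8 = 1024
        self.rnn = nn.GRUCell(1024, hidden)
        self.roi_head = nn.Linear(hidden, 6)    # affine 'RoIwarp' params: where to look
        self.local_stn = nn.Linear(hidden, 6)   # local rectification params
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        # Both transform heads start at the identity, as in standard STNs.
        for head in (self.roi_head, self.local_stn):
            head.weight.data.zero_()
            head.bias.data.copy_(torch.tensor([1.0, 0, 0, 0, 1.0, 0]))

    def warp(self, image, theta, out_hw=(32, 32)):
        n = image.size(0)
        grid = F.affine_grid(theta.view(n, 2, 3),
                             (n, image.size(1), *out_hw), align_corners=False)
        return F.grid_sample(image, grid, align_corners=False)

    def forward(self, image):
        n = image.size(0)
        h = torch.zeros(n, self.rnn.hidden_size, device=image.device)
        feat = self.encoder(image)
        logits = []
        for _ in range(self.max_chars):
            h = self.rnn(feat, h)                      # update attention state
            char = self.warp(image, self.roi_head(h))  # attend: crop one character
            char = self.warp(char, self.local_stn(h))  # remove its local distortion
            logits.append(self.classifier(char))
        return torch.stack(logits, dim=1)              # (N, max_chars, num_classes)
```

In this toy loop the RNN sees the same word-level summary at every step; a fuller model would feed the previously attended character's features back into the recurrence so that attention actually advances along the word.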
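The scale-aware feature encoder paragraph argues for sharing feature learning across character scales so that a scale-unbalanced training set does not bias the encoder towards the predominant scale. One minimal way to realise that idea, assuming a shared-weight backbone run over an image pyramid with element-wise max fusion (the thesis design may differ), is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareEncoder(nn.Module):
    def __init__(self, scales=(0.5, 1.0, 2.0), channels=64):
        super().__init__()
        self.scales = scales
        # A single backbone: sharing its weights across scales transfers the
        # learning of feature encoding from well-represented scales to rare ones.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, x):
        maps = []
        for s in self.scales:
            xs = F.interpolate(x, scale_factor=s, mode='bilinear',
                               align_corners=False)
            f = self.backbone(xs)
            # Bring every scale back to the reference resolution before fusing.
            maps.append(F.interpolate(f, size=x.shape[-2:], mode='bilinear',
                                      align_corners=False))
        # Element-wise max keeps, at each position, the response of whichever
        # scale best matched the local character size.
        return torch.stack(maps, dim=0).max(dim=0).values

if __name__ == "__main__":
    words = torch.randn(2, 1, 32, 100)
    print(ScaleAwareEncoder()(words).shape)   # torch.Size([2, 64, 32, 100])
```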
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Wong, KKY | - |
dc.contributor.author | Liu, Wei | - |
dc.contributor.author | 劉偉 | - |
dc.date.accessioned | 2019-08-14T03:29:46Z | - |
dc.date.available | 2019-08-14T03:29:46Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Liu, W. [劉偉]. (2019). Robust text recognition in natural images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/273755 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Pattern recognition systems | - |
dc.title | Robust text recognition in natural images | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044128172303414 | - |
dc.date.hkucongregation | 2019 | - |
dc.identifier.mmsid | 991044128172303414 | - |