
Conference Paper: High Performance Chinese OCR Based on Gabor Features, Discriminative Feature Extraction and Model Training

Title: High Performance Chinese OCR Based on Gabor Features, Discriminative Feature Extraction and Model Training
Authors: Huo, Q; Ge, Y; Feng, ZD
Keywords: Engineering; Electrical engineering
Issue Date: 2001
Publisher: IEEE
Citation: IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, Salt Lake City, UT, 7-11 May 2001, v. 3, p. 1517-1520
Abstract: We have developed a Chinese OCR engine for machine-printed documents. The engine currently supports a vocabulary of 6921 characters: 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, and 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, Hei, Yuan, LiShu, WeiBei, and XingKai, among others. The average character recognition accuracy is above 99% for newspaper-quality documents, at a recognition speed of about 250 characters per second on a 450 MHz Pentium III PC, while using less than 2 MB of memory. We describe the key technologies used to construct this recognizer and highlight three techniques that contribute to its high recognition accuracy: the use of Gabor features, the use of discriminative feature extraction, and the use of minimum classification error as the criterion for model training.
Persistent Identifier: http://hdl.handle.net/10722/45630
ISSN: 1520-6149

 

DC Field | Value | Language
dc.contributor.author | Huo, Q | en_HK
dc.contributor.author | Ge, Y | en_HK
dc.contributor.author | Feng, ZD | en_HK
dc.date.accessioned | 2007-10-30T06:30:40Z | -
dc.date.available | 2007-10-30T06:30:40Z | -
dc.date.issued | 2001 | en_HK
dc.identifier.citation | IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, Salt Lake City, UT, 7-11 May 2001, v. 3, p. 1517-1520 | en_HK
dc.identifier.issn | 1520-6149 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/45630 | -
dc.description.abstract | We have developed a Chinese OCR engine for machine-printed documents. The engine currently supports a vocabulary of 6921 characters: 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, and 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, Hei, Yuan, LiShu, WeiBei, and XingKai, among others. The average character recognition accuracy is above 99% for newspaper-quality documents, at a recognition speed of about 250 characters per second on a 450 MHz Pentium III PC, while using less than 2 MB of memory. We describe the key technologies used to construct this recognizer and highlight three techniques that contribute to its high recognition accuracy: the use of Gabor features, the use of discriminative feature extraction, and the use of minimum classification error as the criterion for model training. | en_HK
dc.format.extent | 431301 bytes | -
dc.format.extent | 7254 bytes | -
dc.format.mimetype | application/pdf | -
dc.format.mimetype | text/plain | -
dc.language | eng | en_HK
dc.publisher | IEEE. | en_HK
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | ©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | en_HK
dc.subject | Engineering | en_HK
dc.subject | Electrical engineering | en_HK
dc.title | High Performance Chinese OCR Based on Gabor Features, Discriminative Feature Extraction and Model Training | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1520-6149&volume=3&spage=1517&epage=1520&date=2001&atitle=High+Performance+Chinese+OCR+Based+on+Gabor+Features,+Discriminative+Feature+Extraction+and+Model+Training | en_HK
dc.description.nature | published_or_final_version | en_HK
dc.identifier.doi | 10.1109/ICASSP.2001.941220 | en_HK
dc.identifier.hkuros | 57656 | -
dc.identifier.citeulike | 7901185 | -
