Article: A study on word-based and integral-bit Chinese text compression algorithms

Title: A study on word-based and integral-bit Chinese text compression algorithms
Authors: Cheng, KS
Issue Date: 1999
Publisher: John Wiley & Sons, Inc. The Journal's web site is located at http://www.asis.org/Publications/JASIS/jasis.html
Citation: Journal of the American Society for Information Science, 1999, v. 50 n. 2-3, p. 218-228
Abstract: Experimental results show that a word-based arithmetic coding scheme can achieve higher compression performance for Chinese text. However, arithmetic coding is a fractional-bit compression algorithm, which is known to be time-consuming. In this article, we instead study how to cascade the word segmentation model with a faster alternative, the integral-bit compression algorithm. It is shown that the cascaded algorithm is more suitable for practical use. Among several word-based integral-bit compression algorithms, WLZSSHUF achieves the best compression results. It not only achieves a compression ratio comparable to that of a PPM compressor, COMP-2, but also compresses and decompresses faster. In the last part of this article, the relation between the accuracy of the word segmentation model (match ratio) and the performance of the compression algorithm (compression ratio) is analyzed. By varying the match ratio, it was discovered that the growth rate of the compression ratio is content-dependent and close to linear. The results of our study will help practitioners of information retrieval design word-based compression algorithms for Chinese. This is particularly useful for multilingual digital libraries, which often involve a massive volume of data. © 1999 John Wiley & Sons, Inc.
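The cascade the abstract describes, a word segmentation model feeding an integral-bit coder, can be illustrated with a minimal Python sketch. This is not the article's WLZSSHUF algorithm, whose details are not given in this record: the segmenter is an assumed greedy forward maximum matching over a toy lexicon, the coder is a plain Huffman code with no LZSS stage, and the function names, lexicon, and sample string are all hypothetical. "Integral-bit" here means each token receives a whole number of bits, in contrast to the fractional bits of arithmetic coding.

```python
import heapq
from collections import Counter
from itertools import count


def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (an assumed segmenter, not the paper's).

    At each position take the longest lexicon word, else a single character.
    """
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def huffman_code_lengths(tokens):
    """Return {symbol: code length in whole bits} for a Huffman code."""
    freq = Counter(tokens)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: 0}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def coded_bits(tokens):
    """Payload size in bits, ignoring the cost of storing the code table."""
    lengths = huffman_code_lengths(tokens)
    return sum(lengths[t] for t in tokens)


# Hypothetical toy input: compare word tokens with single-character tokens.
lexicon = {"中文", "压缩", "文本"}
text = "中文压缩中文文本压缩"
words = segment(text, lexicon)  # ['中文', '压缩', '中文', '文本', '压缩']
print(coded_bits(words), coded_bits(list(text)))
```

On this toy string the word-level code needs fewer payload bits than the character-level code, but the comparison ignores the cost of transmitting the word table, so it should not be read as a reproduction of the article's results.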
Persistent Identifier: http://hdl.handle.net/10722/174781
ISSN: 0002-8231

 

DC Field | Value | Language
dc.contributor.author | Cheng, KS | en_US
dc.date.accessioned | 2012-11-26T08:47:24Z | -
dc.date.available | 2012-11-26T08:47:24Z | -
dc.date.issued | 1999 | en_US
dc.identifier.citation | Journal Of The American Society For Information Science, 1999, v. 50 n. 2-3, p. 218-228 | en_US
dc.identifier.issn | 0002-8231 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/174781 | -
dc.language | eng | en_US
dc.publisher | John Wiley & Sons, Inc. The Journal's web site is located at http://www.asis.org/Publications/JASIS/jasis.html | en_US
dc.relation.ispartof | Journal of the American Society for Information Science | en_US
dc.title | A study on word-based and integral-bit chinese text compression algorithms | en_US
dc.type | Article | en_US
dc.identifier.email | Cheng, KS: hrspksc@hkucc.hku.hk | en_US
dc.identifier.authority | Cheng, KS=rp00675 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.scopus | eid_2-s2.0-0033101615 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0033101615&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 50 | en_US
dc.identifier.issue | 2-3 | en_US
dc.identifier.spage | 218 | en_US
dc.identifier.epage | 228 | en_US
dc.publisher.place | United States | en_US
dc.identifier.scopusauthorid | Cheng, KS=9745798500 | en_US
dc.identifier.issnl | 0002-8231 | -
