Conference Paper: Ergodic multigram HMM integrating word segmentation and class tagging for Chinese language modeling

Title: Ergodic multigram HMM integrating word segmentation and class tagging for Chinese language modeling
Authors: Law, HHC; Chan, C
Keywords: Engineering; Electrical engineering
Issue Date: 1996
Publisher: IEEE
Citation: IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, Atlanta, GA, USA, 7-10 May 1996, v. 1, p. 196-199
Abstract: A novel ergodic multigram hidden Markov model (HMM) is introduced that models sentence production as a doubly stochastic process: word classes are first produced according to a first-order Markov model, and single- or multi-character words are then generated independently based on the word classes, with no word boundaries marked in the sentence. This model can be applied to languages without word boundary markers, such as Chinese. With a lexicon containing syntactic classes for each word, its applications include language modeling for recognizers and integrated word segmentation and class tagging. A pre-segmented and tagged corpus is not needed for training, and both segmentation and tagging are trained in a single model. In this paper, relevant algorithms for this model are presented, and experimental results on a Chinese news corpus are reported.
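
The generative story in the abstract (a class sequence from a first-order Markov chain, then one word of one or more characters per class, with no boundaries observed) can be illustrated with a small decoding sketch. This record does not give the paper's algorithms, so the following is only a generic Viterbi-style search over candidate words, written under stated assumptions: the function name `viterbi_segment_tag`, the two classes `N` and `V`, the toy lexicon, and every probability value are hypothetical and chosen purely for illustration.

```python
import math


def viterbi_segment_tag(sentence, emit, trans, start, max_len=4):
    """Jointly segment and class-tag an unsegmented character string.

    emit[word][cls]  : assumed P(word | cls), taken from a lexicon
    trans[prev][cls] : assumed first-order class transition P(cls | prev)
    start[cls]       : assumed initial class probability
    Returns (log_prob, [(word, cls), ...]) or None if no analysis exists.
    """
    n = len(sentence)
    # best[i][c] = (log-prob, back-pointer, word) for the best analysis of
    # sentence[:i] whose last word has class c.
    best = [dict() for _ in range(n + 1)]
    for end in range(1, n + 1):
        for begin in range(max(0, end - max_len), end):
            word = sentence[begin:end]
            for cls, p_word in emit.get(word, {}).items():
                if begin == 0:
                    # First word of the sentence: use the initial distribution.
                    p_start = start.get(cls, 0.0)
                    candidates = [(math.log(p_start), None)] if p_start > 0 else []
                else:
                    # Extend every analysis that ends exactly at `begin`.
                    candidates = [
                        (score + math.log(trans[prev][cls]), (begin, prev))
                        for prev, (score, _, _) in best[begin].items()
                        if trans.get(prev, {}).get(cls, 0.0) > 0.0
                    ]
                if not candidates:
                    continue
                score, back = max(candidates, key=lambda item: item[0])
                score += math.log(p_word)
                if cls not in best[end] or score > best[end][cls][0]:
                    best[end][cls] = (score, back, word)
    if not best[n]:
        return None
    # Pick the best final class, then follow back-pointers to the start.
    cls = max(best[n], key=lambda c: best[n][c][0])
    log_prob = best[n][cls][0]
    path, pos = [], n
    while True:
        _, back, word = best[pos][cls]
        path.append((word, cls))
        if back is None:
            break
        pos, cls = back
    path.reverse()
    return log_prob, path


# Toy, made-up parameters (not from the paper; not normalised).
emit = {
    "研究": {"V": 0.3, "N": 0.05},
    "研究生": {"N": 0.05},
    "生命": {"N": 0.1},
    "命": {"N": 0.01},
    "起源": {"N": 0.1, "V": 0.05},
}
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.5, "V": 0.5}, "V": {"N": 0.8, "V": 0.2}}

log_prob, tagged = viterbi_segment_tag("研究生命起源", emit, trans, start)
print(tagged)
```

With these invented numbers the search prefers the analysis 研究/V 生命/N 起源/N over the competing segmentation 研究生/N 命/N 起源/N, because segment boundaries and class labels are scored together in one pass. Training without a pre-segmented corpus, as the abstract describes, is a separate step that this sketch does not attempt.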
Persistent Identifier: http://hdl.handle.net/10722/45536
ISSN: 1520-6149

 

DC Field | Value | Language
dc.contributor.author | Law, HHC | en_HK
dc.contributor.author | Chan, C | en_HK
dc.date.accessioned | 2007-10-30T06:28:42Z | -
dc.date.available | 2007-10-30T06:28:42Z | -
dc.date.issued | 1996 | en_HK
dc.identifier.citation | IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, Atlanta, GA, USA, 7-10 May 1996, v. 1, p. 196-199 | en_HK
dc.identifier.issn | 1520-6149 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/45536 | -
dc.description.abstract | A novel ergodic multigram hidden Markov model (HMM) is introduced that models sentence production as a doubly stochastic process: word classes are first produced according to a first-order Markov model, and single- or multi-character words are then generated independently based on the word classes, with no word boundaries marked in the sentence. This model can be applied to languages without word boundary markers, such as Chinese. With a lexicon containing syntactic classes for each word, its applications include language modeling for recognizers and integrated word segmentation and class tagging. A pre-segmented and tagged corpus is not needed for training, and both segmentation and tagging are trained in a single model. In this paper, relevant algorithms for this model are presented, and experimental results on a Chinese news corpus are reported. | en_HK
dc.format.extent | 396352 bytes | -
dc.format.extent | 3669 bytes | -
dc.format.mimetype | application/pdf | -
dc.format.mimetype | text/plain | -
dc.language | eng | en_HK
dc.publisher | IEEE. | en_HK
dc.rights | ©1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | -
dc.subject | Engineering | en_HK
dc.subject | Electrical engineering | en_HK
dc.title | Ergodic multigram HMM integrating word segmentation and class tagging for Chinese language modeling | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1520-6149&volume=1&spage=196&epage=199&date=1996&atitle=Ergodic+multigram+HMM+integrating+word+segmentation+and+classtagging+for+Chinese+language+modeling | en_HK
dc.description.nature | published_or_final_version | en_HK
dc.identifier.doi | 10.1109/ICASSP.1996.540324 | en_HK
dc.identifier.hkuros | 10594 | -
dc.identifier.issnl | 1520-6149 | -