Conference Paper: Chinese text chunking using lexicalized HMMs

Title: Chinese text chunking using lexicalized HMMs
Authors: Fu, GH; Xu, RF; Luke, KK; Lu, Q
Keywords: Base phrase recognition; Base phrase structure; Lexicalized hidden Markov models (HMMs); Text chunking
Issue Date: 2005
Publisher: IEEE
Citation: 2005 International Conference on Machine Learning and Cybernetics (ICMLC 2005), 2005, p. 7-12
Abstract: This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word a proper hybrid tag, which encodes four types of information: word boundary, POS, chunk boundary and chunk type. Unlike most previous approaches, our approach can integrate different features, such as part-of-speech information, chunk-internal cues and contextual information, for text chunking within the framework of HMMs. As a result, the performance of the system can be improved without losing efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that lexicalization can substantially improve the performance of an HMM-based chunking system. © 2005 IEEE.
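To make the idea of decoding hybrid tags with a lexicalized HMM more concrete, here is a minimal Python sketch. It is not the authors' implementation, and much of it is assumed: the names `HybridTag`, `viterbi`, `to_chunks` and `FLOOR` are hypothetical; the model is a first-order HMM whose transition is lexicalized as P(t_i | w_{i-1}, t_{i-1}) with a plain emission P(w_i | t_i) (the paper's "uniformly lexicalized" model may condition differently); the B/I/E/S boundary encoding, the POS labels, the smoothing floor and every probability are invented toy values; and the lattice of alternative segmentations is omitted, so only one fixed sequence of known words is decoded.

```python
import math
from typing import Dict, List, NamedTuple, Sequence, Tuple


class HybridTag(NamedTuple):
    """Hypothetical hybrid tag with the four fields named in the abstract."""
    word_pos: str   # word boundary: position of the known word in a real word (B/I/E/S)
    pos: str        # part-of-speech tag
    chunk_pos: str  # chunk boundary: position of the word in its chunk (B/I/E/S)
    chunk: str      # chunk type, e.g. NP or VP


FLOOR = 1e-6  # assumed smoothing floor for unseen events (not from the paper)


def _logp(p: float) -> float:
    return math.log(max(p, FLOOR))


def viterbi(
    words: Sequence[str],
    tags: Sequence[HybridTag],
    start: Dict[HybridTag, float],
    trans: Dict[Tuple[str, HybridTag, HybridTag], float],
    emit: Dict[Tuple[HybridTag, str], float],
) -> List[HybridTag]:
    """Best hybrid-tag sequence for one fixed sequence of known words.

    Assumed lexicalized transition P(t_i | w_{i-1}, t_{i-1}), keyed (w_{i-1}, t_{i-1}, t_i).
    Assumed emission P(w_i | t_i), keyed (t_i, w_i).
    """
    if not words:
        return []
    # score[i][t]: best log-probability of a tag path ending in tag t at position i
    score = [{t: _logp(start.get(t, 0.0)) + _logp(emit.get((t, words[0]), 0.0)) for t in tags}]
    back: List[Dict[HybridTag, HybridTag]] = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            # predecessor maximizing path score plus lexicalized transition log-probability
            prev, best = max(
                ((p, score[i - 1][p] + _logp(trans.get((words[i - 1], p, t), 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            score[i][t] = best + _logp(emit.get((t, words[i]), 0.0))
            back[i][t] = prev
    # follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: score[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def to_chunks(words: Sequence[str], tags: Sequence[HybridTag]) -> List[Tuple[str, str]]:
    """Group words into (chunk type, text) spans using the chunk-boundary field."""
    chunks: List[Tuple[str, str]] = []
    for w, t in zip(words, tags):
        if t.chunk_pos in ("B", "S") or not chunks:
            chunks.append((t.chunk, w))
        else:
            chunks[-1] = (chunks[-1][0], chunks[-1][1] + w)
    return chunks


# Toy model: every tag, word and probability below is invented for illustration.
NP_S = HybridTag("S", "r", "S", "NP")
VP_S = HybridTag("S", "v", "S", "VP")
NP_B = HybridTag("S", "n", "B", "NP")
NP_E = HybridTag("S", "n", "E", "NP")
TAGS = [NP_S, VP_S, NP_B, NP_E]
START = {NP_S: 0.6, NP_B: 0.3, VP_S: 0.1}
TRANS = {
    ("我们", NP_S, VP_S): 0.9,
    ("喜欢", VP_S, NP_B): 0.6,
    ("喜欢", VP_S, NP_S): 0.2,
    ("中国", NP_B, NP_E): 0.8,
    ("中国", NP_E, NP_B): 0.1,
}
EMIT = {
    (NP_S, "我们"): 0.9,
    (VP_S, "喜欢"): 0.8,
    (NP_B, "中国"): 0.5,
    (NP_E, "中国"): 0.3,
    (NP_B, "文化"): 0.3,
    (NP_E, "文化"): 0.5,
}

if __name__ == "__main__":
    sentence = ["我们", "喜欢", "中国", "文化"]  # "we like Chinese culture"
    best = viterbi(sentence, TAGS, START, TRANS, EMIT)
    print(to_chunks(sentence, best))  # [('NP', '我们'), ('VP', '喜欢'), ('NP', '中国文化')]
```

Conditioning the transition on the previous word, not only its tag, lets lexical cues such as a particular verb influence whether a noun-phrase chunk starts next. The cost is a much larger parameter space, so some form of smoothing is generally needed; the fixed floor used here is only a placeholder.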
Persistent Identifier: http://hdl.handle.net/10722/54212

DC Field | Value | Language
dc.contributor.author | Fu, GH | en_HK
dc.contributor.author | Xu, RF | en_HK
dc.contributor.author | Luke, KK | en_HK
dc.contributor.author | Lu, Q | en_HK
dc.date.accessioned | 2009-04-03T07:39:49Z | -
dc.date.available | 2009-04-03T07:39:49Z | -
dc.date.issued | 2005 | en_HK
dc.identifier.citation | 2005 International Conference On Machine Learning And Cybernetics, Icmlc 2005, 2005, p. 7-12 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/54212 | -
dc.description.abstract | (same as the Abstract above) | en_HK
dc.language | eng | en_HK
dc.publisher | IEEE. | en_HK
dc.relation.ispartof | 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005 | en_HK
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | ©2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | en_HK
dc.subject | Base phrase recognition | en_HK
dc.subject | Base phrase structure | en_HK
dc.subject | Lexicalized hidden markov models (HMMs) | en_HK
dc.subject | Text chunking | en_HK
dc.title | Chinese text chunking using lexicalized HMMS | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.email | Luke, KK:kkluke@hkusua.hku.hk | en_HK
dc.identifier.authority | Luke, KK=rp01201 | en_HK
dc.description.nature | published_or_final_version | en_HK
dc.identifier.scopus | eid_2-s2.0-28444444555 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-28444444555&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.spage | 7 | en_HK
dc.identifier.epage | 12 | en_HK
dc.identifier.scopusauthorid | Fu, GH=7202721096 | en_HK
dc.identifier.scopusauthorid | Xu, RF=35520467000 | en_HK
dc.identifier.scopusauthorid | Luke, KK=7003697439 | en_HK
dc.identifier.scopusauthorid | Lu, Q=35242792400 | en_HK
