Conference Paper: Chinese unknown word identification as known word tagging

Title: Chinese unknown word identification as known word tagging
Authors: Fu, GH; Luke, KK
Keywords: Chinese word segmentation; Known word tagging; Lexicalized HMMs; Unknown word identification
Issue Date: 2004
Publisher: IEEE.
Citation: Proceedings of 2004 International Conference on Machine Learning and Cybernetics, 2004, v. 4, p. 2612-2617
Abstract: This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that the use of lexicalization technique and the introduction of part-of-speech are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.
Persistent Identifier: http://hdl.handle.net/10722/47018
DC Field | Value | Language
dc.contributor.author | Fu, GH | en_HK
dc.contributor.author | Luke, KK | en_HK
dc.date.accessioned | 2007-10-30T07:04:20Z | -
dc.date.available | 2007-10-30T07:04:20Z | -
dc.date.issued | 2004 | en_HK
dc.identifier.citation | Proceedings Of 2004 International Conference On Machine Learning And Cybernetics, 2004, v. 4, p. 2612-2617 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/47018 | -
dc.description.abstract | This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that the use of lexicalization technique and the introduction of part-of-speech are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance. | en_HK
dc.format.extent | 431821 bytes | -
dc.format.extent | 2213 bytes | -
dc.format.extent | 2608 bytes | -
dc.format.mimetype | application/pdf | -
dc.format.mimetype | text/plain | -
dc.format.mimetype | text/plain | -
dc.language | eng | en_HK
dc.publisher | IEEE. | en_HK
dc.relation.ispartof | Proceedings of 2004 International Conference on Machine Learning and Cybernetics | en_HK
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | ©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | en_HK
dc.subject | Chinese word segmentation | en_HK
dc.subject | Known word tagging | en_HK
dc.subject | Lexicalized HMMs | en_HK
dc.subject | Unknown word identification | en_HK
dc.title | Chinese unknown word identification as known word tagging | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.email | Luke, KK: kkluke@hkusua.hku.hk | en_HK
dc.identifier.authority | Luke, KK=rp01201 | en_HK
dc.description.nature | published_or_final_version | en_HK
dc.identifier.scopus | eid_2-s2.0-6344285863 | en_HK
dc.identifier.hkuros | 103505 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-6344285863&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 4 | en_HK
dc.identifier.spage | 2612 | en_HK
dc.identifier.epage | 2617 | en_HK
dc.identifier.scopusauthorid | Fu, GH=7202721096 | en_HK
dc.identifier.scopusauthorid | Luke, KK=7003697439 | en_HK
