Conference Paper: A unified framework for text analysis in Chinese TTS

Title: A unified framework for text analysis in Chinese TTS
Authors: Fu, G; Zhang, M; Zhou, G; Luke, KK
Keywords: Chinese TTS
Grapheme-To-Phoneme Conversion
Lexical Analysis
Text Analysis
Text Normalization
Issue Date: 2006
Publisher: Springer Verlag. The journal's web site is located at http://springerlink.com/content/105633/
Citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2006, v. 4274 LNAI, p. 200-210
Abstract: This paper presents a robust text analysis system for Chinese text-to-speech synthesis. In this study, a lexicon word or a continuous run of non-hanzi characters of the same category (e.g. a digit string) is defined as a morpheme, the basic unit forming a Chinese word. Based on this definition, the three key issues in interpreting real Chinese text, namely lexical disambiguation, unknown word resolution and non-standard word (NSW) normalization, can be unified in a single framework and reformulated as a two-pass tagging task on a sequence of morphemes. The system consists of four main components: (1) a pre-segmenter for sentence segmentation and morpheme segmentation; (2) a lexicalized HMM-based chunker for identifying unknown words and guessing their part-of-speech categories; (3) an HMM-based tagger for converting orthographic morphemes to their Chinese phonetic representation (viz. pinyin), given their word-formation patterns and part-of-speech information; and (4) a post-processor for interpreting phonetic tags and, where necessary, fine-tuning the pronunciation order of some special NSWs. An evaluation on a pinyin-notated corpus built from the Peking University corpus shows that the system achieves correct interpretation for most words. © 2006 Springer-Verlag Berlin/Heidelberg.
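The morpheme definition in the abstract can be illustrated with a minimal sketch of a pre-segmenter. This is not the authors' implementation: the function names `char_category` and `pre_segment`, the category labels, the toy lexicon, and the use of forward maximum matching for lexicon lookup are all assumptions made for illustration. The abstract only states that a lexicon word, or a run of non-hanzi characters of the same category, forms one morpheme.

```python
def char_category(ch):
    """Return a coarse character class (the class names are illustrative assumptions)."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "hanzi"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    if ch.isspace():
        return "space"
    return "other"


def pre_segment(text, lexicon, max_len=4):
    """Split text into (morpheme, category) pairs.

    Hanzi are matched against the lexicon by forward maximum matching
    (an assumed strategy), falling back to a single character.
    A run of non-hanzi characters of one category becomes one morpheme.
    """
    morphemes = []
    i = 0
    while i < len(text):
        cat = char_category(text[i])
        if cat == "space":
            i += 1  # whitespace only separates morphemes
        elif cat == "hanzi":
            # Try the longest candidate first; a single hanzi always matches.
            for n in range(min(max_len, len(text) - i), 0, -1):
                cand = text[i:i + n]
                if n == 1 or cand in lexicon:
                    morphemes.append((cand, "hanzi"))
                    i += n
                    break
        else:
            # Extend the run while the character category stays the same.
            j = i
            while j < len(text) and char_category(text[j]) == cat:
                j += 1
            morphemes.append((text[i:j], cat))
            i = j
    return morphemes


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"发表"}
result = pre_segment("2006年发表", toy_lexicon)
```

In this sketch the digit string "2006" is kept as a single morpheme, so a later tagging pass could decide how it should be read aloud; that downstream step is not shown here.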
Persistent Identifier: http://hdl.handle.net/10722/159070
ISSN: 0302-9743
2023 SCImago Journal Rankings: 0.606

DC Field | Value | Language
dc.contributor.author | Fu, G | en_US
dc.contributor.author | Zhang, M | en_US
dc.contributor.author | Zhou, G | en_US
dc.contributor.author | Luke, KK | en_US
dc.date.accessioned | 2012-08-08T09:06:27Z | -
dc.date.available | 2012-08-08T09:06:27Z | -
dc.date.issued | 2006 | en_US
dc.identifier.citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2006, v. 4274 LNAI, p. 200-210 | en_US
dc.identifier.issn | 0302-9743 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/159070 | -
dc.language | eng | en_US
dc.publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ | en_US
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_US
dc.subject | Chinese Tts | en_US
dc.subject | Grapheme-To-Phoneme Conversion | en_US
dc.subject | Lexical Analysis | en_US
dc.subject | Text Analysis | en_US
dc.subject | Text Normalization | en_US
dc.title | A unified framework for text analysis in Chinese TTS | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Luke, KK:kkluke@hkusua.hku.hk | en_US
dc.identifier.authority | Luke, KK=rp01201 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.doi | 10.1007/11939993_24 | en_US
dc.identifier.scopus | eid_2-s2.0-77249143355 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-77249143355&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 4274 LNAI | en_US
dc.identifier.spage | 200 | en_US
dc.identifier.epage | 210 | en_US
dc.publisher.place | Germany | en_US
dc.identifier.scopusauthorid | Fu, G=7202721096 | en_US
dc.identifier.scopusauthorid | Zhang, M=36041252700 | en_US
dc.identifier.scopusauthorid | Zhou, G=7403686010 | en_US
dc.identifier.scopusauthorid | Luke, KK=7003697439 | en_US
dc.identifier.issnl | 0302-9743 | -
