Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1007/11939993_24
- Scopus: eid_2-s2.0-77249143355
Conference Paper: A unified framework for text analysis in Chinese TTS
Title | A unified framework for text analysis in Chinese TTS |
---|---|
Authors | Fu, G; Zhang, M; Zhou, G; Luke, KK |
Keywords | Chinese TTS; Grapheme-to-Phoneme Conversion; Lexical Analysis; Text Analysis; Text Normalization |
Issue Date | 2006 |
Publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ |
Citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2006, v. 4274 LNAI, p. 200-210 |
Abstract | This paper presents a robust text analysis system for Chinese text-to-speech synthesis. In this study, a lexicon word or a continuum of non-hanzi characters of the same category (e.g. a digit string) is defined as a morpheme, which is the basic unit forming a Chinese word. Based on this definition, the three key issues concerning the interpretation of real Chinese text, namely lexical disambiguation, unknown word resolution and non-standard word (NSW) normalization, can be unified in a single framework and reformulated as a two-pass tagging task on a sequence of morphemes. Our system consists of four main components: (1) a pre-segmenter for sentence segmentation and morpheme segmentation; (2) a lexicalized HMM-based chunker for identifying unknown words and guessing their part-of-speech categories; (3) an HMM-based tagger for converting orthographic morphemes to their Chinese phonetic representation (viz. pinyin), given their word-formation patterns and part-of-speech information; and (4) a post-processing step for interpreting phonetic tags and fine-tuning pronunciation order for some special NSWs if necessary. The evaluation on a pinyin-notated corpus built from the Peking University corpus shows that our system can achieve correct interpretation for most words. © 2006 Springer-Verlag Berlin/Heidelberg. |
Persistent Identifier | http://hdl.handle.net/10722/159070 |
ISSN | 0302-9743 (2023 SCImago Journal Rankings: 0.606) |
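
The abstract defines a morpheme as either a lexicon word or a run of non-hanzi characters of the same category (e.g. a digit string). The following is a minimal sketch of that pre-segmentation idea, not the authors' implementation. It assumes greedy longest-match lookup for lexicon words, which the abstract does not specify. The toy lexicon, the coarse character classes, and the names `char_category` and `segment_morphemes` are all hypothetical.

```python
# Toy lexicon for illustration only; not taken from the paper.
TOY_LEXICON = {"我们", "在", "年", "买", "了", "电脑"}


def char_category(ch: str) -> str:
    """Return a coarse character class (an assumed, simplified scheme)."""
    # Only the basic CJK Unified Ideographs block is treated as hanzi here.
    if "\u4e00" <= ch <= "\u9fff":
        return "hanzi"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    if ch.isspace():
        return "space"
    return "other"


def segment_morphemes(text, lexicon=TOY_LEXICON):
    """Split text into (morpheme, category) pairs.

    Hanzi are grouped by greedy longest match against the lexicon
    (an assumption), falling back to a single character. Non-hanzi
    characters are grouped into maximal runs of one category.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    morphemes = []
    i = 0
    while i < len(text):
        cat = char_category(text[i])
        if cat == "hanzi":
            # Try the longest candidate first; length 1 always succeeds.
            for length in range(min(max_len, len(text) - i), 0, -1):
                cand = text[i:i + length]
                if length == 1 or cand in lexicon:
                    morphemes.append((cand, "hanzi"))
                    i += length
                    break
        else:
            # Extend the run while the character category stays the same.
            j = i
            while j < len(text) and char_category(text[j]) == cat:
                j += 1
            if cat != "space":  # whitespace is dropped, not emitted
                morphemes.append((text[i:j], cat))
            i = j
    return morphemes


if __name__ == "__main__":
    for morpheme, category in segment_morphemes("我们在2006年买了3台电脑。"):
        print(morpheme, category)
```

In the system the abstract describes, a morpheme sequence like this would then be passed to the two HMM-based tagging passes; those are not sketched here because the abstract gives no details of the models or tag sets.
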
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Fu, G | en_US |
dc.contributor.author | Zhang, M | en_US |
dc.contributor.author | Zhou, G | en_US |
dc.contributor.author | Luke, KK | en_US |
dc.date.accessioned | 2012-08-08T09:06:27Z | - |
dc.date.available | 2012-08-08T09:06:27Z | - |
dc.date.issued | 2006 | en_US |
dc.identifier.citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2006, v. 4274 LNAI, p. 200-210 | en_US |
dc.identifier.issn | 0302-9743 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/159070 | - |
dc.description.abstract | This paper presents a robust text analysis system for Chinese text-to-speech synthesis. In this study, a lexicon word or a continuum of non-hanzi characters of the same category (e.g. a digit string) is defined as a morpheme, which is the basic unit forming a Chinese word. Based on this definition, the three key issues concerning the interpretation of real Chinese text, namely lexical disambiguation, unknown word resolution and non-standard word (NSW) normalization, can be unified in a single framework and reformulated as a two-pass tagging task on a sequence of morphemes. Our system consists of four main components: (1) a pre-segmenter for sentence segmentation and morpheme segmentation; (2) a lexicalized HMM-based chunker for identifying unknown words and guessing their part-of-speech categories; (3) an HMM-based tagger for converting orthographic morphemes to their Chinese phonetic representation (viz. pinyin), given their word-formation patterns and part-of-speech information; and (4) a post-processing step for interpreting phonetic tags and fine-tuning pronunciation order for some special NSWs if necessary. The evaluation on a pinyin-notated corpus built from the Peking University corpus shows that our system can achieve correct interpretation for most words. © 2006 Springer-Verlag Berlin/Heidelberg. | en_US |
dc.language | eng | en_US |
dc.publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ | en_US |
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_US |
dc.subject | Chinese TTS | en_US |
dc.subject | Grapheme-To-Phoneme Conversion | en_US |
dc.subject | Lexical Analysis | en_US |
dc.subject | Text Analysis | en_US |
dc.subject | Text Normalization | en_US |
dc.title | A unified framework for text analysis in Chinese TTS | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Luke, KK:kkluke@hkusua.hku.hk | en_US |
dc.identifier.authority | Luke, KK=rp01201 | en_US |
dc.description.nature | link_to_subscribed_fulltext | en_US |
dc.identifier.doi | 10.1007/11939993_24 | en_US |
dc.identifier.scopus | eid_2-s2.0-77249143355 | en_US |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-77249143355&selection=ref&src=s&origin=recordpage | en_US |
dc.identifier.volume | 4274 LNAI | en_US |
dc.identifier.spage | 200 | en_US |
dc.identifier.epage | 210 | en_US |
dc.publisher.place | Germany | en_US |
dc.identifier.scopusauthorid | Fu, G=7202721096 | en_US |
dc.identifier.scopusauthorid | Zhang, M=36041252700 | en_US |
dc.identifier.scopusauthorid | Zhou, G=7403686010 | en_US |
dc.identifier.scopusauthorid | Luke, KK=7003697439 | en_US |
dc.identifier.issnl | 0302-9743 | - |