A unified framework for text analysis in Chinese TTS

Fu, G; Zhang, M; Zhou, G; Luke, KK


Publisher Website: 10.1007/11939993_24
Scopus: eid_2-s2.0-77249143355

Citations:
- Scopus: 0
Appears in Collections:
- Linguistics: Conference papers

Conference Paper: A unified framework for text analysis in Chinese TTS

Title	A unified framework for text analysis in Chinese TTS
Authors	Fu, G; Zhang, M; Zhou, G; Luke, KK
Keywords	Chinese TTS; Grapheme-to-Phoneme Conversion; Lexical Analysis; Text Analysis; Text Normalization
Issue Date	2006
Publisher	Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/
Citation	Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2006, v. 4274 LNAI, p. 200-210. DOI: http://dx.doi.org/10.1007/11939993_24
Abstract	This paper presents a robust text analysis system for Chinese text-to-speech synthesis. In this study, a lexicon word or a continuum of non-hanzi characters of the same category (e.g. a digit string) is defined as a morpheme, which is the basic unit forming a Chinese word. Based on this definition, the three key issues concerning the interpretation of real Chinese text, namely lexical disambiguation, unknown word resolution and non-standard word (NSW) normalization, can be unified in a single framework and reformulated as a two-pass tagging task on a sequence of morphemes. Our system consists of four main components: (1) a pre-segmenter for sentence segmentation and morpheme segmentation; (2) a lexicalized HMM-based chunker for identifying unknown words and guessing their part-of-speech categories; (3) an HMM-based tagger for converting orthographic morphemes to their Chinese phonetic representation (viz. pinyin), given their word-formation patterns and part-of-speech information; and (4) a post-processor for interpreting phonetic tags and, if necessary, fine-tuning the pronunciation order of some special NSWs. The evaluation on a pinyin-notated corpus built from the Peking University corpus shows that our system can achieve correct interpretation for most words. © 2006 Springer-Verlag Berlin/Heidelberg.
Persistent Identifier	http://hdl.handle.net/10722/159070
ISSN	0302-9743
2023 SCImago Journal Rankings	0.606
References	References in Scopus
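
The morpheme definition in the abstract (a lexicon word, or a run of non-hanzi characters of the same category) can be illustrated with a minimal Python sketch. It is not the authors' pre-segmenter: the toy lexicon, the character categories, and the forward maximum matching strategy are all assumptions made for illustration, since this record does not describe how the paper segments text.

```python
# Hypothetical toy lexicon; the lexicon used in the paper is not given in this record.
LEXICON = {"北京", "大学", "北京大学", "成立"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def char_category(ch):
    """Coarse character category (an assumed scheme, for illustration only)."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "hanzi"
    if ch.isspace():
        return "space"
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "letter"
    return "other"


def segment_morphemes(text):
    """Split text into morphemes: lexicon words or same-category non-hanzi runs."""
    morphemes = []
    i = 0
    while i < len(text):
        cat = char_category(text[i])
        if cat == "hanzi":
            # Forward maximum matching: take the longest lexicon word starting
            # at position i, falling back to a single character.
            for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
                if n == 1 or text[i:i + n] in LEXICON:
                    morphemes.append(text[i:i + n])
                    i += n
                    break
        else:
            # Group a maximal run of characters sharing the same category.
            j = i + 1
            while j < len(text) and char_category(text[j]) == cat:
                j += 1
            if cat != "space":  # whitespace is dropped in this sketch
                morphemes.append(text[i:j])
            i = j
    return morphemes


print(segment_morphemes("北京大学成立于1898年"))
```

In this sketch the digit string "1898" comes out as a single morpheme, which is the kind of unit that the later tagging passes would normalize into a spoken form.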

DC Field	Value	Language
dc.contributor.author	Fu, G	en_US
dc.contributor.author	Zhang, M	en_US
dc.contributor.author	Zhou, G	en_US
dc.contributor.author	Luke, KK	en_US
dc.date.accessioned	2012-08-08T09:06:27Z	-
dc.date.available	2012-08-08T09:06:27Z	-
dc.date.issued	2006	en_US
dc.identifier.citation	Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2006, v. 4274 LNAI, p. 200-210	en_US
dc.identifier.issn	0302-9743	en_US
dc.identifier.uri	http://hdl.handle.net/10722/159070	-
dc.description.abstract	This paper presents a robust text analysis system for Chinese text-to-speech synthesis. In this study, a lexicon word or a continuum of non-hanzi characters of the same category (e.g. a digit string) is defined as a morpheme, which is the basic unit forming a Chinese word. Based on this definition, the three key issues concerning the interpretation of real Chinese text, namely lexical disambiguation, unknown word resolution and non-standard word (NSW) normalization, can be unified in a single framework and reformulated as a two-pass tagging task on a sequence of morphemes. Our system consists of four main components: (1) a pre-segmenter for sentence segmentation and morpheme segmentation; (2) a lexicalized HMM-based chunker for identifying unknown words and guessing their part-of-speech categories; (3) an HMM-based tagger for converting orthographic morphemes to their Chinese phonetic representation (viz. pinyin), given their word-formation patterns and part-of-speech information; and (4) a post-processor for interpreting phonetic tags and, if necessary, fine-tuning the pronunciation order of some special NSWs. The evaluation on a pinyin-notated corpus built from the Peking University corpus shows that our system can achieve correct interpretation for most words. © 2006 Springer-Verlag Berlin/Heidelberg.	en_US
dc.language	eng	en_US
dc.publisher	Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/	en_US
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.subject	Chinese TTS	en_US
dc.subject	Grapheme-To-Phoneme Conversion	en_US
dc.subject	Lexical Analysis	en_US
dc.subject	Text Analysis	en_US
dc.subject	Text Normalization	en_US
dc.title	A unified framework for text analysis in Chinese TTS	en_US
dc.type	Conference_Paper	en_US
dc.identifier.email	Luke, KK:kkluke@hkusua.hku.hk	en_US
dc.identifier.authority	Luke, KK=rp01201	en_US
dc.description.nature	link_to_subscribed_fulltext	en_US
dc.identifier.doi	10.1007/11939993_24	en_US
dc.identifier.scopus	eid_2-s2.0-77249143355	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-77249143355&selection=ref&src=s&origin=recordpage	en_US
dc.identifier.volume	4274 LNAI	en_US
dc.identifier.spage	200	en_US
dc.identifier.epage	210	en_US
dc.publisher.place	Germany	en_US
dc.identifier.scopusauthorid	Fu, G=7202721096	en_US
dc.identifier.scopusauthorid	Zhang, M=36041252700	en_US
dc.identifier.scopusauthorid	Zhou, G=7403686010	en_US
dc.identifier.scopusauthorid	Luke, KK=7003697439	en_US
dc.identifier.issnl	0302-9743	-
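
The abstract describes tagging a sequence of morphemes with an HMM to obtain pinyin. As a rough illustration of that idea, the sketch below runs Viterbi decoding over a plain first-order HMM to choose between two readings of the polyphonic character 长 (chang2 or zhang3). It is not the paper's lexicalized model: the states, the probability tables, and the function names are hypothetical toy values, and word-formation patterns and part-of-speech information are left out.

```python
import math

UNSEEN = -1e9  # log-score assigned to events missing from the toy tables


def to_logs(probs):
    """Convert a dict of probabilities into a dict of log-probabilities."""
    return {key: math.log(p) for key, p in probs.items()}


# Hypothetical toy model: states are pinyin tags, observations are morphemes.
STATES = ["xiao4", "hen3", "chang2", "zhang3"]
LOG_START = to_logs({"xiao4": 0.5, "hen3": 0.5})
LOG_TRANS = {
    "xiao4": to_logs({"zhang3": 0.9, "chang2": 0.1}),
    "hen3": to_logs({"chang2": 0.9, "zhang3": 0.1}),
    "chang2": {},
    "zhang3": {},
}
LOG_EMIT = {
    "xiao4": to_logs({"校": 1.0}),
    "hen3": to_logs({"很": 1.0}),
    "chang2": to_logs({"长": 1.0}),
    "zhang3": to_logs({"长": 1.0}),
}


def viterbi(observations):
    """Return the most probable pinyin tag sequence for the given morphemes."""
    if not observations:
        return []
    # best[t][s]: best log-score of any tag path ending in state s at step t
    best = [{s: LOG_START.get(s, UNSEEN) + LOG_EMIT[s].get(observations[0], UNSEEN)
             for s in STATES}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + LOG_TRANS[p].get(s, UNSEEN)) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + LOG_EMIT[s].get(observations[t], UNSEEN)
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


print(viterbi(["校", "长"]))
print(viterbi(["很", "长"]))
```

With these toy tables the preceding morpheme decides the reading of 长, which is the sort of contextual disambiguation a sequence tagger provides over a per-character dictionary lookup.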
