Article: Analysis and recognition of isolated putonghua vowels by Karhunen-Loève transformation techniques

Title: Analysis and recognition of isolated putonghua vowels by Karhunen-Loève transformation techniques
Authors: Chan, LCM; Cheung, YS
Keywords:
Chinese Phonetic Alphabets (CPA)
closed test
critical-band filtering
Euclidean distance
final
initial
Karhunen-Loève transformation (KLT)
Mahalanobis distance
modified Mahalanobis distance
open test
Putonghua
semi-open test
sequential open test
tone
Issue Date: 1986
Publisher: Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/specom
Citation: Speech Communication, 1986, v. 5 n. 3-4, p. 299-330
Abstract: A systematic study on a speaker-independent vowel recognition model has been performed. Karhunen-Loève Transformation (KLT), or Principal Component Analysis, technique was applied subsequent to a spectral analysis of the speech signal by 18 non-overlapping critical-band filters. Four experiments have been conducted using selected segments of 8 isolated Putonghua (Mandarin) vowels, spoken twice in 5 tones by 38 females and 13 males. The first experiment uses the same speech sample in training and testing to evaluate the effects of KLT, speaker normalization, distance metric and number of vowel classes. A modified Mahalanobis distance coupled with a 7-class condition was found to give the best performance. In the next experiment, one sample was used to train the model, and another trial of the same speech, spoken by the same group of speakers, was used to test it. It was found that, in general, a sex-specific and tone-specific procedure could be avoided without significant loss in performance. The third experiment repeatedly trained the model with 50 speakers and tested it with the remaining one until all 51 speakers had been tested. Under this stringent condition, an average recognition rate of 88.2% was achieved using only 4 classificatory dimensions. In the last experiment, all segments of a vowel were labelled using the most stringent conditions. The model was confirmed to perform well for one male and one female speaker selected at random. Also, the vowel that had caused the greatest confusion was found to be well recognized when treated as an allophone of another vowel. Finally, the possibility of extending the present technique to diphthong recognition is discussed together with some preliminary results. © 1986.
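The pipeline outlined in the abstract (a KLT projection of 18-band spectra, a distance-based classifier, and a test that holds out one speaker at a time) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the data are synthetic random vectors rather than critical-band filter outputs, the classifier uses the plain Mahalanobis distance because the abstract does not define the "modified" variant, speaker normalization is omitted, and all function names are hypothetical.

```python
import numpy as np


def fit_klt(X, n_components):
    """Estimate a KLT (PCA) basis: the leading eigenvectors of the covariance of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def project(X, mean, basis):
    """Project spectra onto the retained KLT dimensions."""
    return (X - mean) @ basis


def fit_classes(Z, labels):
    """Per-class mean and inverse covariance in the projected space."""
    stats = {}
    for c in np.unique(labels):
        Zc = Z[labels == c]
        stats[c] = (Zc.mean(axis=0), np.linalg.inv(np.cov(Zc, rowvar=False)))
    return stats


def classify(z, stats):
    """Assign z to the class with the smallest (plain) Mahalanobis distance."""
    def squared_distance(c):
        mu, inv_cov = stats[c]
        diff = z - mu
        return diff @ inv_cov @ diff
    return min(stats, key=squared_distance)


def leave_one_speaker_out(X, labels, speakers, n_components=4):
    """Train on all speakers but one, test on the held-out one, repeat for every speaker."""
    correct = 0
    for s in np.unique(speakers):
        train = speakers != s
        mean, basis = fit_klt(X[train], n_components)
        stats = fit_classes(project(X[train], mean, basis), labels[train])
        for z, y in zip(project(X[~train], mean, basis), labels[~train]):
            correct += int(classify(z, stats) == y)
    return correct / len(labels)


# Synthetic stand-in data (assumption): 8 speakers, 3 vowel classes,
# 2 repetitions each, 18 "band energies" per sample.
rng = np.random.default_rng(0)
n_speakers, n_vowels, n_reps, n_bands = 8, 3, 2, 18
centers = rng.normal(scale=5.0, size=(n_vowels, n_bands))
rows, labels, speakers = [], [], []
for s in range(n_speakers):
    for v in range(n_vowels):
        for _ in range(n_reps):
            rows.append(centers[v] + rng.normal(size=n_bands))
            labels.append(v)
            speakers.append(s)
X = np.array(rows)
labels = np.array(labels)
speakers = np.array(speakers)

accuracy = leave_one_speaker_out(X, labels, speakers, n_components=4)
print(f"leave-one-speaker-out accuracy: {accuracy:.3f}")
```

Refitting the KLT basis inside each fold keeps the held-out speaker out of both the projection and the class statistics, which is what makes this kind of test stringent. The accuracy printed here reflects only the synthetic data and says nothing about the 88.2% reported in the abstract.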
Persistent Identifier: http://hdl.handle.net/10722/147778
ISSN: 0167-6393
2015 Impact Factor: 1.038
2015 SCImago Journal Rankings: 0.685

 

DC Field | Value | Language
dc.contributor.author | Chan, LCM | en_HK
dc.contributor.author | Cheung, YS | en_HK
dc.date.accessioned | 2012-05-29T06:09:10Z | -
dc.date.available | 2012-05-29T06:09:10Z | -
dc.date.issued | 1986 | en_HK
dc.identifier.citation | Speech Communication, 1986, v. 5 n. 3-4, p. 299-330 | en_HK
dc.identifier.issn | 0167-6393 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/147778 | -
dc.description.abstract | A systematic study on a speaker-independent vowel recognition model has been performed. Karhunen-Loève Transformation (KLT), or Principal Component Analysis, technique was applied subsequent to a spectral analysis of the speech signal by 18 non-overlapping critical-band filters. Four experiments have been conducted using selected segments of 8 isolated Putonghua (Mandarin) vowels, spoken twice in 5 tones by 38 females and 13 males. The first experiment uses the same speech sample in training and testing to evaluate the effects of KLT, speaker normalization, distance metric and number of vowel classes. A modified Mahalanobis distance coupled with a 7-class condition was found to give the best performance. In the next experiment, one sample was used to train the model, and another trial of the same speech, spoken by the same group of speakers, was used to test it. It was found that, in general, a sex-specific and tone-specific procedure could be avoided without significant loss in performance. The third experiment repeatedly trained the model with 50 speakers and tested it with the remaining one until all 51 speakers had been tested. Under this stringent condition, an average recognition rate of 88.2% was achieved using only 4 classificatory dimensions. In the last experiment, all segments of a vowel were labelled using the most stringent conditions. The model was confirmed to perform well for one male and one female speaker selected at random. Also, the vowel that had caused the greatest confusion was found to be well recognized when treated as an allophone of another vowel. Finally, the possibility of extending the present technique to diphthong recognition is discussed together with some preliminary results. © 1986. | en_HK
dc.language | eng | en_US
dc.publisher | Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/specom | en_HK
dc.relation.ispartof | Speech Communication | en_HK
dc.subject | Chinese Phonetic Alphabets (CPA) | en_HK
dc.subject | closed test | en_HK
dc.subject | critical-band filtering | en_HK
dc.subject | Euclidean distance | en_HK
dc.subject | final | en_HK
dc.subject | initial | en_HK
dc.subject | Karhunen-Loève transformation (KLT) | en_HK
dc.subject | Mahalanobis distance | en_HK
dc.subject | modified Mahalanobis distance | en_HK
dc.subject | open test | en_HK
dc.subject | Putonghua | en_HK
dc.subject | semi-open test | en_HK
dc.subject | sequential open test | en_HK
dc.subject | tone | en_HK
dc.title | Analysis and recognition of isolated putonghua vowels by Karhunen-Loève transformation techniques | en_HK
dc.type | Article | en_HK
dc.identifier.email | Cheung, YS:paul.cheung@hku.hk | en_HK
dc.identifier.authority | Cheung, YS=rp00077 | en_HK
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.scopus | eid_2-s2.0-0022888608 | en_HK
dc.identifier.volume | 5 | en_HK
dc.identifier.issue | 3-4 | en_HK
dc.identifier.spage | 299 | en_HK
dc.identifier.epage | 330 | en_HK
dc.publisher.place | Netherlands | en_HK
dc.identifier.scopusauthorid | Chan, LCM=55223130400 | en_HK
dc.identifier.scopusauthorid | Cheung, YS=7202595335 | en_HK
