postgraduate thesis: Speaker-independent recognition of Putonghua finals

Title: Speaker-independent recognition of Putonghua finals
Authors: Chan, Chit-man [陳哲民]
Issue Date: 1987
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chan, C. [陳哲民]. (1987). Speaker-independent recognition of Putonghua finals. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Abstract of thesis entitled "Speaker-Independent Recognition of Putonghua Finals", submitted by CHAN, Chit Man for the degree of Doctor of Philosophy at the University of Hong Kong in December 1987.

A detailed study was performed to address the problem of speaker-independent recognition of Putonghua (Mandarin) finals. The study included 35 Putonghua finals, 16 of which have trailing nasals. They were spoken by 51 speakers (38 females, 13 males) in 5 different tones, twice each. The sample was spectrally analyzed by a bank of 18 nonoverlapping critical-band filters. Three data reduction techniques, Karhunen-Loeve Transformation (KLT), Discrete Cosine Transformation (DCT) and Stepwise Discriminant Analysis (SDA), were comparatively studied for their feature representation capability. The results indicated that KLT was superior to both DCT and SDA. Furthermore, the theoretical equivalence of DCT to KLT was found to be valid only when 5 or more feature dimensions were used in computation. The results also showed that the Mahalanobis distance and a proposed modified Mahalanobis distance both gave a better measurement of performance than the other distances tested, which included the City Block, Euclidean, Minkowski, and Chebyshev distances.

In the second part of the study, the Hidden Markov Modelling (HMM) technique was investigated. Three classification methods, Phonemic Labelling (PL), Vector Quantization (VQ) and a proposed Hybrid Symbol (HS) generation, were studied for use with HMM. Whilst PL was found to be simple and efficient, its performance was not as good as that of VQ. However, the time taken by VQ was excessive, especially in training. The results with the HS method showed that it could successfully combine the speed advantage of PL with the better discriminatory power of VQ. A saving of approximately 80% in the quantizer training time could be achieved with only a marginal loss in performance. It was also found that allowing states to be skipped in a Left-to-Right model (LRM) could have a negative effect on overall recognition. As an indication of performance, the recognition rate of the simulated system was 81.3%, 95.0% and 98.0% with the best 1, 2, and 3 candidates included, respectively, using a 256-level VQ and a 6-state, no-skip LRM on a sample of 8,400 finals from 48 speakers. The rates on non-nasal finals reached 96% to 98% using the best candidate alone.
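The abstract's distinction between a no-skip and a skip-allowed Left-to-Right model can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the stay probability of 0.5, the even split of the remaining probability among forward targets, the uniform emission table and the toy symbol sequence are all assumptions made for illustration.

```python
def left_to_right_transitions(n_states, allow_skip=False, stay_prob=0.5):
    """Build a row-stochastic transition matrix for a left-to-right HMM.

    Without skips, state i may only stay at i or advance to i + 1.
    With skips, state i may additionally jump to i + 2.
    The stay probability and the even split are illustrative assumptions.
    """
    max_jump = 2 if allow_skip else 1
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        targets = [i + k for k in range(1, max_jump + 1) if i + k < n_states]
        if not targets:
            # The last state has nowhere to go, so it absorbs.
            A[i][i] = 1.0
            continue
        A[i][i] = stay_prob
        for j in targets:
            A[i][j] = (1.0 - stay_prob) / len(targets)
    return A


def forward_likelihood(A, B, symbols):
    """P(symbols | model) by the forward algorithm, starting in state 0.

    A is the transition matrix; B[j][s] is the probability that state j
    emits the discrete symbol s (for example, a VQ codebook index).
    """
    n = len(A)
    alpha = [0.0] * n
    alpha[0] = B[0][symbols[0]]
    for s in symbols[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][s]
            for j in range(n)
        ]
    return sum(alpha)


if __name__ == "__main__":
    no_skip = left_to_right_transitions(6)               # 6-state, no-skip LRM
    with_skip = left_to_right_transitions(6, allow_skip=True)
    # Toy 4-symbol codebook with uniform emissions (hypothetical values).
    B = [[0.25] * 4 for _ in range(6)]
    print(no_skip[0])
    print(with_skip[0])
    print(forward_likelihood(no_skip, B, [0, 1, 2]))
```

In a recognizer of this kind, one such model would typically be trained per final, and an unknown symbol sequence would be assigned to the models with the highest likelihoods, which is how "best 1, 2, and 3 candidates" rates can be read.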
Degree: Doctor of Philosophy
Subject: Mandarin dialects - Phonetics; Automatic speech recognition.
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/32797
HKU Library Item ID: b1236309

 

DC Field | Value | Language
dc.contributor.author | Chan, Chit-man | -
dc.contributor.author | 陳哲民 | zh_HK
dc.date.issued | 1987 | -
dc.identifier.citation | Chan, C. [陳哲民]. (1987). Speaker-independent recognition of Putonghua finals. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/32797 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.source.uri | http://hub.hku.hk/bib/B12363091 | -
dc.subject.lcsh | Mandarin dialects - Phonetics. | -
dc.subject.lcsh | Automatic speech recognition. | -
dc.title | Speaker-independent recognition of Putonghua finals | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b1236309 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | abstract | -
dc.description.nature | toc | -
dc.identifier.mmsid | 991019751759703414 | -
