postgraduate thesis: Speaker-independent recognition of Putonghua finals
Title | Speaker-independent recognition of Putonghua finals |
---|---|
Authors | Chan, Chit-man (陳哲民) |
Issue Date | 1987 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Chan, C. [陳哲民]. (1987). Speaker-independent recognition of Putonghua finals. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Abstract of thesis entitled "Speaker-Independent Recognition of Putonghua Finals", submitted by CHAN, Chit Man for the degree of Doctor of Philosophy at the University of Hong Kong in December 1987.
A detailed study was performed on the problem of speaker-independent recognition of Putonghua (Mandarin) finals. The study covered 35 Putonghua finals, 16 of which have trailing nasals. They were spoken by 51 speakers (38 female, 13 male) in 5 different tones, with two repetitions each. The samples were spectrally analyzed by a bank of 18 non-overlapping critical-band filters. Three data reduction techniques, the Karhunen-Loeve Transformation (KLT), the Discrete Cosine Transformation (DCT) and Stepwise Discriminant Analysis (SDA), were compared for their feature representation capability. The results indicated that KLT was superior to both DCT and SDA. Furthermore, the theoretical equivalence of DCT to KLT was found to hold only when 5 or more feature dimensions were used in the computation. The results also showed that the Mahalanobis distance and a proposed modified Mahalanobis distance both gave better performance than the other distances tested, namely the City Block, Euclidean, Minkowski and Chebyshev distances.
In the second part of the study, the Hidden Markov Modelling (HMM) technique was investigated. Three classification methods, Phonemic Labelling (PL), Vector Quantization (VQ) and a proposed Hybrid Symbol (HS) generation method, were studied for use with HMM. Whilst PL was found to be simple and efficient, its performance was not as good as that of VQ. However, VQ was excessively time-consuming, especially in training. The results with the HS method showed that it could successfully combine the speed advantage of PL with the better discriminatory power of VQ: an approximately 80% saving in quantizer training time could be achieved with only a marginal loss in performance. It was also found that allowing states to be skipped in a Left-to-Right Model (LRM) could have a negative effect on overall recognition.
As an indication of performance, the recognition rates of the simulated system were 81.3%, 95.0% and 98.0% with the best 1, 2 and 3 candidates included, respectively, using a 256-level VQ and a 6-state, no-skip LRM on a sample of 8,400 finals from 48 speakers. The rates on non-nasal finals alone reached 96% to 98% using the best candidate only. |
Degree | Doctor of Philosophy |
Subject | Mandarin dialects - Phonetics. Automatic speech recognition. |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/32797 |
HKU Library Item ID | b1236309 |
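The abstract reports that the Karhunen-Loeve Transformation (KLT) gave the best feature representation. As a rough illustration only, the Python sketch below estimates a covariance matrix from filter-bank frames, finds its leading eigenvectors by power iteration with Gram-Schmidt re-orthogonalization, and projects a frame onto them. The function names, the iteration count and the toy data are assumptions, not the thesis's implementation; a practical system would normally hand the eigendecomposition to a linear-algebra library.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_covariance(frames: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Sample mean and unbiased covariance of feature frames (needs >= 2 frames)."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        centred = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centred[a] * centred[b] / (n - 1)
    return mean, cov


def _orthonormalize(v: Vector, basis: List[Vector]) -> Tuple[Vector, float]:
    """Remove components along already-found basis vectors, then normalize."""
    for u in basis:
        dot = sum(x * y for x, y in zip(v, u))
        v = [x - dot * y for x, y in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return ([x / norm for x in v] if norm > 0.0 else v), norm


def klt_basis(cov: Matrix, k: int, iters: int = 500, seed: int = 0) -> List[Vector]:
    """Leading k eigenvectors of a covariance matrix, found by power iteration."""
    d = len(cov)
    if not 1 <= k <= d:
        raise ValueError("k must be between 1 and the feature dimension")
    rng = random.Random(seed)
    basis: List[Vector] = []
    for _ in range(k):
        v, _ = _orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(d)], basis)
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            w, norm = _orthonormalize(w, basis)
            if norm < 1e-12:  # no variance left outside the current basis
                break
            v = w
        basis.append(v)
    return basis


def klt_project(frame: Sequence[float], mean: Vector, basis: List[Vector]) -> Vector:
    """Reduced features: projections of the centred frame onto the KLT basis."""
    centred = [x - m for x, m in zip(frame, mean)]
    return [sum(c * u_j for c, u_j in zip(centred, u)) for u in basis]
```

Keeping only the first few projections, those along the largest-variance directions, is the data reduction step; the abstract does not say how many dimensions the thesis finally retained.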
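The DCT, which the abstract compares with the KLT, uses a fixed cosine basis instead of one estimated from training data, so no covariance estimation is needed. The sketch below is an illustrative assumption rather than the thesis's code: it applies an orthonormal DCT-II to one frame of filter-bank outputs (for example, 18 log channel energies) and keeps the first k coefficients. The scaling convention and the name `dct_features` are hypothetical.

```python
import math
from typing import List, Sequence


def dct_features(channels: Sequence[float], k: int) -> List[float]:
    """First k coefficients of the orthonormal DCT-II of one filter-bank frame."""
    n = len(channels)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of channels")
    coeffs = []
    for j in range(k):
        # Orthonormal scaling: sqrt(1/N) for j = 0, sqrt(2/N) otherwise.
        scale = math.sqrt((1.0 if j == 0 else 2.0) / n)
        total = sum(x * math.cos(math.pi * j * (i + 0.5) / n)
                    for i, x in enumerate(channels))
        coeffs.append(scale * total)
    return coeffs
```

Because this basis is orthonormal, keeping all N coefficients preserves the frame's energy, and truncating to k coefficients discards only the higher-order cosine components. The abstract's finding that DCT and KLT behaved equivalently only from 5 dimensions upward is an empirical result of the thesis that this sketch does not reproduce.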
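The distance measures named in the abstract can be written down directly. In the sketch below, the Mahalanobis distance takes a class mean and a precomputed inverse covariance matrix supplied by the caller; the thesis's proposed modified Mahalanobis distance is not described in the abstract and is not reproduced. The nearest-class helper `classify` is a hypothetical name for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
InvCov = Sequence[Sequence[float]]


def city_block(x: Vector, y: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def minkowski(x: Vector, y: Vector, p: float) -> float:
    # p = 1 gives City Block, p = 2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x: Vector, y: Vector) -> float:
    # Limit of the Minkowski distance as p grows without bound.
    return max(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x: Vector, mean: Vector, inv_cov: InvCov) -> float:
    # sqrt((x - m)^T S^-1 (x - m)); S^-1 is supplied by the caller.
    diff = [a - b for a, b in zip(x, mean)]
    n = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def classify(x: Vector, classes: Dict[str, Tuple[Vector, InvCov]]) -> str:
    """Hypothetical nearest-class rule using each class's (mean, inverse covariance)."""
    return min(classes, key=lambda label: mahalanobis(x, *classes[label]))
```

For example, with class "a" at the origin with unit variance and class "b" at (10, 0) with variance 100 in each direction, the point (4, 0) is nearer "a" in Euclidean terms (4 versus 6) but nearer "b" in Mahalanobis terms (4 versus 0.6). Scaling each direction by the class spread in this way is one plausible reason a covariance-aware measure can outperform the purely geometric distances.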
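The abstract notes that VQ was slow, especially in training. A common way to train a VQ codebook is the Lloyd (k-means) iteration, sketched below under that assumption; the thesis's own training procedure for its 256-level codebook may differ, and the names here are hypothetical. Each pass compares every training vector with every codeword, so the cost per iteration grows with the number of vectors times the codebook size times the feature dimension, which illustrates why quantizer training can dominate run time.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors: Sequence[Sequence[float]], size: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd (k-means) codebook training from randomly chosen initial codewords."""
    if not 1 <= size <= len(vectors):
        raise ValueError("codebook size must be between 1 and the number of vectors")
    dim = len(vectors[0])
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(list(vectors), size)]
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(size)]
        counts = [0] * size
        # Assignment step: every training vector is compared with every codeword.
        for v in vectors:
            k = min(range(size), key=lambda c: squared_distance(v, codebook[c]))
            counts[k] += 1
            for j, x in enumerate(v):
                sums[k][j] += x
        # Update step: move each codeword to the centroid of its cell.
        for k in range(size):
            if counts[k] > 0:  # an empty cell keeps its previous codeword
                codebook[k] = [s / counts[k] for s in sums[k]]
    return codebook
```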
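The recognizer configuration reported in the abstract can be sketched as follows: frames are quantized to discrete VQ symbols, scored against a 6-state left-to-right model with no state skipping, and results are counted over the best 1, 2 or 3 candidates. This is an illustrative assumption, not the thesis's code. The self-loop probability `stay`, starting in state 0, summing over all final states, and the helper names are all hypothetical choices, and model training (for example, Baum-Welch re-estimation) is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def no_skip_transitions(n_states: int, stay: float = 0.6) -> Matrix:
    """Left-to-right matrix: state i may only stay in i or move to i + 1."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        a[i][i] = stay
        a[i][i + 1] = 1.0 - stay
    a[-1][-1] = 1.0  # the final state only loops on itself
    return a


def quantize(frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each frame by the index of its nearest codeword (squared Euclidean)."""
    return [min(range(len(codebook)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(f, codebook[k])))
            for f in frames]


def log_likelihood(symbols: Sequence[int], trans: Matrix, emit: Matrix) -> float:
    """Scaled forward algorithm for log P(symbols | model), starting in state 0."""
    n = len(trans)
    log_p = 0.0
    alpha: List[float] = []
    for t, s in enumerate(symbols):
        if t == 0:
            alpha = [emit[0][s]] + [0.0] * (n - 1)
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][s]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [x / scale for x in alpha]  # rescale to avoid numerical underflow
    return log_p


def rank_models(symbols: Sequence[int], models: Dict[str, Tuple[Matrix, Matrix]]) -> List[str]:
    """Candidate labels ordered from most to least likely."""
    return sorted(models, key=lambda m: log_likelihood(symbols, *models[m]), reverse=True)


def top_n_rate(results: Sequence[Tuple[str, List[str]]], n: int) -> float:
    """Fraction of test items whose true label is among the best n candidates."""
    return sum(truth in ranked[:n] for truth, ranked in results) / len(results)
```

Ranking every final's model and checking the true label against the first 1, 2 or 3 entries is one way to compute rates of the kind the abstract reports (81.3%, 95.0% and 98.0%). Those figures come from the thesis alone and are not reproduced by this toy example.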
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chan, Chit-man | - |
dc.contributor.author | 陳哲民 | zh_HK |
dc.date.issued | 1987 | - |
dc.identifier.citation | Chan, C. [陳哲民]. (1987). Speaker-independent recognition of Putonghua finals. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/32797 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.source.uri | http://hub.hku.hk/bib/B12363091 | - |
dc.subject.lcsh | Mandarin dialects - Phonetics. | - |
dc.subject.lcsh | Automatic speech recognition. | - |
dc.title | Speaker-independent recognition of Putonghua finals | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b1236309 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | abstract | - |
dc.description.nature | toc | - |
dc.identifier.mmsid | 991019751759703414 | - |