Use of voicing features in HMM-based speech recognition

Thomson, DL; Chengalvarayan, R

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1016/S0167-6393(01)00011-5
Scopus: eid_2-s2.0-0036642777
WOS: WOS:000176314600003

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Biological Sciences: Journal/Magazine Articles

Title	Use of voicing features in HMM-based speech recognition
Authors	Thomson, DL; Chengalvarayan, R
Keywords	Autocorrelation Function; Cepstral Mean Subtraction; Discriminative Training; Hidden Markov Models; Hierarchical Signal Bias Removal; Jitter; Periodicity; Speech Recognition Features; Voiced And Unvoiced Speech; Voicing
Issue Date	2002
Publisher	Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/specom
Citation	Speech Communication, 2002, v. 37 n. 3-4, p. 197-211. DOI: http://dx.doi.org/10.1016/S0167-6393(01)00011-5
Abstract	We investigate speech recognition features related to voicing functions that indicate whether the vocal folds are vibrating. We describe two voicing features, periodicity and jitter, and demonstrate that they are powerful voicing discriminators. The periodicity and jitter features and their first and second time derivatives are appended to a standard 38-dimensional feature vector comprising the first and second time derivatives of the frame energy and the cepstral coefficients with their first and second time derivatives. HMM-based connected-digit (CD) and large-vocabulary (LV) recognition experiments comparing the traditional and extended feature sets show that voicing features and spectral information are complementary and that improved speech recognition performance is obtained by combining the two sources of information. We further conclude that the difference in performance with and without voicing becomes more significant when minimum string error (MSE) training is used than when maximum likelihood (ML) training is used. © 2002 Elsevier Science B.V. All rights reserved.
Persistent Identifier	http://hdl.handle.net/10722/178772
ISSN	0167-6393 (2023 Impact Factor: 2.4; 2023 SCImago Journal Rankings: 0.769)
ISI Accession Number ID	WOS:000176314600003
References	References in Scopus
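
The abstract names two voicing features, periodicity and jitter, and says that they and their first and second time derivatives are appended to a 38-dimensional vector, which gives 38 + 2 x 3 = 44 dimensions. This record does not give the paper's exact definitions, so the Python sketch below is only an illustration under stated assumptions: periodicity is taken as the peak of a normalized autocorrelation over a plausible pitch-lag range, jitter as the absolute frame-to-frame change in the estimated pitch period, and time derivatives as simple finite differences. All function names, frame sizes, and pitch limits are hypothetical and are not taken from the paper.

```python
import numpy as np


def periodicity_and_period(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (peak normalized autocorrelation, lag of the peak in samples).

    Assumption: this is one common way to measure periodicity; the paper's
    own definition is not stated in this record.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(x) - 2)
    best_r, best_lag = 0.0, 0
    for lag in range(lag_min, lag_max + 1):
        a, b = x[:-lag], x[lag:]  # overlapping parts of the frame
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0.0:
            r = np.dot(a, b) / denom
            if r > best_r:
                best_r, best_lag = float(r), lag
    return best_r, best_lag


def voicing_features(signal, fs, frame_len=240, hop=80):
    """Per-frame [periodicity, jitter]; frame_len and hop are assumed values."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    periodicity = np.zeros(n_frames)
    period = np.zeros(n_frames)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len]
        periodicity[t], period[t] = periodicity_and_period(frame, fs)
    # Assumed jitter: absolute change in pitch period between adjacent frames.
    jitter = np.abs(np.diff(period, prepend=period[0]))
    return np.stack([periodicity, jitter], axis=1)


def add_deltas(feats):
    """Append first and second time derivatives (finite differences)."""
    d1 = np.gradient(feats, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.hstack([feats, d1, d2])


def extend_features(base, voicing):
    """Append voicing features and their derivatives to the base vectors."""
    return np.hstack([base, add_deltas(voicing)])


# Usage: a 100 Hz tone sampled at 8 kHz stands in for voiced speech, and a
# zero matrix stands in for the 38-dimensional cepstral/energy features.
fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
voicing = voicing_features(tone, fs)
extended = extend_features(np.zeros((voicing.shape[0], 38)), voicing)
print(extended.shape)
```

On a steady tone the periodicity stays close to 1 and the jitter close to 0, while noise-like input yields a lower periodicity. That contrast is what makes such features usable as voiced/unvoiced cues alongside spectral features.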

DC Field	Value	Language
dc.contributor.author	Thomson, DL	en_US
dc.contributor.author	Chengalvarayan, R	en_US
dc.date.accessioned	2012-12-19T09:49:39Z	-
dc.date.available	2012-12-19T09:49:39Z	-
dc.date.issued	2002	en_US
dc.identifier.citation	Speech Communication, 2002, v. 37 n. 3-4, p. 197-211	en_US
dc.identifier.issn	0167-6393	en_US
dc.identifier.uri	http://hdl.handle.net/10722/178772	-
dc.description.abstract	We investigate speech recognition features related to voicing functions that indicate whether the vocal folds are vibrating. We describe two voicing features, periodicity and jitter, and demonstrate that they are powerful voicing discriminators. The periodicity and jitter features and their first and second time derivatives are appended to a standard 38-dimensional feature vector comprising the first and second time derivatives of the frame energy and the cepstral coefficients with their first and second time derivatives. HMM-based connected-digit (CD) and large-vocabulary (LV) recognition experiments comparing the traditional and extended feature sets show that voicing features and spectral information are complementary and that improved speech recognition performance is obtained by combining the two sources of information. We further conclude that the difference in performance with and without voicing becomes more significant when minimum string error (MSE) training is used than when maximum likelihood (ML) training is used. © 2002 Elsevier Science B.V. All rights reserved.	en_US
dc.language	eng	en_US
dc.publisher	Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/specom	en_US
dc.relation.ispartof	Speech Communication	en_US
dc.subject	Autocorrelation Function	en_US
dc.subject	Cepstral Mean Subtraction	en_US
dc.subject	Discriminative Training	en_US
dc.subject	Hidden Markov Models	en_US
dc.subject	Hierarchical Signal Bias Removal	en_US
dc.subject	Jitter	en_US
dc.subject	Periodicity	en_US
dc.subject	Speech Recognition Features	en_US
dc.subject	Voiced And Unvoiced Speech	en_US
dc.subject	Voicing	en_US
dc.title	Use of voicing features in HMM-based speech recognition	en_US
dc.type	Article	en_US
dc.identifier.email	Thomson, DL: dthomson@hku.hk	en_US
dc.identifier.authority	Thomson, DL=rp00788	en_US
dc.description.nature	link_to_subscribed_fulltext	en_US
dc.identifier.doi	10.1016/S0167-6393(01)00011-5	en_US
dc.identifier.scopus	eid_2-s2.0-0036642777	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-0036642777&selection=ref&src=s&origin=recordpage	en_US
dc.identifier.volume	37	en_US
dc.identifier.issue	3-4	en_US
dc.identifier.spage	197	en_US
dc.identifier.epage	211	en_US
dc.identifier.isi	WOS:000176314600003	-
dc.publisher.place	Netherlands	en_US
dc.identifier.scopusauthorid	Thomson, DL=7202586830	en_US
dc.identifier.scopusauthorid	Chengalvarayan, R=6701843465	en_US
dc.identifier.issnl	0167-6393	-
