Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale

Xie, S; Yan, N; Yu, P; Ng, ML; Wang, L; Ji, Z

Links for full text (may require subscription):

Publisher Website: 10.21437/Interspeech.2016-986
Scopus: eid_2-s2.0-84994335417
WOS: WOS:000409394401236

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Division of Speech & Hearing Sciences: Conference papers

Conference Paper: Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale

Title	Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale
Authors	Xie, S; Yan, N; Yu, P; Ng, ML; Wang, L; Ji, Z
Keywords	Automatic assessment; DBN; GRBAS; MLP; Voice quality
Issue Date	2016
Publisher	International Speech Communication Association (ISCA).
Citation	Proceedings of the 17th INTERSPEECH conference 2016, San Francisco, USA, 8-12 September 2016, p. 2656-2660. DOI: http://dx.doi.org/10.21437/Interspeech.2016-986
Abstract	In the field of voice therapy, perceptual evaluation by expert listeners is widely used to assess pathological and normal voice quality. This approach is inherently subjective: it is subject to listener bias, and high inter- and intra-listener variability can be found. As a result, research has emerged on the automatic assessment of pathological voices using a combination of subjective and objective analyses. The present study aimed to develop a complementary automatic assessment system for voice quality based on the well-known GRBAS scale, using a battery of multidimensional acoustic measures and Deep Neural Networks. A 44-dimensional acoustic feature set, including Mel-frequency Cepstral Coefficients, Smoothed Cepstral Peak Prominence and Long-Term Average Spectrum, was adopted. In addition, the state-of-the-art automatic assessment system based on Modulation Spectrum (MS) features and GMM classifiers was used as a comparison system. The classification results of the proposed method showed a moderate correlation with subjective GRBAS scores of dysphonic severity and outperformed the MS-GMM system, with a best accuracy of around 81.53%. The findings indicate that such an assessment system can be used as an appropriate evaluation tool for determining the presence and severity of voice disorders.
Description	Poster Presentation - Session: Learning, Education and Different Speech - no. Sun-P-7-3-3, paper ID 986
Persistent Identifier	http://hdl.handle.net/10722/260889
ISSN	1990-9772 (2020 SCImago Journal Rankings: 0.689)
ISI Accession Number ID	WOS:000409394401236
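The abstract reports the system's results in two forms, classification accuracy and correlation with perceptual GRBAS ratings, but this record does not state the exact formulas. The following is a minimal Python sketch of how such figures are commonly computed, assuming exact-match accuracy over integer GRBAS grades 0 to 3 and Pearson's r as the correlation measure; the paper may have used a different statistic (for example Spearman's rho or Cohen's kappa). The function names `accuracy` and `pearson_r` and the toy grades are hypothetical and are not data from the study.

```python
from math import sqrt

# GRBAS parameters are rated on a 4-point scale:
# 0 = normal, 1 = slight, 2 = moderate, 3 = severe.
GRBAS_GRADES = (0, 1, 2, 3)


def accuracy(predicted, reference):
    """Fraction of samples whose predicted grade exactly matches the perceptual grade."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy grades for illustration only, NOT results from the paper.
    perceptual = [0, 1, 2, 3, 1, 2, 0, 3]  # e.g. listener ratings of G (Grade)
    predicted = [0, 1, 1, 3, 1, 2, 1, 3]   # e.g. automatic classifier output
    print(f"accuracy  = {accuracy(predicted, perceptual):.2%}")
    print(f"pearson r = {pearson_r(predicted, perceptual):.3f}")
```

Exact-match accuracy counts a one-grade miss the same as a three-grade miss, so reporting a correlation alongside it indicates how closely the automatic grades track the listeners' ordering of severity.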

DC Field	Value	Language
dc.contributor.author	Xie, S	-
dc.contributor.author	Yan, N	-
dc.contributor.author	Yu, P	-
dc.contributor.author	Ng, ML	-
dc.contributor.author	Wang, L	-
dc.contributor.author	Ji, Z	-
dc.date.accessioned	2018-09-14T08:49:04Z	-
dc.date.available	2018-09-14T08:49:04Z	-
dc.date.issued	2016	-
dc.identifier.citation	Proceedings of the 17th INTERSPEECH conference 2016, San Francisco, USA, 8-12 September 2016, p. 2656-2660	-
dc.identifier.issn	1990-9772	-
dc.identifier.uri	http://hdl.handle.net/10722/260889	-
dc.description	Poster Presentation - Session: Learning, Education and Different Speech - no. Sun-P-7-3-3, paper ID 986	-
dc.language	eng	-
dc.publisher	International Speech Communication Association (ISCA).	-
dc.relation.ispartof	Interspeech Conference Proceedings	-
dc.subject	Automatic assessment	-
dc.subject	DBN	-
dc.subject	GRBAS	-
dc.subject	MLP	-
dc.subject	Voice quality	-
dc.title	Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale	-
dc.type	Conference_Paper	-
dc.identifier.email	Ng, ML: manwa@hku.hk	-
dc.identifier.authority	Ng, ML=rp00942	-
dc.identifier.doi	10.21437/Interspeech.2016-986	-
dc.identifier.scopus	eid_2-s2.0-84994335417	-
dc.identifier.hkuros	290499	-
dc.identifier.spage	2656	-
dc.identifier.epage	2660	-
dc.identifier.isi	WOS:000409394401236	-
dc.publisher.place	United States	-
dc.identifier.issnl	1990-9772	-
