An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia

Qin, Ying; Wu, Yuzhong; Lee, Tan; Kong, Anthony Pak Hin

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/s11265-019-01511-3
Scopus: eid_2-s2.0-85079630438
WOS: WOS:000516232500002
Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Faculty of Education: Journal/Magazine Articles

Title	An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia
Authors	Qin, Ying; Wu, Yuzhong; Lee, Tan; Kong, Anthony Pak Hin
Keywords	Aphasia; Pathological speech assessment; End-to-end; Cantonese; Deep neural network
Issue Date	2020
Citation	Journal of Signal Processing Systems, 2020, v. 92, n. 8, p. 819-830. DOI: http://dx.doi.org/10.1007/s11265-019-01511-3
Abstract	Conventional automatic assessment of pathological speech usually follows two main steps: (1) extraction of pathology-specific features; (2) classification or regression on extracted features. Given the great variety of speech and language disorders, feature design is never a straightforward task, and yet it is most crucial to the performance of assessment. This paper presents an end-to-end approach to automatic speech assessment for Cantonese-speaking People With Aphasia (PWA). The assessment is formulated as a binary classification task to discriminate PWA with high scores of subjective assessment from those with low scores. The 2-layer Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) models are applied to realize the end-to-end mapping from basic speech features to the classification outcome. The pathology-specific features used for assessment are learned implicitly by the neural network model. The Class Activation Mapping (CAM) method is utilized to visualize how the learned features contribute to the assessment result. Experimental results show that the end-to-end approach can achieve comparable performance to the conventional two-step approach in the classification task, and the CNN model is able to learn impairment-related features that are similar to the hand-crafted features. The experimental results also indicate that the CNN model performs better than the 2-layer GRU model in this specific task.
Persistent Identifier	http://hdl.handle.net/10722/307059
ISSN	1939-8018
2023 Impact Factor	1.6
2023 SCImago Journal Rankings	0.479
ISI Accession Number ID	WOS:000516232500002
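
The abstract names Class Activation Mapping (CAM) as the visualization method but this record gives no architectural detail. As a rough illustration only, the sketch below assumes the standard CAM formulation: the last convolutional layer is followed by global average pooling over time and a bias-free linear layer. It is not the authors' implementation; the function names, the two-channel feature maps, and the weights are all hypothetical toy values.

```python
from typing import List


def class_activation_map(feature_maps: List[List[float]],
                         class_weights: List[float]) -> List[float]:
    """Per-frame activation: CAM_c(t) = sum_k w_k^c * f_k(t).

    feature_maps has one row per channel k and one column per time frame t.
    class_weights holds the linear-layer weight w_k^c for one class c.
    """
    num_frames = len(feature_maps[0])
    return [
        sum(w * channel[t] for w, channel in zip(class_weights, feature_maps))
        for t in range(num_frames)
    ]


def class_score(feature_maps: List[List[float]],
                class_weights: List[float]) -> float:
    """Class score after global average pooling and a bias-free linear layer."""
    pooled = [sum(channel) / len(channel) for channel in feature_maps]
    return sum(w * p for w, p in zip(class_weights, pooled))


# Hypothetical toy input: 2 channels, 4 time frames (assumed values).
feature_maps = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 0.0, 3.0],
]
class_weights = [0.5, -1.0]  # assumed weights for one class

cam = class_activation_map(feature_maps, class_weights)
score = class_score(feature_maps, class_weights)
print(cam)
print(score)
```

Under these assumptions the mean of the map over time equals the class score, which is why each frame's value can be read as that frame's contribution to the classification outcome.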

DC Field	Value	Language
dc.contributor.author	Qin, Ying	-
dc.contributor.author	Wu, Yuzhong	-
dc.contributor.author	Lee, Tan	-
dc.contributor.author	Kong, Anthony Pak Hin	-
dc.date.accessioned	2021-11-03T06:21:51Z	-
dc.date.available	2021-11-03T06:21:51Z	-
dc.date.issued	2020	-
dc.identifier.citation	Journal of Signal Processing Systems, 2020, v. 92, n. 8, p. 819-830	-
dc.identifier.issn	1939-8018	-
dc.identifier.uri	http://hdl.handle.net/10722/307059	-
dc.description.abstract	Conventional automatic assessment of pathological speech usually follows two main steps: (1) extraction of pathology-specific features; (2) classification or regression on extracted features. Given the great variety of speech and language disorders, feature design is never a straightforward task, and yet it is most crucial to the performance of assessment. This paper presents an end-to-end approach to automatic speech assessment for Cantonese-speaking People With Aphasia (PWA). The assessment is formulated as a binary classification task to discriminate PWA with high scores of subjective assessment from those with low scores. The 2-layer Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) models are applied to realize the end-to-end mapping from basic speech features to the classification outcome. The pathology-specific features used for assessment are learned implicitly by the neural network model. The Class Activation Mapping (CAM) method is utilized to visualize how the learned features contribute to the assessment result. Experimental results show that the end-to-end approach can achieve comparable performance to the conventional two-step approach in the classification task, and the CNN model is able to learn impairment-related features that are similar to the hand-crafted features. The experimental results also indicate that the CNN model performs better than the 2-layer GRU model in this specific task.	-
dc.language	eng	-
dc.relation.ispartof	Journal of Signal Processing Systems	-
dc.subject	Aphasia	-
dc.subject	Pathological speech assessment	-
dc.subject	End-to-end	-
dc.subject	Cantonese	-
dc.subject	Deep neural network	-
dc.title	An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/s11265-019-01511-3	-
dc.identifier.scopus	eid_2-s2.0-85079630438	-
dc.identifier.volume	92	-
dc.identifier.issue	8	-
dc.identifier.spage	819	-
dc.identifier.epage	830	-
dc.identifier.eissn	1939-8115	-
dc.identifier.isi	WOS:000516232500002	-
