
Postgraduate Thesis: A segmented auditory attention decoding model and its application to neurofeedback based target speech perception

Title: A segmented auditory attention decoding model and its application to neurofeedback based target speech perception
Authors: Wang, Lei (王蕾)
Issue Date: 2021
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, L. [王蕾]. (2021). A segmented auditory attention decoding model and its application to neurofeedback based target speech perception. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Human listeners can effortlessly perceive a target speech stream in complex auditory scenarios. Neuroimaging technologies such as electroencephalography (EEG) have been widely used to understand the neural mechanisms of target speech perception and to decode auditory attention modulation patterns in complex auditory scenes. Previous behavioral and neurological studies demonstrated that target speech perception depends on the regular hierarchical structures of speech; nevertheless, little is known about how the interaction between auditory attention modulation and different speech segments contributes to target speech perception. This doctoral study aimed to reveal the underlying mechanism of auditory attention modulation for different root-mean-square (RMS)-level-based speech segments, and to develop advanced auditory attention decoding (AAD) methods that further support target speech perception in complex auditory scenarios. First, the contribution of different RMS-level-based segments to speech perception was examined through behavioral and neurological tests. Behavioral results showed that different RMS-level-based segments, carrying distinct information, played different roles in speech intelligibility. In addition, neurological experiments demonstrated that each type of RMS-level-based speech segment elicited a specific cortical response pattern under auditory attention modulation, indicating that target speech perception was jointly affected by the different types of RMS-level-based segments and by auditory attention modulation. These findings provide new perspectives for understanding the speech perception mechanisms of auditory attention modulation in complex auditory scenes. Next, an effective speech-RMS-level-based segmented AAD model was proposed to improve AAD performance over a wide range of signal-to-masker ratios (SMRs). The proposed segmented AAD model consists of three steps.
First, a support vector machine classifier was used to predict, from the corresponding EEG signals, whether the perceived auditory stimuli belonged to higher- or lower-RMS-level-based speech segments. Second, the speech envelope was reconstructed using a specific AAD model for each type of speech segment. Third, the target speech was determined by comparing the correlation coefficients between the original and reconstructed speech envelopes. Compared to the traditional unified AAD model, which does not separate the functional roles of higher- and lower-RMS-level-based speech segments, the proposed segmented computational method significantly improved AAD accuracy even at low SMRs and with short decoding window lengths. Finally, the proposed segmented AAD model was combined with advanced speech processing algorithms to develop an intention-adaptive speech signal processing system for competing-speaker environments. To apply such a neurofeedback-based speech signal processing system in real-life scenes, subjects were required to focus or switch their attention between the competing speakers according to the experimental requirements. Results showed that cortical tracking of the target speech streams could serve as a reliable biomarker of the dynamics of auditory attention states. The neurofeedback-based intention-adaptive system facilitated target speech perception under different SMRs when auditory attention was dynamically switched from one speaker stream to the other. These findings indicate that a neurofeedback-based speech separation system has the potential to improve target speech perception in complex auditory scenes.
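The three-step segmented AAD pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the frame length, the mean-RMS threshold, the linear per-segment decoders, and all function names are assumptions, and step 1 (the SVM classifier that predicts segment type from EEG) is stood in for by a precomputed label array.

```python
import numpy as np

def rms_segment_labels(speech, frame_len=128, threshold=None):
    """Label each frame as a higher- (1) or lower- (0) RMS-level-based segment.

    The frame length and the mean-RMS split point are illustrative assumptions.
    """
    n_frames = len(speech) // frame_len
    frames = speech[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    if threshold is None:
        threshold = rms.mean()
    return (rms >= threshold).astype(int)

def segmented_decode(eeg, env_a, env_b, labels, decoders):
    """Steps 2-3 of the segmented AAD model (hypothetical linear version).

    eeg      : (T, channels) array of EEG samples
    env_a/b  : (T,) speech envelopes of the two competing speakers
    labels   : (T,) per-sample segment type; in the thesis this prediction
               comes from the step-1 SVM classifier
    decoders : dict mapping segment type -> (channels,) linear decoder

    Reconstructs the envelope with the decoder matched to each segment
    type, then picks the speaker whose envelope correlates best with
    the reconstruction.
    """
    recon = np.empty(len(eeg))
    for seg_type, w in decoders.items():
        mask = labels == seg_type
        recon[mask] = eeg[mask] @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return 'A' if r_a > r_b else 'B'
```

The key design point mirrored here is that each segment type gets its own decoder, rather than one unified decoder for the whole signal; only the final correlation comparison is shared.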
Degree: Doctor of Philosophy
Subject: Auditory perception - Computer simulation; Electroencephalography
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/313655

 

DC Field: Value
dc.contributor.author: Wang, Lei
dc.contributor.author: 王蕾
dc.date.accessioned: 2022-06-26T09:32:24Z
dc.date.available: 2022-06-26T09:32:24Z
dc.date.issued: 2021
dc.identifier.citation: Wang, L. [王蕾]. (2021). A segmented auditory attention decoding model and its application to neurofeedback based target speech perception. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/313655
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Auditory perception - Computer simulation
dc.subject.lcsh: Electroencephalography
dc.title: A segmented auditory attention decoding model and its application to neurofeedback based target speech perception
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2022
dc.identifier.mmsid: 991044545287503414
