postgraduate thesis: Development and validation of explainable machine-learning prediction systems : a study of biomedical and clinical data

Title: Development and validation of explainable machine-learning prediction systems : a study of biomedical and clinical data
Authors: Ng, Yui Lun (吳鋭麟)
Advisor(s): Kwok, KW
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Ng, Y. L. [吳鋭麟]. (2024). Development and validation of explainable machine-learning prediction systems : a study of biomedical and clinical data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Recent years have witnessed the rapid development and growing popularity of machine learning (ML) algorithms and explainability methods. ML is renowned for its exceptional ability to capture key features and patterns in high-dimensional data, which makes it well suited to biomedical and clinical applications. These advantages have led to the emergence of explainable ML methods for predicting disease risk, estimating patient readmission likelihood, and forecasting care needs. However, recent studies have focused primarily on predictive accuracy; it is essential to develop ML frameworks that are not only accurate but also interpretable and transparent. The main focus of this thesis is to propose a generic workflow that integrates the procedures essential for developing explainable ML systems: (i) the categorization of data types, (ii) the selection of appropriate ML algorithms, (iii) the choice of evaluation metrics, and (iv) the utilization of explainability methods. Categorizing data types enables a comprehensive understanding of the characteristics inherent in a dataset, which facilitates assessing the suitability of ML algorithms and identifying any necessary data preprocessing steps. The choice of ML algorithm can substantially affect performance. Evaluation metrics provide quantitative measures for comparing different algorithms or settings. Explainability methods generate interpretable explanations that reveal the factors and features contributing to a model's predictions or decisions.

The first part of this thesis studied this workflow for structured electronic health record data. Data from patients with Clostridioides difficile infection at risk of mortality or recurrence were used to develop an open-access, web-based prediction system for estimating their outcomes. Prognostic models, comprising four types of ML algorithms and statistical logistic regression models, were developed and compared to determine the optimal ML algorithm for this type of data. Explainability methods were employed to identify the features most important to the ML models and to relate them to clinical findings.

The second part of this thesis focused on the development of ML platforms for predicting enzyme function from protein structures. Protein structure data are mainly unstructured or semi-structured; such data can be modeled as graphs, and graph neural networks can be leveraged to extract relevant features. To pinpoint the catalytic amino acid residues related to enzyme function, several explainability methods were investigated and their effectiveness assessed. The proposed framework can be readily integrated with AlphaFold 2-predicted structures, yielding an end-to-end pipeline for deriving enzymatic functions and active sites from input protein sequences.

The last part of this thesis highlights future research directions and potential enhancements for the proposed techniques. In summary, this thesis studied the procedures crucial for developing explainable ML systems based on biomedical data.
Degree: Doctor of Philosophy
Subjects: Medical informatics; Machine learning
Dept/Program: Mechanical Engineering
Persistent Identifier: http://hdl.handle.net/10722/358266
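
The abstract describes a model-comparison workflow: train several ML classifiers alongside a statistical logistic regression baseline, score them with a common evaluation metric, and apply an explainability method to the result. Below is a minimal Python sketch of that pattern. It is not the thesis's actual pipeline: the synthetic dataset, the particular scikit-learn models, the AUROC metric, and the SHAP-based explanation step are all illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    import shap

    # Synthetic stand-in for structured EHR features; the thesis's C. difficile
    # cohort is not public, so this toy dataset is purely illustrative.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Candidate models: ML classifiers compared against a logistic regression baseline.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=0),
        "gradient_boosting": GradientBoostingClassifier(random_state=0),
    }

    # Evaluation metric: AUROC as one common quantitative yardstick.
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{name}: AUROC = {scores[name]:.3f}")

    # Explainability step: model-agnostic SHAP values attribute the best model's
    # predictions to individual input features.
    best = models[max(scores, key=scores.get)]
    explainer = shap.Explainer(best.predict_proba, X_train)
    shap_values = explainer(X_test[:50])

The abstract does not name the explainability methods used for this part, so SHAP here stands in for whatever post-hoc feature-attribution method the thesis applied.

For the protein-structure part, one common way to obtain the graph representation the abstract mentions is a residue-contact graph: residues become nodes, and an edge links residues whose C-alpha atoms lie within a distance cutoff. The sketch below builds such an adjacency matrix; the 8 Å cutoff and the contact_graph helper are hypothetical choices for illustration, not the thesis's definition. A graph neural network would consume this adjacency together with per-residue node features.

    import numpy as np

    def contact_graph(ca_coords, cutoff=8.0):
        """Binary residue-contact adjacency from C-alpha coordinates.

        ca_coords: (n_residues, 3) array, e.g. parsed from an experimental PDB
        file or an AlphaFold 2-predicted structure. cutoff is in angstroms;
        8 A is a common (but here assumed) contact threshold.
        """
        dist = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
        adj = (dist < cutoff).astype(np.float32)  # 1 where residues are in contact
        np.fill_diagonal(adj, 0.0)                # drop self-loops
        return adj

    # Toy usage: random coordinates stand in for a real structure.
    coords = np.random.default_rng(0).normal(size=(50, 3)) * 10.0
    A = contact_graph(coords)
    print(A.shape, int(A.sum()) // 2, "contacts")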


DC Field / Value
dc.contributor.advisor: Kwok, KW
dc.contributor.author: Ng, Yui Lun
dc.contributor.author: 吳鋭麟
dc.date.accessioned: 2025-07-28T08:40:43Z
dc.date.available: 2025-07-28T08:40:43Z
dc.date.issued: 2024
dc.identifier.citation: Ng, Y. L. [吳鋭麟]. (2024). Development and validation of explainable machine-learning prediction systems : a study of biomedical and clinical data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/358266
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Medical informatics
dc.subject.lcsh: Machine learning
dc.title: Development and validation of explainable machine-learning prediction systems : a study of biomedical and clinical data
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Mechanical Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044843668303414
