
Postgraduate Thesis: Exploring pathways to medical foundation models

Title: Exploring pathways to medical foundation models
Authors: Zhou, Hongyu (周洪宇)
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation:
Zhou, H. [周洪宇]. (2023). Exploring pathways to medical foundation models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract:
Foundation models have revolutionized computer vision and natural language processing, but their impact on medicine remains limited. This thesis explores and develops foundation models tailored for medical applications, addressing the unique challenges and requirements of the healthcare domain. The motivation stems from the limited availability of training data, the need for domain-specific knowledge, and the importance of multimodal integration in medical diagnostics.

The first contribution is unified self-supervised representation learning for training medical foundation models. Conventional self-supervised methods preserve either high-level semantics or pixel-level information in representations, which is insufficient for medical applications. We propose Preservational Contrastive Representation Learning (PCRL), which integrates diverse information, i.e., high-level semantics, pixel-level information, and scale information, into a unified framework. Medical foundation models can thereby encode diverse information that benefits various downstream tasks even with limited training data. Experimental results demonstrate that PCRL significantly improves transfer learning performance on radiological tasks, outperforming non-transfer counterparts by large margins.

The second contribution explores knowledge-enhanced representation learning that injects medical knowledge into foundation models. We propose REviewing FreE-text Reports for Supervision (REFERS), a framework that leverages supervision signals from associated free-text radiology reports to learn radiograph representations. REFERS surpasses label-supervised pre-training and outperforms self-supervised and transfer learning counterparts on various radiological tasks. Furthermore, we introduce Masked Record Modeling (MRM), which augments REFERS with multimodal supervision from imaging and textual data. On CheXpert, MRM achieves performance comparable to training on fully labeled data while using only 1% of the labels.

The third contribution develops multimodal AI models for clinical diagnostics. We present IRENE, a transformer-based model that processes multimodal clinical information for pulmonary disease identification. IRENE learns holistic representations from various clinical data, eliminating separate paths for learning modality-specific features. Experimental results show that IRENE outperforms image-only and non-unified diagnostic models by large margins. To address a broader range of health issues, we propose GeMini, a generalist foundation model capable of generating diagnostic decisions for nearly one thousand conditions from various clinical data modalities. GeMini significantly outperforms conventional multimodal solutions, demonstrating the importance of multimodal integration in clinical diagnostics.

This thesis emphasizes learning transferable representations from limited data, injecting external medical knowledge, and integrating diverse modalities. The proposed methods and models improve performance across a range of medical tasks, paving the way for further advances in medical AI. These findings contribute to the development of foundation models designed for medical applications, facilitating accurate diagnoses, personalized treatment recommendations, and improved patient outcomes.
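The PCRL passage above describes combining a semantic-level (contrastive) objective with pixel-level (reconstruction) information in one framework. As a rough, hypothetical illustration of that general idea, and not the thesis's actual PCRL implementation, a combined training loss can be sketched as a weighted sum of an InfoNCE contrastive term and a mean-squared-error reconstruction term (all function and parameter names here are illustrative):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss over a batch of paired embeddings.
    Positive pairs are (z1[i], z2[i]); other rows in the batch act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize rows
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # -log p(positive pair)

def reconstruction_loss(pred_pixels, target_pixels):
    """Pixel-level mean-squared-error reconstruction term."""
    return np.mean((pred_pixels - target_pixels) ** 2)

def combined_loss(z1, z2, pred_pixels, target_pixels, alpha=0.5):
    """Weighted sum of the semantic (contrastive) and pixel-level terms."""
    return (alpha * info_nce(z1, z2)
            + (1 - alpha) * reconstruction_loss(pred_pixels, target_pixels))
```

The weight `alpha` trades off semantic against pixel-level fidelity; the actual PCRL formulation additionally incorporates scale information, which this sketch omits.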
Degree: Doctor of Philosophy
Subject: Artificial intelligence - Medical applications
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/335126

 

DC Field: Value
dc.contributor.author: Zhou, Hongyu
dc.contributor.author: 周洪宇
dc.date.accessioned: 2023-11-13T07:44:44Z
dc.date.available: 2023-11-13T07:44:44Z
dc.date.issued: 2023
dc.identifier.citation: Zhou, H. [周洪宇]. (2023). Exploring pathways to medical foundation models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/335126
dc.description.abstract: (abstract as shown above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Artificial intelligence - Medical applications
dc.title: Exploring pathways to medical foundation models
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044736607703414
