postgraduate thesis: Exploring pathways to medical foundation models
Title | Exploring pathways to medical foundation models |
---|---|
Authors | Zhou, Hongyu (周洪宇) |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhou, H. [周洪宇]. (2023). Exploring pathways to medical foundation models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Foundation models have revolutionized the fields of computer vision and natural language processing, but their impact on medicine remains limited. This thesis aims to explore and develop foundation models specifically tailored for medical applications, addressing the unique challenges and requirements of the healthcare domain. The motivation behind this research stems from the limited availability of training data, the need for domain-specific knowledge, and the importance of multimodal integration in medical diagnostics.
The first contribution of this thesis is the introduction of unified self-supervised representation learning for training medical foundation models. Conventional self-supervised learning methods focus on preserving high-level semantics or pixel-level information in representations, which is insufficient for medical applications. We propose Preservational Contrastive Representation Learning (PCRL), which integrates diverse information, i.e., high-level semantics, pixel-level information, and scale information, into a unified framework. By doing so, medical foundation models can encode diverse information that benefits various downstream tasks, even with limited training data. Experimental results demonstrate that PCRL significantly improves transfer learning performance in radiological tasks, outperforming non-transfer counterparts by large margins.
The second contribution explores knowledge-enhanced representation learning by injecting medical knowledge into foundation models. We propose REviewing FreE-text Reports for Supervision (REFERS), a framework that leverages supervision signals from associated free-text radiology reports to learn radiograph representations. REFERS surpasses label-supervised pre-training and outperforms self-supervised and transfer learning counterparts in various radiological tasks. Furthermore, we introduce Masked Record Modeling (MRM), which augments REFERS with multimodal supervision from imaging and textual data. On CheXpert, MRM achieves performance comparable to training on the fully labeled data while using only 1% of it.
The third contribution focuses on developing multimodal AI models for clinical diagnostics. We present IRENE, a transformer-based model that processes multimodal clinical information for pulmonary disease identification. IRENE learns holistic representations from various clinical data, eliminating separate paths for learning modality-specific features. Experimental results show that IRENE outperforms image-only and non-unified diagnostic models by large margins. To address a broader range of health issues, we propose GeMini, a generalist foundation model capable of generating diagnostic decisions for nearly one thousand problems using various clinical data modalities. GeMini significantly outperforms conventional multimodal solutions, demonstrating the importance of multimodal integration in clinical diagnostics.
This thesis emphasizes the need for transferable representations from limited data, the injection of external medical knowledge, and the integration of diverse modalities. The proposed methods and models demonstrate improved performance in a range of medical tasks, paving the way for further advancements in medical AI. The findings of this research contribute to the development of foundation models specifically designed for medical applications, facilitating accurate diagnoses, personalized treatment recommendations, and improved patient outcomes. |
Degree | Doctor of Philosophy |
Subject | Artificial intelligence - Medical applications |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/335126 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhou, Hongyu | - |
dc.contributor.author | 周洪宇 | - |
dc.date.accessioned | 2023-11-13T07:44:44Z | - |
dc.date.available | 2023-11-13T07:44:44Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Zhou, H. [周洪宇]. (2023). Exploring pathways to medical foundation models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/335126 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Artificial intelligence - Medical applications | - |
dc.title | Exploring pathways to medical foundation models | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044736607703414 | - |