Exploring pathways to medical foundation models

Zhou, Hongyu; 周洪宇

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Exploring pathways to medical foundation models

Title	Exploring pathways to medical foundation models
Authors	Zhou, Hongyu 周洪宇
Issue Date	2023
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhou, H. [周洪宇]. (2023). Exploring pathways to medical foundation models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Foundation models have revolutionized computer vision and natural language processing, but their impact on medicine remains limited. This thesis explores and develops foundation models tailored to medical applications, addressing the unique challenges and requirements of the healthcare domain. The research is motivated by the limited availability of training data, the need for domain-specific knowledge, and the importance of multimodal integration in medical diagnostics. The first contribution is unified self-supervised representation learning for training medical foundation models. Conventional self-supervised learning methods focus on preserving high-level semantics or pixel-level information in representations, which is insufficient for medical applications. We propose Preservational Contrastive Representation Learning (PCRL), which integrates high-level semantics, pixel-level information, and scale information into a unified framework. As a result, medical foundation models can encode diverse information that benefits various downstream tasks, even with limited training data. Experimental results demonstrate that PCRL significantly improves transfer learning performance in radiological tasks, outperforming non-transfer counterparts by large margins. The second contribution explores knowledge-enhanced representation learning by injecting medical knowledge into foundation models. We propose REviewing FreE-text Reports for Supervision (REFERS), a framework that leverages supervision signals from associated free-text radiology reports to learn radiograph representations. REFERS surpasses label-supervised pre-training and outperforms self-supervised and transfer learning counterparts in various radiological tasks. Furthermore, we introduce Masked Record Modeling (MRM), which augments REFERS with multimodal supervision from imaging and textual data. On CheXpert, MRM achieves performance comparable to training with fully labeled data while using only 1% of the labeled data. The third contribution focuses on developing multimodal AI models for clinical diagnostics. We present IRENE, a transformer-based model that processes multimodal clinical information for pulmonary disease identification. IRENE learns holistic representations from various clinical data, eliminating separate paths for learning modality-specific features. Experimental results show that IRENE outperforms image-only and non-unified diagnostic models by large margins. To address a broader range of health issues, we propose GeMini, a generalist foundation model capable of generating diagnostic decisions for nearly one thousand problems using various clinical data modalities. GeMini significantly outperforms conventional multimodal solutions, demonstrating the importance of multimodal integration in clinical diagnostics. This thesis emphasizes the need for transferable representations from limited data, the injection of external medical knowledge, and the integration of diverse modalities. The proposed methods and models demonstrate improved performance in a range of medical tasks, paving the way for further advancements in medical AI. The findings contribute to the development of foundation models specifically designed for medical applications, facilitating accurate diagnoses, personalized treatment recommendations, and improved patient outcomes.
Degree	Doctor of Philosophy
Subject	Artificial intelligence - Medical applications
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/335126

DC Field	Value	Language
dc.contributor.author	Zhou, Hongyu	-
dc.contributor.author	周洪宇	-
dc.date.accessioned	2023-11-13T07:44:44Z	-
dc.date.available	2023-11-13T07:44:44Z	-
dc.date.issued	2023	-
dc.identifier.citation	Zhou, H. [周洪宇]. (2023). Exploring pathways to medical foundation models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/335126	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Artificial intelligence - Medical applications	-
dc.title	Exploring pathways to medical foundation models	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2024	-
dc.identifier.mmsid	991044736607703414	-
