Appears in Collections: postgraduate thesis: Deep learning based medical image segmentation and visual question answering
Field | Value |
---|---|
Title | Deep learning based medical image segmentation and visual question answering |
Authors | Liu, Sishuo |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, S. [劉思鑠]. (2023). Deep learning based medical image segmentation and visual question answering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | This work proposes methods for 3D/2D medical image segmentation and medical visual question answering.
For 2D segmentation, we develop a self-learning correction paradigm for semi-supervised biomedical image segmentation. Our coarse-to-fine strategy adopts lesion inpainting as a self-supervised pretext task for unlabeled data, which enhances network representations for improved segmentation. This approach leverages unlabeled data to derive additional supervision signals that guide the network to learn better feature representations.
For 3D segmentation, we develop 3D UNeXt, a hybrid CNN-MLP network based on U-Net. 3D UNeXt balances complexity and performance by combining convolutional local feature extraction with MLP blocks for global context propagation. The convolutional layers capture low-level local features while the MLP blocks model long-range dependencies, enabling more effective 3D segmentation.
For medical visual question answering, we introduce a cross-modal self-attention module to selectively capture long-range visual-linguistic contextual relevance for effective feature fusion. More importantly, we reformulate image feature pre-training as a multi-task learning paradigm to make features more applicable for multimodal fusion and question answering. This multi-task pre-training enables image features to encode richer contextual clues that are beneficial for visual question answering.
In summary, we propose 2D/3D segmentation methods and multi-task pre-training with cross-modal self-attention for medical visual question answering. Our techniques achieve superior performance while balancing model complexity, demonstrating their potential to improve segmentation, 3D analysis and multimodal learning for medical imaging applications. The proposed frameworks represent steps towards more comprehensive computational models for medical image analysis. |
Degree | Doctor of Philosophy |
Subject | Deep learning (Machine learning); Information visualization; Natural language processing (Computer science) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/335164 |
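The cross-modal self-attention described in the abstract — text-derived queries scoring visual features so that long-range visual-linguistic relevance weights the fusion — can be illustrated with a minimal single-head sketch in plain NumPy. This is an illustrative simplification, not the thesis's actual module: the function names are hypothetical, and real implementations add learned projections, multiple heads, and normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Toy cross-modal attention: text tokens attend over image regions.

    text_feats:  (T, d) linguistic features, used as queries
    image_feats: (R, d) visual region features, used as keys and values
    Returns (T, d) fused features weighted by visual-linguistic relevance.
    """
    d = text_feats.shape[-1]
    # Scaled dot-product relevance between every text token and image region.
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (T, R)
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    # Each text token receives a relevance-weighted mix of visual features.
    return weights @ image_feats                       # (T, d)
```

Here a single matrix product plus softmax lets every question token draw on every image region at once, which is what gives attention its long-range fusion behavior compared with purely local convolutional mixing.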
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Sishuo | - |
dc.contributor.author | 劉思鑠 | - |
dc.date.accessioned | 2023-11-13T07:45:06Z | - |
dc.date.available | 2023-11-13T07:45:06Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Liu, S. [劉思鑠]. (2023). Deep learning based medical image segmentation and visual question answering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/335164 | - |
dc.description.abstract | This work proposes methods for 3D/2D medical image segmentation and medical visual question answering. For 2D segmentation, we develop a self-learning correction paradigm for semi-supervised biomedical image segmentation. Our coarse-to-fine strategy adopts lesion inpainting as a self-supervised pretext task for unlabeled data, which enhances network representations for improved segmentation. This approach leverages unlabeled data to derive additional supervision signals that guide the network to learn better feature representations. For 3D segmentation, we develop 3D UNeXt, a hybrid CNN-MLP network based on U-Net. 3D UNeXt balances complexity and performance by combining convolutional local feature extraction with MLP blocks for global context propagation. The convolutional layers capture low-level local features while the MLP blocks model long-range dependencies, enabling more effective 3D segmentation. For medical visual question answering, we introduce a cross-modal self-attention module to selectively capture long-range visual-linguistic contextual relevance for effective feature fusion. More importantly, we reformulate image feature pre-training as a multi-task learning paradigm to make features more applicable for multimodal fusion and question answering. This multi-task pre-training enables image features to encode richer contextual clues that are beneficial for visual question answering. In summary, we propose 2D/3D segmentation methods and multi-task pre-training with cross-modal self-attention for medical visual question answering. Our techniques achieve superior performance while balancing model complexity, demonstrating their potential to improve segmentation, 3D analysis and multimodal learning for medical imaging applications. The proposed frameworks represent steps towards more comprehensive computational models for medical image analysis. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.subject.lcsh | Information visualization | - |
dc.subject.lcsh | Natural language processing (Computer science) | - |
dc.title | Deep learning based medical image segmentation and visual question answering | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044736500003414 | - |