Appears in Collections: postgraduate thesis: Deep learning based medical image segmentation and visual question answering
Field | Value |
---|---|
Title | Deep learning based medical image segmentation and visual question answering |
Authors | Liu, Sishuo |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, S. [劉思鑠]. (2023). Deep learning based medical image segmentation and visual question answering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | This work proposes methods for 3D/2D medical image segmentation and medical visual question answering.
For 2D segmentation, we develop a self-learning correction paradigm for semi-supervised biomedical image segmentation. Our coarse-to-fine strategy adopts lesion inpainting as a self-supervised pretext task for unlabeled data, which enhances network representations for improved segmentation. This approach leverages unlabeled data to derive additional supervision signals that guide the network to learn better feature representations.
For 3D segmentation, we develop 3D UNeXt, a hybrid CNN-MLP network based on U-Net. 3D UNeXt balances complexity and performance by combining convolutional local feature extraction with MLP blocks for global context propagation. The convolutional layers capture low-level local features while the MLP blocks model long-range dependencies, enabling more effective 3D segmentation.
For medical visual question answering, we introduce a cross-modal self-attention module to selectively capture long-range visual-linguistic contextual relevance for effective feature fusion. More importantly, we reformulate image feature pre-training as a multi-task learning paradigm to make features more applicable for multimodal fusion and question answering. This multi-task pre-training enables image features to encode richer contextual clues that are beneficial for visual question answering.
In summary, we propose 2D/3D segmentation methods and multi-task pre-training with cross-modal self-attention for medical visual question answering. Our techniques achieve superior performance while balancing model complexity, demonstrating their potential to improve segmentation, 3D analysis and multimodal learning for medical imaging applications. The proposed frameworks represent steps towards more comprehensive computational models for medical image analysis. |
Degree | Doctor of Philosophy |
Subject | Deep learning (Machine learning); Information visualization; Natural language processing (Computer science) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/335164 |
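The cross-modal self-attention described in the abstract — text-derived queries scoring visual features so that long-range visual-linguistic relevance weights the fusion — can be illustrated with a minimal single-head sketch in plain NumPy. This is an illustrative simplification, not the thesis's actual module: the function names are hypothetical, and real implementations add learned projections, multiple heads, and normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Toy cross-modal attention: text tokens attend over image regions.

    text_feats:  (T, d) linguistic features, used as queries
    image_feats: (R, d) visual region features, used as keys and values
    Returns (T, d) fused features weighted by visual-linguistic relevance.
    """
    d = text_feats.shape[-1]
    # Scaled dot-product relevance between every text token and image region.
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (T, R)
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    # Each text token receives a relevance-weighted mix of visual features.
    return weights @ image_feats                       # (T, d)
```

Here a single matrix product plus softmax lets every question token draw on every image region at once, which is what gives attention its long-range fusion behavior compared with purely local convolutional mixing.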
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Sishuo | - |
dc.contributor.author | 劉思鑠 | - |
dc.date.accessioned | 2023-11-13T07:45:06Z | - |
dc.date.available | 2023-11-13T07:45:06Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Liu, S. [劉思鑠]. (2023). Deep learning based medical image segmentation and visual question answering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/335164 | - |
dc.description.abstract | This work proposes methods for 3D/2D medical image segmentation and medical visual question answering. For 2D segmentation, we develop a self-learning correction paradigm for semi-supervised biomedical image segmentation. Our coarse-to-fine strategy adopts lesion inpainting as a self-supervised pretext task for unlabeled data, which enhances network representations for improved segmentation. This approach leverages unlabeled data to derive additional supervision signals that guide the network to learn better feature representations. For 3D segmentation, we develop 3D UNeXt, a hybrid CNN-MLP network based on U-Net. 3D UNeXt balances complexity and performance by combining convolutional local feature extraction with MLP blocks for global context propagation. The convolutional layers capture low-level local features while the MLP blocks model long-range dependencies, enabling more effective 3D segmentation. For medical visual question answering, we introduce a cross-modal self-attention module to selectively capture long-range visual-linguistic contextual relevance for effective feature fusion. More importantly, we reformulate image feature pre-training as a multi-task learning paradigm to make features more applicable for multimodal fusion and question answering. This multi-task pre-training enables image features to encode richer contextual clues that are beneficial for visual question answering. In summary, we propose 2D/3D segmentation methods and multi-task pre-training with cross-modal self-attention for medical visual question answering. Our techniques achieve superior performance while balancing model complexity, demonstrating their potential to improve segmentation, 3D analysis and multimodal learning for medical imaging applications. The proposed frameworks represent steps towards more comprehensive computational models for medical image analysis. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.subject.lcsh | Information visualization | - |
dc.subject.lcsh | Natural language processing (Computer science) | - |
dc.title | Deep learning based medical image segmentation and visual question answering | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044736500003414 | - |