Links for fulltext (may require subscription):
- Publisher website (DOI): 10.1109/JBHI.2024.3360239
- Scopus: eid_2-s2.0-85184315889
- PMID: 38289846
Article: Hybrid Masked Image Modeling for 3D Medical Image Segmentation
Title | Hybrid Masked Image Modeling for 3D Medical Image Segmentation
---|---
Authors | Xing, Zhaohu; Zhu, Lei; Yu, Lequan; Xing, Zhiheng; Wan, Liang
Keywords | 3D medical image segmentation; Masked image modeling; Self-supervised learning
Issue Date | 1-Apr-2024
Publisher | Institute of Electrical and Electronics Engineers
Citation | IEEE Journal of Biomedical and Health Informatics, 2024, v. 28, n. 4, p. 2115-2125
Abstract | Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. Existing MIM methods mask random patches of the image and reconstruct the missing pixels, which considers semantic information only at a low level and leads to long pre-training times. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy to specify which patches in sub-volumes are masked and how, effectively providing higher-level semantic constraints. We then learn the semantic information of medical images at three levels: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time (pixel level); 2) patch-masking perception to learn the spatial relationships between the patches in each sub-volume (region level); and 3) dropout-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample level). The proposed framework is versatile, supporting both CNN and transformer encoder backbones, and also enables pre-training of decoders for image segmentation. We conduct comprehensive experiments on five widely used public medical image segmentation datasets: BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM over competing supervised methods, masked pre-training approaches, and other self-supervised methods in terms of quantitative metrics, speed, and qualitative observations. The code for HybridMIM is available at https://github.com/ge-xing/HybridMIM.
Persistent Identifier | http://hdl.handle.net/10722/345591
ISSN | 2168-2194 (print); 2168-2208 (electronic)
2023 Impact Factor | 6.7
2023 SCImago Journal Rankings | 1.964
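
The two-level masking hierarchy described in the abstract can be sketched as follows. This is a minimal illustrative interpretation, not the paper's actual method: the volume size, sub-volume size, patch size, and masking ratios below are assumed parameters chosen for demonstration; the real implementation is in the HybridMIM repository linked above. The idea shown is simply that masking is decided hierarchically — first which sub-volumes participate, then which patches inside each selected sub-volume are masked.

```python
import numpy as np

def hierarchical_mask(vol=64, sub=16, patch=4,
                      sub_ratio=0.5, patch_ratio=0.75, seed=0):
    """Boolean mask over a cubic 3D volume built in two levels:
    level 1 selects a fraction of sub-volumes, level 2 masks a
    fraction of the patches inside each selected sub-volume.
    (Illustrative sketch only; parameters are assumptions.)"""
    rng = np.random.default_rng(seed)
    mask = np.zeros((vol, vol, vol), dtype=bool)
    n_sub = vol // sub          # sub-volumes per axis
    n_pat = sub // patch        # patches per axis within a sub-volume

    # Level 1: choose which sub-volumes receive masking.
    sub_coords = [(i, j, k) for i in range(n_sub)
                  for j in range(n_sub) for k in range(n_sub)]
    chosen = rng.choice(len(sub_coords),
                        size=int(sub_ratio * len(sub_coords)),
                        replace=False)
    for idx in chosen:
        si, sj, sk = sub_coords[idx]
        # Level 2: mask a fraction of the patches in this sub-volume.
        pat_coords = [(a, b, c) for a in range(n_pat)
                      for b in range(n_pat) for c in range(n_pat)]
        picked = rng.choice(len(pat_coords),
                            size=int(patch_ratio * len(pat_coords)),
                            replace=False)
        for pidx in picked:
            a, b, c = pat_coords[pidx]
            x, y, z = si * sub + a * patch, sj * sub + b * patch, sk * sub + c * patch
            mask[x:x + patch, y:y + patch, z:z + patch] = True
    return mask

m = hierarchical_mask()
# Overall masked fraction is sub_ratio * patch_ratio by construction.
print(m.mean())
```

Because the two levels are independent, the overall masked-voxel fraction factorizes into the product of the two ratios (here 0.5 × 0.75 = 0.375), which makes the effective masking budget easy to control.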
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Xing, Zhaohu | - |
dc.contributor.author | Zhu, Lei | - |
dc.contributor.author | Yu, Lequan | - |
dc.contributor.author | Xing, Zhiheng | - |
dc.contributor.author | Wan, Liang | - |
dc.date.accessioned | 2024-08-27T09:09:52Z | - |
dc.date.available | 2024-08-27T09:09:52Z | - |
dc.date.issued | 2024-04-01 | - |
dc.identifier.citation | IEEE Journal of Biomedical and Health Informatics, 2024, v. 28, n. 4, p. 2115-2125 | - |
dc.identifier.issn | 2168-2194 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345591 | - |
dc.description.abstract | Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. The existing MIM methods adopt the strategy to mask random patches of the image and reconstruct the missing pixels, which only considers semantic information at a lower level, and causes a long pre-training time. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy to specify which and how patches in sub-volumes are masked, effectively providing the constraints of higher level semantic information. Then we learn the semantic information of medical images at three levels, including: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden (pixel-level); 2) patch-masking perception to learn the spatial relationship between the patches in each sub-volume (region-level); and 3) drop-out-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample-level). The proposed framework is versatile to support both CNN and transformer as encoder backbones, and also enables to pre-train decoders for image segmentation. We conduct comprehensive experiments on five widely-used public medical image segmentation datasets, including BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM against competing supervised methods, masked pre-training approaches, and other self-supervised methods, in terms of quantitative metrics, speed performance and qualitative observations. The codes of HybridMIM are available at https://github.com/ge-xing/HybridMIM. | - |
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Journal of Biomedical and Health Informatics | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | 3D medical image segmentation | - |
dc.subject | Masked image modeling | - |
dc.subject | Self-supervised learning | - |
dc.title | Hybrid Masked Image Modeling for 3D Medical Image Segmentation | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/JBHI.2024.3360239 | - |
dc.identifier.pmid | 38289846 | - |
dc.identifier.scopus | eid_2-s2.0-85184315889 | - |
dc.identifier.volume | 28 | - |
dc.identifier.issue | 4 | - |
dc.identifier.spage | 2115 | - |
dc.identifier.epage | 2125 | - |
dc.identifier.eissn | 2168-2208 | - |
dc.identifier.issnl | 2168-2194 | - |