Article: Shared-Specific Feature Learning With Bottleneck Fusion Transformer for Multi-Modal Whole Slide Image Analysis

Title: Shared-Specific Feature Learning With Bottleneck Fusion Transformer for Multi-Modal Whole Slide Image Analysis
Authors: Wang, ZH; Yu, LQ; Ding, X; Liao, XH; Wang, LS
Keywords: knowledge transfer; multi-modal multi-instance learning; transformer; whole slide image
Issue Date: 1-Nov-2023
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Medical Imaging, 2023, v. 42, n. 11, p. 3374-3383
Abstract

The fusion of multi-modal medical data is essential to assist medical experts in making treatment decisions for precision medicine. For example, combining whole slide histopathological images (WSIs) and tabular clinical data can more accurately predict the lymph node metastasis (LNM) of papillary thyroid carcinoma before surgery, avoiding unnecessary lymph node resection. However, the huge-sized WSI provides far more high-dimensional information than the low-dimensional tabular clinical data, making information alignment challenging in multi-modal WSI analysis tasks. This paper presents a novel transformer-guided multi-modal multi-instance learning framework to predict lymph node metastasis from both WSIs and tabular clinical data. We first propose an effective multi-instance grouping scheme, named siamese attention-based feature grouping (SAG), to group high-dimensional WSIs into representative low-dimensional feature embeddings for fusion. We then design a novel bottleneck shared-specific feature transfer module (BSFT) to explore the shared and specific features between different modalities, where a few learnable bottleneck tokens are utilized for knowledge transfer between modalities. Moreover, a modal adaptation and orthogonal projection scheme is incorporated to further encourage BSFT to learn shared and specific features from multi-modal data. Finally, the shared and specific features are dynamically aggregated via an attention mechanism for slide-level prediction. Experimental results on our collected lymph node metastasis dataset demonstrate the effectiveness of our proposed components, and our framework achieves the best performance with an AUC (area under the curve) of 97.34%, outperforming state-of-the-art methods by over 1.27%.
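
The abstract describes cross-modal fusion through a small set of learnable bottleneck tokens plus an orthogonality constraint between shared and specific features. The sketch below illustrates one way such a bottleneck fusion layer and orthogonality penalty could look in PyTorch; all module names, dimensions, and design choices (e.g., averaging the two modalities' bottleneck updates, the exact form of the penalty) are illustrative assumptions and not the authors' released implementation.

```python
# Minimal PyTorch-style sketch of fusing grouped WSI embeddings and tabular
# clinical embeddings through shared learnable bottleneck tokens.
# All names, dimensions, and design choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim=256, heads=4, n_bottleneck=4):
        super().__init__()
        # Learnable bottleneck tokens shared by both modalities.
        self.bottleneck = nn.Parameter(torch.randn(1, n_bottleneck, dim))
        self.attn_wsi = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_tab = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, wsi_tokens, tab_tokens):
        # wsi_tokens: (B, G, dim) grouped WSI embeddings (e.g. SAG outputs)
        # tab_tokens: (B, T, dim) embedded tabular clinical features
        n = self.bottleneck.size(1)
        btl = self.bottleneck.expand(wsi_tokens.size(0), -1, -1)

        # Each modality attends over its own tokens plus the bottleneck
        # tokens, so cross-modal information can only flow through them.
        wsi_in = torch.cat([wsi_tokens, btl], dim=1)
        tab_in = torch.cat([tab_tokens, btl], dim=1)
        wsi_out, _ = self.attn_wsi(wsi_in, wsi_in, wsi_in)
        tab_out, _ = self.attn_tab(tab_in, tab_in, tab_in)

        # Average the two modalities' bottleneck updates as the shared
        # representation (one simple choice among several possible).
        shared = 0.5 * (wsi_out[:, -n:] + tab_out[:, -n:])
        return wsi_out[:, :-n], tab_out[:, :-n], shared


def orthogonality_loss(shared, specific):
    # One plausible reading of the orthogonal projection scheme: penalise
    # squared inner products between normalised shared and specific tokens
    # so the two feature subspaces stay decorrelated.
    s = F.normalize(shared, dim=-1)
    p = F.normalize(specific, dim=-1)
    return (s @ p.transpose(-2, -1)).pow(2).mean()


if __name__ == "__main__":
    layer = BottleneckFusionLayer()
    wsi = torch.randn(2, 8, 256)   # 8 WSI feature groups per slide
    tab = torch.randn(2, 1, 256)   # 1 embedded clinical-feature token
    wsi_spec, tab_spec, shared = layer(wsi, tab)
    loss = orthogonality_loss(shared, torch.cat([wsi_spec, tab_spec], dim=1))
    print(shared.shape, loss.item())
```

The key property mirrored here is that neither modality attends directly to the other; all exchange is routed through the few bottleneck tokens, which keeps fusion cheap and forces a compact shared representation.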


Persistent Identifier: http://hdl.handle.net/10722/340954
ISSN: 0278-0062
2021 Impact Factor: 11.037
2020 SCImago Journal Rankings: 2.322
ISI Accession Number ID: WOS:001099088700016

 

DC Field: Value

dc.contributor.author: Wang, ZH
dc.contributor.author: Yu, LQ
dc.contributor.author: Ding, X
dc.contributor.author: Liao, XH
dc.contributor.author: Wang, LS
dc.date.accessioned: 2024-03-11T10:48:33Z
dc.date.available: 2024-03-11T10:48:33Z
dc.date.issued: 2023-11-01
dc.identifier.citation: IEEE Transactions on Medical Imaging, 2023, v. 42, n. 11, p. 3374-3383
dc.identifier.issn: 0278-0062
dc.identifier.uri: http://hdl.handle.net/10722/340954
dc.description.abstract: [abstract as above]
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Medical Imaging
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: knowledge transfer
dc.subject: multi-modal multi-instance learning
dc.subject: transformer
dc.subject: Whole slide image
dc.title: Shared-Specific Feature Learning With Bottleneck Fusion Transformer for Multi-Modal Whole Slide Image Analysis
dc.type: Article
dc.identifier.doi: 10.1109/TMI.2023.3287256
dc.identifier.pmid: 37335798
dc.identifier.scopus: eid_2-s2.0-85162901224
dc.identifier.volume: 42
dc.identifier.issue: 11
dc.identifier.spage: 3374
dc.identifier.epage: 3383
dc.identifier.eissn: 1558-254X
dc.identifier.isi: WOS:001099088700016
dc.publisher.place: PISCATAWAY
dc.identifier.issnl: 0278-0062
