Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/3460426.3463584
- Scopus: eid_2-s2.0-85113541872
- WOS: WOS:000723651900053
Conference Paper: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering
Title | Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering |
---|---|
Authors | Gong, H; Chen, G; Liu, S; Yu, Y; Li, G |
Keywords | Visual question answering; transfer learning; multi-task learning; self-attention |
Issue Date | 2021 |
Publisher | Association for Computing Machinery. |
Citation | Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR-21), Virtual Conference, Taipei, Taiwan, 16-19 November 2021, p. 456-460 |
Abstract | Due to the severe lack of labeled data, existing methods of medical visual question answering usually rely on transfer learning to obtain effective image feature representation and use cross-modal fusion of visual and linguistic features to achieve question-related answer prediction. These two phases are performed independently and without considering the compatibility and applicability of the pretrained features for cross-modal fusion. Thus, we reformulate image feature pre-training as a multi-task learning paradigm and witness its extraordinary superiority, forcing it to take into account the applicability of features for the specific image comprehension task. Furthermore, we introduce a cross-modal self-attention (CMSA) module to selectively capture the long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA. |
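The cross-modal self-attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their GitHub repository for that); it only assumes the generic formulation in which visual and linguistic token features are concatenated into one sequence and scaled dot-product self-attention is applied over the joint sequence, so every token can attend to tokens of either modality. The function name `cross_modal_self_attention` and the toy shapes are illustrative choices, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_self_attention(visual, linguistic):
    """Hypothetical sketch of cross-modal self-attention:
    concatenate visual and linguistic tokens, then apply scaled
    dot-product self-attention over the joint sequence so attention
    weights capture both intra- and cross-modal relevance."""
    x = np.concatenate([visual, linguistic], axis=0)   # (Nv + Nl, d)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # pairwise affinities across both modalities
    attn = softmax(scores, axis=-1)  # each row is a distribution over all tokens
    return attn @ x                  # fused features, same shape as the joint sequence

# toy example: 4 visual tokens and 3 question tokens, 8-dim features
rng = np.random.default_rng(0)
fused = cross_modal_self_attention(rng.normal(size=(4, 8)),
                                   rng.normal(size=(3, 8)))
print(fused.shape)  # (7, 8)
```

In practice such a module would use learned query/key/value projections and multiple heads; the sketch omits those to show only the joint-sequence attention pattern that lets linguistic tokens weight long-range visual context and vice versa.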
Description | The conference dates of ICMR 2021 were postponed from 21-24 August to 16-19 November 2021, due to the changing dynamics of the COVID-19 pandemic. |
Persistent Identifier | http://hdl.handle.net/10722/301301 |
ISBN | 9781450384636 |
ISI Accession Number ID | WOS:000723651900053 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gong, H | - |
dc.contributor.author | Chen, G | - |
dc.contributor.author | Liu, S | - |
dc.contributor.author | Yu, Y | - |
dc.contributor.author | Li, G | - |
dc.date.accessioned | 2021-07-27T08:09:06Z | - |
dc.date.available | 2021-07-27T08:09:06Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR-21), Virtual Conference, Taipei, Taiwan, 16-19 November 2021, p. 456-460 | - |
dc.identifier.isbn | 9781450384636 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301301 | - |
dc.description | The conference dates of ICMR 2021 were postponed from 21-24 August to 16-19 November 2021, due to the changing dynamics of the COVID-19 pandemic. | - |
dc.description.abstract | Due to the severe lack of labeled data, existing methods of medical visual question answering usually rely on transfer learning to obtain effective image feature representation and use cross-modal fusion of visual and linguistic features to achieve question-related answer prediction. These two phases are performed independently and without considering the compatibility and applicability of the pretrained features for cross-modal fusion. Thus, we reformulate image feature pre-training as a multi-task learning paradigm and witness its extraordinary superiority, forcing it to take into account the applicability of features for the specific image comprehension task. Furthermore, we introduce a cross-modal self-attention (CMSA) module to selectively capture the long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA. | - |
dc.language | eng | - |
dc.publisher | Association for Computing Machinery. | - |
dc.relation.ispartof | International Conference on Multimedia Retrieval (ICMR), 2021 | - |
dc.rights | International Conference on Multimedia Retrieval (ICMR), 2021. Copyright © Association for Computing Machinery. | - |
dc.subject | Visual question answering | - |
dc.subject | transfer learning | - |
dc.subject | multi-task learning | - |
dc.subject | self-attention | - |
dc.title | Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Liu, S: sishuo@hku.hk | - |
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | - |
dc.identifier.authority | Yu, Y=rp01415 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3460426.3463584 | - |
dc.identifier.scopus | eid_2-s2.0-85113541872 | - |
dc.identifier.hkuros | 323546 | - |
dc.identifier.spage | 456 | - |
dc.identifier.epage | 460 | - |
dc.identifier.isi | WOS:000723651900053 | - |
dc.publisher.place | United States | - |