Article: Intra- and Inter-Head Orthogonal Attention for Image Captioning

Title: Intra- and Inter-Head Orthogonal Attention for Image Captioning
Authors: Zhang, Xiaodan; Jia, Aozhe; Ji, Junzhong; Qu, Liangqiong; Ye, Qixiang
Issue Date: 16-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Image Processing, 2025, v. 34, p. 594-607
Abstract

Multi-head attention (MA), which allows a model to jointly attend to crucial information from diverse representation subspaces through its heads, has yielded remarkable achievements in image captioning. However, there is no explicit mechanism to ensure that MA attends to appropriate positions in those diverse subspaces, resulting in over-focused attention within each head and redundancy between heads. In this paper, we propose a novel Intra- and Inter-Head Orthogonal Attention (I2OA) that efficiently improves MA in image captioning by introducing a concise orthogonal regularization on the heads. Specifically, Intra-Head Orthogonal Attention enhances the attention learning of MA by imposing an orthogonal constraint within each head, which decentralizes object-centric attention into more comprehensive content-aware attention. Inter-Head Orthogonal Attention reduces redundancy between heads by applying an orthogonal constraint across heads, which enlarges the diversity of the representation subspaces and improves the representation ability of MA. Moreover, the proposed I2OA can be flexibly combined with various multi-head-attention-based image captioning methods, improving their performance without increasing model complexity or parameter count. Experiments on the MS COCO dataset demonstrate the effectiveness of the proposed model.
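
The abstract does not specify the exact form of the orthogonal regularization, so the following is a minimal PyTorch sketch of one plausible reading: an intra-head penalty that pushes each head's query rows toward attending to near-disjoint positions, and an inter-head penalty that pushes different heads' flattened attention maps toward mutual orthogonality. The function name i2oa_penalties, the squared off-diagonal Gram-matrix penalties, and the 0.1 weights are illustrative assumptions, not the paper's formulation.

    import torch

    def i2oa_penalties(attn):
        """Hypothetical intra-/inter-head orthogonal penalties.

        attn: attention maps of shape (batch, heads, queries, keys),
              rows assumed to be softmax-normalized.
        """
        b, h, q, k = attn.shape

        def off_diag_penalty(gram):
            # Mean squared off-diagonal entry of a batched Gram matrix;
            # small values mean the compared vectors are near-orthogonal.
            diag = torch.diag_embed(torch.diagonal(gram, dim1=-2, dim2=-1))
            return ((gram - diag) ** 2).mean()

        # Intra-head: encourage each head's query rows to attend to
        # (near-)disjoint keys, spreading over-focused attention
        # across more of the image content.
        intra = off_diag_penalty(attn @ attn.transpose(-1, -2))  # (b, h, q, q)

        # Inter-head: encourage different heads' flattened attention
        # maps to be near-orthogonal, reducing redundancy between heads.
        flat = attn.reshape(b, h, q * k)
        flat = flat / (flat.norm(dim=-1, keepdim=True) + 1e-8)
        inter = off_diag_penalty(flat @ flat.transpose(-1, -2))  # (b, h, h)

        return intra, inter

    # Usage sketch: 8 heads over a 7x7 spatial grid of image features.
    attn = torch.softmax(torch.randn(2, 8, 49, 49), dim=-1)
    intra, inter = i2oa_penalties(attn)
    reg = 0.1 * intra + 0.1 * inter  # weights assumed; added to the captioning loss

Because the penalties act only on the attention maps during training, such a regularizer would add no parameters or inference-time computation, which is consistent with the abstract's claim that I2OA improves performance without increasing model complexity.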


Persistent Identifier: http://hdl.handle.net/10722/357522
ISSN: 1057-7149
2023 Impact Factor: 10.8
2023 SCImago Journal Rankings: 3.556
ISI Accession Number ID: WOS:001410171000005

 

DC Field: Value
dc.contributor.author: Zhang, Xiaodan
dc.contributor.author: Jia, Aozhe
dc.contributor.author: Ji, Junzhong
dc.contributor.author: Qu, Liangqiong
dc.contributor.author: Ye, Qixiang
dc.date.accessioned: 2025-07-22T03:13:16Z
dc.date.available: 2025-07-22T03:13:16Z
dc.date.issued: 2025-01-16
dc.identifier.citation: IEEE Transactions on Image Processing, 2025, v. 34, p. 594-607
dc.identifier.issn: 1057-7149
dc.identifier.uri: http://hdl.handle.net/10722/357522
dc.description.abstract: Multi-head attention (MA), which allows a model to jointly attend to crucial information from diverse representation subspaces through its heads, has yielded remarkable achievements in image captioning. However, there is no explicit mechanism to ensure that MA attends to appropriate positions in those diverse subspaces, resulting in over-focused attention within each head and redundancy between heads. In this paper, we propose a novel Intra- and Inter-Head Orthogonal Attention (I2OA) that efficiently improves MA in image captioning by introducing a concise orthogonal regularization on the heads. Specifically, Intra-Head Orthogonal Attention enhances the attention learning of MA by imposing an orthogonal constraint within each head, which decentralizes object-centric attention into more comprehensive content-aware attention. Inter-Head Orthogonal Attention reduces redundancy between heads by applying an orthogonal constraint across heads, which enlarges the diversity of the representation subspaces and improves the representation ability of MA. Moreover, the proposed I2OA can be flexibly combined with various multi-head-attention-based image captioning methods, improving their performance without increasing model complexity or parameter count. Experiments on the MS COCO dataset demonstrate the effectiveness of the proposed model.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Image Processing
dc.title: Intra- and Inter-Head Orthogonal Attention for Image Captioning
dc.type: Article
dc.identifier.doi: 10.1109/TIP.2025.3528216
dc.identifier.scopus: eid_2-s2.0-85216032404
dc.identifier.volume: 34
dc.identifier.spage: 594
dc.identifier.epage: 607
dc.identifier.eissn: 1941-0042
dc.identifier.isi: WOS:001410171000005
dc.identifier.issnl: 1057-7149
