Article: SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation

Title: SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation
Authors: Cao, Hu; Chen, Guang; Zhao, Hengshuang; Jiang, Dongsheng; Zhang, Xiaopeng; Tian, Qi; Knoll, Alois
Keywords: dimension-pooling attention; Image segmentation; scene understanding; semantic-balanced decoder; vision transformer
Issue Date: 3-Jul-2024
Publisher: IEEE
Citation: IEEE Transactions on Intelligent Transportation Systems, 2024, v. 25, n. 11, p. 15934-15946
Abstract: Image segmentation plays a critical role in autonomous driving by providing vehicles with a detailed and accurate understanding of their surroundings. Transformers have recently shown encouraging results in image segmentation. However, it is challenging for transformer-based models to strike a good balance between performance and efficiency: their computational complexity is quadratic in the number of input tokens, which severely hinders their application in dense prediction tasks. In this paper, we present the semantic-aware dimension-pooling transformer (SDPT) to mitigate the conflict between accuracy and efficiency. The proposed model comprises an efficient transformer encoder for generating hierarchical features and a semantic-balanced decoder for predicting semantic masks. In the encoder, a dimension-pooling mechanism is used in the multi-head self-attention (MHSA) to reduce the computational cost, and a parallel depth-wise convolution is used to capture local semantics. We further apply this dimension-pooling attention (DPA) in the decoder as a refinement module to integrate multi-level features. With this simple yet powerful encoder-decoder framework, we empirically demonstrate that SDPT achieves excellent performance and efficiency on popular benchmarks, including ADE20K, Cityscapes, and COCO-Stuff. For example, SDPT achieves 48.6% mIoU on ADE20K, outperforming current methods at lower computational cost. The code is available at https://github.com/HuCaoFighting/SDPT.
Persistent Identifier: http://hdl.handle.net/10722/362075
ISSN: 1524-9050
2023 Impact Factor: 7.9
2023 SCImago Journal Rankings: 2.580
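The abstract describes two technical ingredients: a dimension-pooling mechanism inside multi-head self-attention that shrinks the attention computation, and a parallel depth-wise convolution that injects local semantics. Below is a minimal PyTorch sketch of that general idea, not the authors' implementation: the module name, pool_ratio, head count, and the placement of the depth-wise branch are all assumptions; the actual code is at https://github.com/HuCaoFighting/SDPT.

```python
# Hedged sketch: dimension-pooling attention (DPA) as described in the
# abstract. pool_ratio, num_heads, and the depth-wise branch placement
# are illustrative assumptions, not the authors' actual design.
import torch
import torch.nn as nn

class DimensionPoolingAttention(nn.Module):
    """MHSA whose keys/values are average-pooled along the spatial
    dimension, cutting the attention matrix from N x N to N x (N/r^2),
    with a parallel depth-wise convolution for local semantics."""

    def __init__(self, dim: int, num_heads: int = 4, pool_ratio: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.pool = nn.AvgPool2d(pool_ratio, pool_ratio)   # pools K/V tokens
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) flattened feature map with N = h * w tokens.
        b, n, c = x.shape
        q = self.q(x).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        # Spatially pool the token map before computing keys/values.
        x2d = x.transpose(1, 2).reshape(b, c, h, w)
        pooled = self.pool(x2d).flatten(2).transpose(1, 2)   # (B, N/r^2, C)
        k, v = self.kv(pooled).chunk(2, dim=-1)
        k = k.reshape(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.reshape(b, -1, self.num_heads, self.head_dim).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) * self.scale        # (B, H, N, N/r^2)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, c)

        # Parallel depth-wise convolution captures local semantics.
        local = self.dwconv(x2d).flatten(2).transpose(1, 2)
        return self.proj(out + local)

# Example: 32x32 token map, 64 channels.
x = torch.randn(2, 32 * 32, 64)
y = DimensionPoolingAttention(dim=64)(x, 32, 32)  # -> (2, 1024, 64)
```

With pool_ratio=4 on a 32x32 token map, the attention matrix shrinks from 1024x1024 to 1024x64, which is the kind of saving the abstract attributes to dimension-pooling attention.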

 

DC Field | Value
dc.contributor.author | Cao, Hu
dc.contributor.author | Chen, Guang
dc.contributor.author | Zhao, Hengshuang
dc.contributor.author | Jiang, Dongsheng
dc.contributor.author | Zhang, Xiaopeng
dc.contributor.author | Tian, Qi
dc.contributor.author | Knoll, Alois
dc.date.accessioned | 2025-09-19T00:31:39Z
dc.date.available | 2025-09-19T00:31:39Z
dc.date.issued | 2024-07-03
dc.identifier.citation | IEEE Transactions on Intelligent Transportation Systems, 2024, v. 25, n. 11, p. 15934-15946
dc.identifier.issn | 1524-9050
dc.identifier.uri | http://hdl.handle.net/10722/362075
dc.description.abstract | Image segmentation plays a critical role in autonomous driving by providing vehicles with a detailed and accurate understanding of their surroundings. Transformers have recently shown encouraging results in image segmentation. However, it is challenging for transformer-based models to strike a good balance between performance and efficiency: their computational complexity is quadratic in the number of input tokens, which severely hinders their application in dense prediction tasks. In this paper, we present the semantic-aware dimension-pooling transformer (SDPT) to mitigate the conflict between accuracy and efficiency. The proposed model comprises an efficient transformer encoder for generating hierarchical features and a semantic-balanced decoder for predicting semantic masks. In the encoder, a dimension-pooling mechanism is used in the multi-head self-attention (MHSA) to reduce the computational cost, and a parallel depth-wise convolution is used to capture local semantics. We further apply this dimension-pooling attention (DPA) in the decoder as a refinement module to integrate multi-level features. With this simple yet powerful encoder-decoder framework, we empirically demonstrate that SDPT achieves excellent performance and efficiency on popular benchmarks, including ADE20K, Cityscapes, and COCO-Stuff. For example, SDPT achieves 48.6% mIoU on ADE20K, outperforming current methods at lower computational cost. The code is available at https://github.com/HuCaoFighting/SDPT.
dc.language | eng
dc.publisher | IEEE
dc.relation.ispartof | IEEE Transactions on Intelligent Transportation Systems
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject | dimension-pooling attention
dc.subject | Image segmentation
dc.subject | scene understanding
dc.subject | semantic-balanced decoder
dc.subject | vision transformer
dc.title | SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation
dc.type | Article
dc.identifier.doi | 10.1109/TITS.2024.3417813
dc.identifier.scopus | eid_2-s2.0-85203146661
dc.identifier.volume | 25
dc.identifier.issue | 11
dc.identifier.spage | 15934
dc.identifier.epage | 15946
dc.identifier.eissn | 1558-0016
dc.identifier.issnl | 1524-9050
