Conference Paper: AS-MLP: AN AXIAL SHIFTED MLP ARCHITECTURE FOR VISION

Title: AS-MLP: AN AXIAL SHIFTED MLP ARCHITECTURE FOR VISION
Authors: Lian, Dongze; Yu, Zehao; Sun, Xing; Gao, Shenghua
Issue Date: 2022
Citation: ICLR 2022 - 10th International Conference on Learning Representations, 2022
Abstract: An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Unlike MLP-Mixer, where the global spatial feature is encoded for information flow through matrix transposition and one token-mixing MLP, we pay more attention to local feature interaction. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to achieve the same local receptive field as a CNN-like architecture. We can also design the receptive field size and dilation of blocks of AS-MLP, etc., in the same spirit as convolutional neural networks. With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset. Such a simple yet effective architecture outperforms all MLP-based architectures and achieves competitive performance compared to the transformer-based architectures (e.g., Swin Transformer) even with slightly lower FLOPs. In addition, AS-MLP is also the first MLP-based architecture to be applied to downstream tasks (e.g., object detection and semantic segmentation). The experimental results are also impressive. Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures. Our AS-MLP establishes a strong baseline for MLP-based architectures. Code is available at https://github.com/svip-lab/AS-MLP.
Persistent Identifier: http://hdl.handle.net/10722/345308
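
The axial shift described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function names `axial_shift` and `shift_and_project`, the `(C, H, W)` layout, zero-filling at the borders, and the odd `shift_size` are all assumptions made for illustration. Channel groups are moved by different offsets along one axis, so that a following per-position channel projection (the MLP part) sees values from neighbouring positions.

```python
import numpy as np


def axial_shift(x, shift_size=3, axis=2):
    """Shift channel groups of a (C, H, W) array along one spatial axis.

    Hypothetical helper: channels are split into `shift_size` groups
    (assumed odd); each group is moved by one offset from
    -(shift_size // 2) to shift_size // 2, and vacated positions are zero.
    """
    half = shift_size // 2
    groups = np.array_split(np.arange(x.shape[0]), shift_size)
    n = x.shape[axis]
    out = np.zeros_like(x)
    for channels, offset in zip(groups, range(-half, half + 1)):
        src = [channels, slice(None), slice(None)]
        dst = [channels, slice(None), slice(None)]
        if offset > 0:
            src[axis], dst[axis] = slice(0, n - offset), slice(offset, n)
        elif offset < 0:
            src[axis], dst[axis] = slice(-offset, n), slice(0, n + offset)
        out[tuple(dst)] = x[tuple(src)]
    return out


def shift_and_project(x, w_h, w_v, shift_size=3):
    """Sum of horizontal and vertical shift branches, each followed by a
    channel projection with weights of shape (C_out, C). Illustrative only."""
    h = np.einsum("oc,chw->ohw", w_h, axial_shift(x, shift_size, axis=2))
    v = np.einsum("oc,chw->ohw", w_v, axial_shift(x, shift_size, axis=1))
    return h + v
```

With three channels that each hold the row `[1, 2, 3, 4]` and an all-ones horizontal projection, every output position becomes the sum of itself and its two horizontal neighbours, which is the local receptive field the abstract refers to. Increasing `shift_size` widens that neighbourhood.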

 

DC Field | Value | Language
dc.contributor.author | Lian, Dongze | -
dc.contributor.author | Yu, Zehao | -
dc.contributor.author | Sun, Xing | -
dc.contributor.author | Gao, Shenghua | -
dc.date.accessioned | 2024-08-15T09:26:32Z | -
dc.date.available | 2024-08-15T09:26:32Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | ICLR 2022 - 10th International Conference on Learning Representations, 2022 | -
dc.identifier.uri | http://hdl.handle.net/10722/345308 | -
dc.description.abstract | An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Unlike MLP-Mixer, where the global spatial feature is encoded for information flow through matrix transposition and one token-mixing MLP, we pay more attention to local feature interaction. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to achieve the same local receptive field as a CNN-like architecture. We can also design the receptive field size and dilation of blocks of AS-MLP, etc., in the same spirit as convolutional neural networks. With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset. Such a simple yet effective architecture outperforms all MLP-based architectures and achieves competitive performance compared to the transformer-based architectures (e.g., Swin Transformer) even with slightly lower FLOPs. In addition, AS-MLP is also the first MLP-based architecture to be applied to downstream tasks (e.g., object detection and semantic segmentation). The experimental results are also impressive. Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures. Our AS-MLP establishes a strong baseline for MLP-based architectures. Code is available at https://github.com/svip-lab/AS-MLP. | -
dc.language | eng | -
dc.relation.ispartof | ICLR 2022 - 10th International Conference on Learning Representations | -
dc.title | AS-MLP: AN AXIAL SHIFTED MLP ARCHITECTURE FOR VISION | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.scopus | eid_2-s2.0-85148322296 | -
