QUICK REVIEW

[Paper Review] AS-MLP: An Axial Shifted MLP Architecture for Vision

Dongze Lian, Zehao Yu|arXiv (Cornell University)|Jul 18, 2021

Advanced Neural Network Applications|References 47|Cited by 85

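One-Sentence Summary

AS-MLP introduces axial channel shifting into the MLP framework to capture local dependencies, achieving competitive ImageNet performance and extending to downstream tasks such as object detection and segmentation.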

ABSTRACT

An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Different from MLP-Mixer, where the global spatial feature is encoded for information flow through matrix transposition and one token-mixing MLP, we pay more attention to the local features interaction. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to achieve the same local receptive field as CNN-like architecture. We can also design the receptive field size and dilation of blocks of AS-MLP, etc, in the same spirit of convolutional neural networks. With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset. Such a simple yet effective architecture outperforms all MLP-based architectures and achieves competitive performance compared to the transformer-based architectures (e.g., Swin Transformer) even with slightly lower FLOPs. In addition, AS-MLP is also the first MLP-based architecture to be applied to the downstream tasks (e.g., object detection and semantic segmentation). The experimental results are also impressive. Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures. Our AS-MLP establishes a strong baseline of MLP-based architecture. Code is available at https://github.com/svip-lab/AS-MLP.

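Research Motivation and Objectives

Motivation: show why MLP-based vision models should exploit local feature interactions rather than relying only on global token mixing.
Propose a lightweight axial shift mechanism that gives a pure MLP architecture a local receptive field.
Design a scalable AS-MLP backbone with four stages and hierarchical feature fusion.
Demonstrate competitive performance on ImageNet-1K and competitive transfer to downstream tasks (COCO detection, ADE20K segmentation).
Provide ablation studies on the effects of shift configuration, padding, dilation rate, and connection type.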

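Proposed Method

Introduce the Axial Shifted MLP (AS-MLP) block, which shifts features horizontally and vertically and then applies a channel projection to aggregate local features.
Combine the shifted features using a Norm layer, residual connections, and MLP-based channel mixing.
The shift operation aggregates information across spatial positions without full attention, which keeps complexity low.
Adopt a four-stage, Swin-style backbone with patch partition and patch merging to build hierarchical representations.
Ablate shift size, padding method, dilation rate, and serial versus parallel connection to identify effective configurations.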
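The shift-then-project idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`shift_along`, `axial_shift`, `channel_projection`, `axial_shift_mixing`), the zero padding, the direction in which each channel group moves, and the parallel sum of the two branches are all assumptions chosen for illustration. Norm layers, residual connections, and the channel-mixing MLP are omitted.

```python
import numpy as np

def shift_along(chunk, offset, axis):
    """Displace `chunk` by `offset` positions along `axis`; vacated cells are zero."""
    out = np.zeros_like(chunk)
    n = chunk.shape[axis]
    if abs(offset) >= n:
        return out
    src = [slice(None)] * chunk.ndim
    dst = [slice(None)] * chunk.ndim
    if offset >= 0:
        src[axis], dst[axis] = slice(0, n - offset), slice(offset, n)
    else:
        src[axis], dst[axis] = slice(-offset, n), slice(0, n + offset)
    out[tuple(dst)] = chunk[tuple(src)]
    return out

def axial_shift(x, shift_size=3, dilation=1, axis=2):
    """Split the channels of x (C, H, W) into `shift_size` groups and shift each one.

    Group i moves by (i - shift_size // 2) * dilation positions, so the default
    shift_size=3 yields offsets -1, 0, +1 (an assumed convention).
    """
    chunks = np.array_split(x, shift_size, axis=0)
    half = shift_size // 2
    shifted = [shift_along(c, (i - half) * dilation, axis)
               for i, c in enumerate(chunks)]
    return np.concatenate(shifted, axis=0)

def channel_projection(x, weight):
    """Per-position linear map over channels; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def axial_shift_mixing(x, w_h, w_v):
    """Parallel variant: project the horizontal and vertical shifts, then add."""
    horizontal = channel_projection(axial_shift(x, axis=2), w_h)
    vertical = channel_projection(axial_shift(x, axis=1), w_v)
    return horizontal + vertical

# Toy input: 3 channels, 1 row, 4 columns.
x = np.arange(12, dtype=float).reshape(3, 1, 4)
shifted = axial_shift(x, shift_size=3, axis=2)
# Summing over channels makes each output position mix three neighbouring columns.
mixed = channel_projection(shifted, np.ones((1, 3)))
print(shifted[:, 0, :])
print(mixed[0, 0])
```

After the shift, a purely per-position channel projection reads values that originally sat at neighbouring positions, which is how a local receptive field arises without any spatial kernel. Increasing `shift_size` or `dilation` in this sketch widens that neighbourhood.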

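Experimental Results

Research Questions

RQ1: In an MLP-only backbone, can axial (horizontal and vertical) feature shifting provide a local receptive field competitive with CNNs or window-based transformers?
RQ2: Which shift sizes, padding strategies, and connection types (serial vs. parallel) maximize accuracy while preserving efficiency?
RQ3: How well does AS-MLP transfer to downstream tasks such as object detection and semantic segmentation compared with transformer-based backbones?
RQ4: What is the trade-off among model size, FLOPs, and accuracy for the AS-MLP variants on ImageNet-1K?
RQ5: Under similar resource constraints, does AS-MLP deliver mobile-friendly performance relative to Swin Transformer?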

Key Findings

Model	Input size	Resolution	Top-1 (%)	Params	FLOPs	Throughput (images/s)
AS-MLP-T	224	224x224	81.3	28M	4.4G	1047.7
AS-MLP-S	224	224x224	83.1	50M	8.5G	619.5
AS-MLP-B	224	224x224	83.3	88M	15.2G	455.2
AS-MLP-B	384	384x384	84.3	88M	44.6G	179.2

AS-MLP reaches 83.3% Top-1 accuracy on ImageNet-1K with 88M parameters and 15.2 GFLOPs (AS-MLP-B, 224x224).
AS-MLP-B at 384x384 reaches 84.3% Top-1 with 88M parameters and 44.6 GFLOPs.
AS-MLP-S reaches 83.1% Top-1 with 50M parameters and 8.5 GFLOPs.
AS-MLP-T reaches 81.3% Top-1 with 28M parameters and 4.4 GFLOPs.
In the mobile setting, AS-MLP (mobile) outperforms Swin (mobile) in Top-1 accuracy (76.05% vs. 75.11%).
AS-MLP shows competitive results on COCO object detection (e.g., 51.5 APb for AS-MLP-B) and ADE20K segmentation (49.5 MS mIoU for AS-MLP-B).
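As a quick sanity check on the two AS-MLP-B rows in the table, the short script below asks whether the reported FLOPs are consistent with a cost that grows linearly with the number of pixels. The linear cost model and the helper name `scale_flops` are assumptions made for this check. The only reported figures used are the GFLOPs and Top-1 values from the table.

```python
# Reported AS-MLP-B figures from the table (GFLOPs, Top-1 %).
flops_224, top1_224 = 15.2, 83.3
flops_384, top1_384 = 44.6, 84.3

def scale_flops(flops, old_side, new_side):
    """Scale a FLOP count with the pixel count (assumed linear cost model)."""
    return flops * (new_side / old_side) ** 2

predicted_384 = scale_flops(flops_224, 224, 384)
print(f"predicted {predicted_384:.1f} GFLOPs, reported {flops_384} GFLOPs")
print(f"Top-1 gain {top1_384 - top1_224:.1f} points "
      f"for {flops_384 / flops_224:.2f}x the FLOPs")
```

The linear estimate lands at roughly 44.7 GFLOPs, within rounding distance of the reported 44.6G. That is what one would expect if cost is dominated by per-position channel projections rather than by any operation that is quadratic in the number of tokens.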


This review was generated by AI and checked by a human editor.