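[Paper Explained] Coordinate Attention for Efficient Mobile Network Design
This paper proposes coordinate attention, a lightweight attention module for mobile networks. It embeds positional information by factorizing 2D global pooling into two 1D pooling operations, and improves ImageNet classification and downstream vision tasks with very little overhead.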
Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call "coordinate attention". Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction and meanwhile precise positional information can be preserved along the other spatial direction. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be complementarily applied to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention is not only beneficial to ImageNet classification but more interestingly, behaves better in down-stream tasks, such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.
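Research motivation and goals
- Motivate the need for attention mechanisms in mobile networks that preserve spatial (positional) information.
- Propose a new attention block that embeds coordinate information while keeping computation low.
- Show that coordinate attention can be plugged into existing mobile backbone blocks (e.g., MobileNetV2, MobileNeXt, EfficientNet) with very little overhead.
- Demonstrate improvements on ImageNet classification and on downstream tasks such as object detection and semantic segmentation.
Proposed method
- Factorize channel attention into two parallel 1D feature-encoding processes that pool along the horizontal and vertical directions, respectively.
- Concatenate the two 1D pooled features and pass them through a shared 1x1 convolution to produce the direction-aware attention maps (g^h and g^w).
- Apply the attention maps to the input feature map by element-wise multiplication: Y_c(i,j) = X_c(i,j) * g^h_c(i) * g^w_c(j).
- Use a reduction ratio r to control the bottleneck width and keep computation light for mobile settings.
- Demonstrate plug-and-play compatibility with the inverted residual block (MobileNetV2) and the sandglass bottleneck (MobileNeXt), and evaluate on ImageNet, COCO, VOC, and Cityscapes.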
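The pooling, shared transform, and reweighting steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration for a single feature map of shape (C, H, W), not the authors' implementation. The function name `coordinate_attention` and the weight names `w_shared`, `w_h`, `w_w` are hypothetical. The sketch also assumes that, after the shared 1x1 convolution, each direction gets its own 1x1 convolution followed by a sigmoid gate, that ReLU serves as the intermediate nonlinearity, and that biases and batch normalization are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_shared, w_h, w_w):
    """Sketch of coordinate attention on one feature map.

    x:        (C, H, W) input features
    w_shared: (C_mid, C) shared 1x1 conv, written as a matrix (hypothetical name)
    w_h, w_w: (C, C_mid) per-direction 1x1 convs (assumed, see lead-in)
    """
    c, h, w = x.shape
    # Two 1D poolings: average over width -> (C, H), average over height -> (C, W)
    z_h = x.mean(axis=2)
    z_w = x.mean(axis=1)
    # Concatenate along the spatial axis and apply the shared 1x1 conv.
    # A 1x1 conv mixes channels only, so it reduces to a matrix product.
    z = np.concatenate([z_h, z_w], axis=1)      # (C, H + W)
    f = np.maximum(w_shared @ z, 0.0)           # ReLU is an assumption here
    # Split back into the two directions and produce the gates
    f_h, f_w = f[:, :h], f[:, h:]
    g_h = sigmoid(w_h @ f_h)                    # (C, H)
    g_w = sigmoid(w_w @ f_w)                    # (C, W)
    # Y_c(i, j) = X_c(i, j) * g^h_c(i) * g^w_c(j), via broadcasting
    y = x * g_h[:, :, None] * g_w[:, None, :]
    return y, g_h, g_w

# Usage with random weights; C_mid = C // r, with r the reduction ratio
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 7, 4
x = rng.standard_normal((C, H, W))
w_shared = rng.standard_normal((C // r, C))
w_h = rng.standard_normal((C, C // r))
w_w = rng.standard_normal((C, C // r))
y, g_h, g_w = coordinate_attention(x, w_shared, w_h, w_w)
print(y.shape, g_h.shape, g_w.shape)
```

Because g^h varies only with the row index and g^w only with the column index, each output position is scaled by a gate that depends on both of its coordinates, which is what makes the attention position-sensitive.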
![Figure 1: Performance of different attention methods on three classic vision tasks. The y-axis labels from left to right are top-1 accuracy, mean IoU, and AP, respectively. Clearly, our approach not only achieves the best result in ImageNet classification [ 33 ] against the SE block [ 18 ] and CBAM](https://ar5iv.labs.arxiv.org/html/2103.02907/assets/figures/illu.png)
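Experimental results
Research questions
- RQ1: Does embedding coordinate information into mobile networks via two 1D pooling operations improve performance over SE and CBAM?
- RQ2: How does coordinate attention perform relative to baseline attention methods when embedded into different mobile backbones (e.g., MobileNetV2, MobileNeXt, EfficientNet)?
- RQ3: How does the reduction ratio affect accuracy and model size, and is coordinate attention robust to this hyperparameter?
- RQ4: Compared with other lightweight attention methods, do models with coordinate attention transfer better to downstream tasks such as object detection and semantic segmentation?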
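To make the model-size side of RQ3 concrete, the sketch below counts the 1x1 convolution weights in one attention block as a function of the reduction ratio r. It is a back-of-the-envelope illustration, not a figure from the paper. It assumes one shared 1x1 convolution (C to C_mid) plus one 1x1 convolution per direction (C_mid to C), and ignores biases and normalization layers. The function names and the floor of 8 channels on the bottleneck width are hypothetical choices for illustration.

```python
def bottleneck_width(channels, reduction, min_width=8):
    # min_width is an assumed floor so very small layers keep a usable bottleneck
    return max(min_width, channels // reduction)

def ca_weight_count(channels, reduction):
    """1x1 conv weights in one coordinate attention block (biases/norm ignored)."""
    mid = bottleneck_width(channels, reduction)
    shared = channels * mid          # shared conv: C -> C_mid
    per_direction = mid * channels   # one conv per direction: C_mid -> C
    return shared + 2 * per_direction

def se_weight_count(channels, reduction):
    """Weights in an SE-style block with the same bottleneck width, for comparison."""
    mid = bottleneck_width(channels, reduction)
    return 2 * channels * mid        # C -> C_mid -> C

for r in (16, 24, 32):
    print(r, ca_weight_count(256, r), se_weight_count(256, r))
```

Under these assumptions the attention weights shrink roughly in proportion to 1/r until the floor is reached, and the block carries about 1.5 times the weights of an SE-style block with the same bottleneck width.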
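Main findings
- Coordinate attention outperforms SE and CBAM on ImageNet classification with mobile networks.
- Embedding spatial coordinate information through two 1D pooling operations captures long-range dependencies while preserving positional information, which improves localization of the objects of interest.
- Across the tested backbones (MobileNetV2, MobileNeXt, EfficientNet), coordinate attention yields consistent gains with very little overhead, and shows notable improvements on downstream tasks such as object detection and semantic segmentation.
- Ablations show that combining horizontal and vertical attention is more effective than using either alone, confirming the value of embedding coordinate information.
- Visualizations show that coordinate attention highlights the objects of interest in the feature maps better than SE and CBAM.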
![Figure 2: Schematic comparison of the proposed coordinate attention block (c) to the classic SE channel attention block [ 18 ] (a) and CBAM [ 44 ] (b). Here, “GAP” and “GMP” refer to the global average pooling and global max pooling, respectively. ‘X Avg Pool’ and ’Y Avg Pool’ refer to 1D horizontal](https://ar5iv.labs.arxiv.org/html/2103.02907/assets/x1.png)
This explainer was generated by AI and reviewed by a human editor.