QUICK REVIEW

[Paper Review] EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network

Hu Zhang, Keke Zu|arXiv (Cornell University)|May 30, 2021

Advanced Neural Network Applications | References: 39 | Cited by: 42

One-Sentence Summary

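The paper proposes EPSANet, a lightweight and efficient backbone architecture built by replacing the ResNet bottleneck blocks with Efficient Pyramid Squeeze Attention (EPSA) blocks. The EPSA block uses a novel Pyramid Squeeze Attention (PSA) mechanism that strengthens multi-scale feature representation. Compared with SENet-50, and without extra tricks or optimizations, it improves Top-1 accuracy on ImageNet by 1.93% and, with Mask R-CNN on MS-COCO, gains +2.7 box AP for object detection and +1.7 mask AP for instance segmentation.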

ABSTRACT

Recently, it has been demonstrated that the performance of a deep convolutional neural network can be effectively improved by embedding an attention module into it. In this work, a novel lightweight and effective attention method named Pyramid Squeeze Attention (PSA) module is proposed. By replacing the 3x3 convolution with the PSA module in the bottleneck blocks of the ResNet, a novel representational block named Efficient Pyramid Squeeze Attention (EPSA) is obtained. The EPSA block can be easily added as a plug-and-play component into a well-established backbone network, and significant improvements on model performance can be achieved. Hence, a simple and efficient backbone architecture named EPSANet is developed in this work by stacking these ResNet-style EPSA blocks. Correspondingly, a stronger multi-scale representation ability can be offered by the proposed EPSANet for various computer vision tasks including but not limited to, image classification, object detection, instance segmentation, etc. Without bells and whistles, the performance of the proposed EPSANet outperforms most of the state-of-the-art channel attention methods. As compared to the SENet-50, the Top-1 accuracy is improved by 1.93% on ImageNet dataset, a larger margin of +2.7 box AP for object detection and an improvement of +1.7 mask AP for instance segmentation by using the Mask-RCNN on MS-COCO dataset are obtained. Our source code is available at: this https URL.
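
The block replacement and stacking described in the abstract can be illustrated with a purely structural sketch. Layers here are plain string labels rather than real tensor operations, the residual connection is omitted, and the function names and the ResNet-50-style stage depths of [3, 4, 6, 3] are assumptions made for illustration, not details taken from the abstract.

```python
# Structural sketch only: each layer is a label, not an actual operation.
# Stage depths follow the standard ResNet-50 layout (an assumption here).
RESNET50_STAGE_DEPTHS = [3, 4, 6, 3]


def resnet_bottleneck():
    """Layer order of a standard ResNet bottleneck block."""
    return ["conv1x1", "conv3x3", "conv1x1"]


def epsa_block():
    """Bottleneck block with its 3x3 convolution swapped for a PSA module."""
    return ["psa" if layer == "conv3x3" else layer for layer in resnet_bottleneck()]


def build_backbone(block_fn, stage_depths=RESNET50_STAGE_DEPTHS):
    """Stack `depth` copies of the block in each stage, ResNet style."""
    return [[block_fn() for _ in range(depth)] for depth in stage_depths]


backbone = build_backbone(epsa_block)
print(epsa_block())                         # ['conv1x1', 'psa', 'conv1x1']
print([len(stage) for stage in backbone])   # [3, 4, 6, 3]
```

Because only the middle layer changes, the surrounding 1x1 convolutions and the stage layout stay the same, which is what makes the block a drop-in replacement in a ResNet-style backbone.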

Research Motivation and Objectives

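Develop a more efficient and effective attention mechanism for convolutional neural networks to improve feature representation.
Address the limitations of existing channel attention modules in capturing multi-scale spatial and channel dependencies.
Design a plug-and-play modular block that strengthens a backbone network without architectural restructuring or hyperparameter tuning.
Reach state-of-the-art performance on standard benchmarks at very low computational overhead.
Demonstrate consistent gains across multiple computer vision tasks, including classification, detection, and instance segmentation.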

Proposed Method

Propose a Pyramid Squeeze Attention (PSA) module that replaces the 3×3 convolution in the ResNet bottleneck block.
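Introduce a multi-scale feature aggregation mechanism that captures spatial and channel dependencies through parallel multi-scale convolution branches arranged as a pyramid.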
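Apply learnable attention weights that dynamically emphasize informative features at different scales.
Build the EPSA block by integrating the PSA module into the residual block structure while preserving residual learning.
Build EPSANet by stacking multiple EPSA blocks into a ResNet-style backbone that supports end-to-end training.
Keep the architecture simple and parameter-efficient, improving representational capacity while maintaining computational efficiency.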
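
The attention weighting across scales can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the multi-scale branch features are taken as given inputs, the learned excitation layers are replaced by a plain sigmoid (the hypothetical `excite` argument), and the name `psa_reweight` is invented for this example.

```python
import math


def global_avg_pool(channel):
    """Mean of one H x W channel given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def psa_reweight(branches, excite=sigmoid):
    """Reweight multi-scale branch features with cross-branch attention.

    branches[s][c] is channel c (an H x W list of rows) of scale branch s;
    all branches are assumed to have the same number of channels.
    Returns (attention, output): attention[s][c] sums to 1 over s for each c,
    and output concatenates the reweighted channels of all branches.
    """
    # 1. Squeeze each channel to a scalar, then apply the stand-in excitation.
    scores = [[excite(global_avg_pool(ch)) for ch in branch] for branch in branches]

    # 2. Softmax across branches, separately for each channel index.
    num_branches, num_channels = len(branches), len(branches[0])
    attention = [[0.0] * num_channels for _ in range(num_branches)]
    for c in range(num_channels):
        exps = [math.exp(scores[s][c]) for s in range(num_branches)]
        total = sum(exps)
        for s in range(num_branches):
            attention[s][c] = exps[s] / total

    # 3. Scale every channel by its weight and concatenate branch by branch.
    output = []
    for s, branch in enumerate(branches):
        for c, ch in enumerate(branch):
            w = attention[s][c]
            output.append([[w * v for v in row] for row in ch])
    return attention, output


# Two scale branches, each holding a single 2 x 2 channel.
branches = [
    [[[1.0, 1.0], [1.0, 1.0]]],
    [[[3.0, 3.0], [3.0, 3.0]]],
]
attention, output = psa_reweight(branches)
```

The softmax makes the branches compete for each channel position, so a scale with a stronger response receives a larger share of the weight. In a trained network the sigmoid stand-in would be replaced by learned layers.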

Experimental Results

Research Questions

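RQ1: Can a more efficient attention mechanism improve the performance of deep CNNs without increasing model complexity?
RQ2: How does the proposed pyramid attention mechanism compare with existing channel attention modules at capturing multi-scale features?
RQ3: To what extent does the EPSA block improve model accuracy across diverse vision tasks such as classification, detection, and segmentation?
RQ4: Does the plug-and-play nature of the EPSA block yield consistent gains across different backbone architectures?
RQ5: How large is EPSANet's improvement on standard benchmarks over SOTA models such as SENet-50?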

Key Findings

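EPSANet achieves 1.93% higher Top-1 accuracy than SENet-50 on the ImageNet dataset.
On the MS-COCO dataset with the Mask R-CNN framework, the model gains +2.7 box AP in object detection.
In instance segmentation with the same Mask R-CNN framework, it gains +1.7 mask AP.
These gains do not rely on additional data augmentation, training tricks, or architectural modifications.
The proposed EPSA block is lightweight and integrates seamlessly into existing ResNet-style backbones as a plug-and-play component.
The method shows consistent, significant improvements across multiple computer vision tasks, supporting its effectiveness and generalization ability.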

This review was generated by AI and checked by human editors.