QUICK REVIEW

[Paper Review] Global Second-order Pooling Convolutional Networks

Zilin Gao, Jiangtao Xie|arXiv (Cornell University)|Nov 29, 2018

Advanced Neural Network Applications · References: 28 · Cited by: 29

One-Sentence Summary

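This paper proposes Global Second-order Pooling Convolutional Networks (GSoP-Net), which integrate Global Second-order Pooling (GSoP) blocks at multiple levels of a deep convolutional network to capture holistic second-order statistics and strengthen non-linear representation learning. By applying GSoP to feature maps in intermediate layers and using the covariance matrix computed from those features to recalibrate them along the channel dimension, the model achieves state-of-the-art performance on ImageNet-1K and CIFAR-100 with very low computational overhead.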

ABSTRACT

Deep Convolutional Networks (ConvNets) are fundamental to, besides large-scale visual recognition, a lot of vision tasks. As the primary goal of the ConvNets is to characterize complex boundaries of thousands of classes in a high-dimensional space, it is critical to learn higher-order representations for enhancing non-linear modeling capability. Recently, Global Second-order Pooling (GSoP), plugged at the end of networks, has attracted increasing attention, achieving much better performance than classical, first-order networks in a variety of vision tasks. However, how to effectively introduce higher-order representation in earlier layers for improving non-linear capability of ConvNets is still an open problem. In this paper, we propose a novel network model introducing GSoP across from lower to higher layers for exploiting holistic image information throughout a network. Given an input 3D tensor outputted by some previous convolutional layer, we perform GSoP to obtain a covariance matrix which, after nonlinear transformation, is used for tensor scaling along channel dimension. Similarly, we can perform GSoP along spatial dimension for tensor scaling as well. In this way, we can make full use of the second-order statistics of the holistic image throughout a network. The proposed networks are thoroughly evaluated on large-scale ImageNet-1K, and experiments have shown that they non-trivially outperformed their counterparts while achieving state-of-the-art results.
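
The pooling and scaling steps in the abstract can be written out as a sample covariance followed by a diagonal rescaling. The notation here is an assumption for illustration, and the paper's exact normalization may differ: the input tensor is reshaped to X ∈ R^{c×N} with N = hw columns (one per spatial position), and g, g′ stand for the learned nonlinear transformations.

```latex
% Channel-wise GSoP: spatial positions are samples, channels are variables
\Sigma_{\mathrm{ch}} = \frac{1}{N}\, X \bar{I}_N X^{\top} \in \mathbb{R}^{c \times c},
\qquad \bar{I}_N = I_N - \frac{1}{N}\mathbf{1}_N\mathbf{1}_N^{\top}
% Channel scaling: one factor per row (channel) of X
\mathbf{s} = g(\Sigma_{\mathrm{ch}}) \in \mathbb{R}^{c},
\qquad \tilde{X} = \operatorname{diag}(\mathbf{s})\, X
% Spatial GSoP: channels are samples, spatial positions are variables
\Sigma_{\mathrm{sp}} = \frac{1}{c}\, X^{\top} \bar{I}_c X \in \mathbb{R}^{N \times N},
\qquad \bar{I}_c = I_c - \frac{1}{c}\mathbf{1}_c\mathbf{1}_c^{\top}
% Spatial scaling: one factor per column (position) of X
\mathbf{t} = g'(\Sigma_{\mathrm{sp}}) \in \mathbb{R}^{N},
\qquad \tilde{X} = X \operatorname{diag}(\mathbf{t})
```

Right-multiplying by Ī_N subtracts each channel's spatial mean. Because Ī_N is symmetric and idempotent, Σ_ch = (1/N)(XĪ_N)(XĪ_N)^T, so the matrix is symmetric positive semidefinite; the same argument applies to Σ_sp.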

Research Motivation and Objectives

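Address a limitation of existing deep ConvNets, which exploit second-order statistics only at the end of the network, by extending higher-order modeling to earlier layers.
Strengthen the non-linear modeling capability of deep networks by capturing long-range statistical dependencies through global second-order pooling.
Design a modular, efficient GSoP block that can be easily embedded into existing architectures such as ResNet, Inception, and DenseNet.
Show empirically that integrating second-order statistics early yields more discriminative representations than first-order methods such as SE-Net and CBAM.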

Proposed Method

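The GSoP block takes the 3D feature tensor output by a convolutional layer and applies global second-order pooling along the spatial and channel dimensions to compute a covariance matrix.
The resulting covariance matrix is embedded through 1×1 convolutions and a non-linear activation (ReLU) to produce a channel attention map.
This attention map scales the original feature tensor along the channel dimension, recalibrating features based on second-order statistics.
The method supports GSoP along both the spatial and channel dimensions, so the block can be flexibly integrated at multiple stages of the network.
In ResNet-based architectures, a GSoP block is inserted only once per residual stage, keeping the increase in parameters and FLOPs to a minimum.
The networks are trained end to end with standard optimization, and ablation studies analyze sensitivity to block placement and hyperparameters.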
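
The channel-wise and spatial variants described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight arrays (`w_reduce`, `w_row`, `w_out`, `w_pos`) are hypothetical random stand-ins for learned parameters, the per-row weighted sum followed by ReLU stands in for the embedding step, and the final sigmoid that keeps scale factors in (0, 1) is an assumption borrowed from SE-style gating.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def covariance(feats):
    """Sample covariance of the rows of `feats` (d x n); the n columns are the samples."""
    n = feats.shape[1]
    centered = feats - feats.mean(axis=1, keepdims=True)
    return centered @ centered.T / n  # (d, d), symmetric


def gsop_channel(x, w_reduce, w_row, w_out):
    """Channel-wise GSoP sketch. x: (C_in, H, W) -> rescaled x and (C_in,) channel weights.

    w_reduce: (C, C_in) 1x1 conv reducing channels before pooling.
    w_row:    (C, C) hypothetical per-row embedding weights.
    w_out:    (C_in, C) hypothetical projection back to C_in channels.
    """
    c_in, h, w = x.shape
    reduced = w_reduce @ x.reshape(c_in, h * w)   # 1x1 conv == matmul over channels
    cov = covariance(reduced)                     # (C, C): spatial positions are samples
    row_feat = np.maximum((cov * w_row).sum(axis=1), 0.0)  # one ReLU'd scalar per row
    weights = sigmoid(w_out @ row_feat)           # (C_in,) scale factors in (0, 1)
    return x * weights[:, None, None], weights


def gsop_spatial(x, w_reduce, w_pos):
    """Spatial GSoP sketch. x: (C_in, H, W) -> rescaled x and (H*W,) position weights.

    w_pos: (H*W, H*W) hypothetical per-row weights, so H and W must be fixed.
    """
    c_in, h, w = x.shape
    reduced = w_reduce @ x.reshape(c_in, h * w)   # (C, H*W)
    cov = covariance(reduced.T)                   # (H*W, H*W): channels are samples
    scores = sigmoid((cov * w_pos).sum(axis=1))   # one scale factor per spatial position
    return x * scores.reshape(1, h, w), scores


# Usage with hypothetical sizes: a 64-channel 7x7 map, reduced to 16 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 7, 7))
w_reduce = 0.1 * rng.standard_normal((16, 64))
y_ch, w_ch = gsop_channel(x, w_reduce, 0.1 * rng.standard_normal((16, 16)),
                          0.1 * rng.standard_normal((64, 16)))
y_sp, w_sp = gsop_spatial(x, w_reduce, 0.1 * rng.standard_normal((49, 49)))
print(y_ch.shape, w_ch.shape, y_sp.shape, w_sp.shape)  # (64, 7, 7) (64,) (64, 7, 7) (49,)
```

In this sketch, reducing to C channels before pooling keeps the channel covariance at C×C instead of C_in×C_in, while the spatial variant's (H·W)×(H·W) matrix grows quadratically with resolution.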

Experimental Results

Research Questions

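RQ1: Does integrating global second-order pooling into intermediate layers of deep ConvNets improve representation learning beyond pooling at the end of the network?
RQ2: How discriminative are second-order statistics in early and middle layers compared with first-order statistics such as global average pooling?
RQ3: How does inserting GSoP blocks at different network depths affect final accuracy and feature quality?
RQ4: How does the proposed GSoP block compare with existing attention mechanisms such as SE-Net and CBAM in capturing long-range contextual dependencies?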

Key Findings

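On ImageNet-1K, GSoP-Net2 achieves a top-1 error of 20.94%, outperforming SE-Net (21.31%) and CBAM.
GSoP-Net2's top-1 error is 1.36 percentage points lower than that of iSQRT-COV, a strong baseline that applies covariance pooling at the end of the network.
On CIFAR-100, GSoP-Net2 lowers the error to 18.58%, 5.75 percentage points below the original ResNet-164 baseline and 1.37 percentage points below iSQRT-COV.
Ablation studies confirm that inserting GSoP blocks at early stages significantly improves performance, with notable gains from as few as four blocks.
The GSoP block is highly modular and adds little computation: on ImageNet, GSoP-Net2 adds only 3.6M parameters and 0.58 GFLOPs.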
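
Several findings above are stated as gaps rather than absolute numbers. The short calculation below recovers the comparison values they imply, assuming every gap is an absolute difference in percentage points; these implied values are derived here by arithmetic and are not reported in this summary.

```python
# Error rates (%) and gaps (percentage points) as stated in the findings above.
IMAGENET_GSOP2 = 20.94
IMAGENET_SENET = 21.31
IMAGENET_GAP_VS_ISQRT = 1.36
CIFAR_GSOP2 = 18.58
CIFAR_GAP_VS_RESNET164 = 5.75
CIFAR_GAP_VS_ISQRT = 1.37


def implied_error(error, gap_pp):
    """Comparison error implied by a reported error and an absolute gap in points."""
    return round(error + gap_pp, 2)


def relative_reduction(new, old):
    """Error reduction expressed as a percentage of the old error."""
    return round(100.0 * (old - new) / old, 1)


senet_gap = round(IMAGENET_SENET - IMAGENET_GSOP2, 2)
isqrt_imagenet = implied_error(IMAGENET_GSOP2, IMAGENET_GAP_VS_ISQRT)
resnet164_cifar = implied_error(CIFAR_GSOP2, CIFAR_GAP_VS_RESNET164)
isqrt_cifar = implied_error(CIFAR_GSOP2, CIFAR_GAP_VS_ISQRT)
rel_vs_resnet164 = relative_reduction(CIFAR_GSOP2, resnet164_cifar)

print(senet_gap, isqrt_imagenet, resnet164_cifar, isqrt_cifar, rel_vs_resnet164)
# 0.37 22.3 24.33 19.95 23.6
```

Under this reading, the 5.75-point drop from an implied 24.33% corresponds to roughly a 23.6% relative reduction, which is why "percentage points" and "%" should not be used interchangeably when comparing error rates.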

This review was generated by AI and reviewed by human editors.