QUICK REVIEW

[Paper Review] Is Second-order Information Helpful for Large-scale Visual Recognition?

Peihua Li, Jiangtao Xie|arXiv (Cornell University)|Mar 23, 2017

Advanced Neural Network Applications | References: 25 | Cited by: 31

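One-sentence summary

The paper proposes Matrix Power Normalized Covariance (MPN-COV), a method that replaces first-order pooling with covariance pooling of high-level convolutional features so that large-scale visual recognition can exploit second-order statistics. By deriving the backpropagation formulas needed for end-to-end training, MPN-COV cuts AlexNet's top-1 error by more than 4% and, with only a 50-layer network, performs comparably to ResNet-152, demonstrating the value of higher-order feature statistics in deep learning.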

ABSTRACT

By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing full potentials of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method consists in covariance pooling, instead of the most commonly used first-order pooling, of high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and usage of the manifold structure of covariance matrices. To address these challenges, we present a Matrix Power Normalized Covariance (MPN-COV) method. We develop forward and backward propagation formulas regarding the nonlinear matrix functions such that MPN-COV can be trained end-to-end. In addition, we analyze both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric. On the ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and 2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152. The source code will be available on the project page: http://www.peihuali.org/MPN-COV

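Motivation and objectives

Investigate whether second-order statistics of deep features can substantially improve large-scale visual recognition beyond first-order pooling.
Address the challenge of robust covariance estimation when only a small sample of high-dimensional features is available.
Develop a differentiable, end-to-end trainable method that captures the manifold structure of covariance matrices without relying on the Log-Euclidean metric.
Show that introducing second-order statistics improves recognition accuracy across several deep network architectures.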
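The small-sample problem named above can be made concrete with a short NumPy sketch. The sizes (256 channels, 169 spatial positions) and the random data are assumptions chosen only for illustration; they are not taken from the paper's experiments. The sketch shows that a sample covariance built from fewer samples than dimensions is rank-deficient, which is why a matrix logarithm is problematic while a matrix power with a positive exponent stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): d channels, n spatial positions, n < d.
d, n = 256, 169
X = rng.standard_normal((d, n))  # random stand-in for conv features, one column per position

# Sample covariance of the n feature vectors, shape (d, d).
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / n

rank = np.linalg.matrix_rank(cov)
eigvals = np.linalg.eigvalsh(cov)
near_zero = int(np.sum(eigvals < 1e-10))

print(f"dimension d = {d}, samples n = {n}")
print(f"rank of covariance = {rank} (at most n - 1 = {n - 1})")
print(f"eigenvalues that are numerically zero: {near_zero}")

# A matrix logarithm needs log(eigenvalue), which diverges for zero eigenvalues;
# a power with a positive exponent maps them to zero and stays finite.
sqrt_eigvals = np.sqrt(np.clip(eigvals, 0.0, None))
print(f"all square-rooted eigenvalues finite: {bool(np.all(np.isfinite(sqrt_eigvals)))}")
```

Centering removes one degree of freedom, so the rank cannot exceed n - 1 regardless of the data.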

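Proposed method

Proposes Matrix Power Normalized Covariance (MPN-COV) as a differentiable alternative to first-order pooling, replacing global average pooling with covariance pooling of high-level features.
Introduces matrix power normalization to stabilize covariance estimation in the small-sample, high-dimensional regime.
Derives forward and backward propagation rules for the nonlinear matrix functions in MPN-COV using matrix calculus, enabling end-to-end training in deep networks.
Implicitly exploits the geometry of the covariance matrix manifold while avoiding the computational and numerical drawbacks of the Log-Euclidean metric.
Uses MPN-COV as the global pooling layer after the last convolutional layer, followed by fully connected layers for classification.
Applies $1\times1$ convolution to reduce the channel dimension for efficient computation and better feature representation.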
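The forward pass described in the list above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mpn_cov_matrix` and `upper_triangle`, the exponent default `alpha=0.5`, the 1/n normalization, and the choice to output the upper-triangular entries are all assumptions made for this sketch. It covers only the forward computation (covariance, eigendecomposition, eigenvalue power); the backward rules derived in the paper are not reproduced.

```python
import numpy as np

def mpn_cov_matrix(features, alpha=0.5):
    """Matrix-power-normalized covariance of one image's features.

    features: array of shape (d, n), d channels by n spatial positions.
    alpha: exponent of the matrix power (0.5 is an assumed default).
    """
    n = features.shape[1]
    centered = features - features.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / n               # sample covariance, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov = U diag(eigvals) U^T
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negative round-off
    return (eigvecs * eigvals**alpha) @ eigvecs.T  # U diag(eigvals^alpha) U^T

def upper_triangle(matrix):
    """Flatten a symmetric (d, d) matrix into its d*(d+1)/2 distinct entries."""
    rows, cols = np.triu_indices(matrix.shape[0])
    return matrix[rows, cols]

# Usage with a hypothetical (channels, height, width) activation.
rng = np.random.default_rng(0)
conv_out = rng.standard_normal((64, 14, 14))
X = conv_out.reshape(64, -1)   # (d, n) = (64, 196)
P = mpn_cov_matrix(X)          # (64, 64) normalized covariance
vec = upper_triangle(P)        # 2080-dimensional pooled representation
print(P.shape, vec.shape)
```

With `alpha=0.5` the result is the matrix square root of the covariance, so `P @ P` recovers the covariance up to floating-point error. The quadratic growth of the output size with the channel count is one reason a channel-reducing $1\times1$ convolution is useful before this layer.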

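Experimental results

Research questions

RQ1: Do second-order statistics of deep features substantially improve performance on large-scale visual recognition tasks?
RQ2: Is robust covariance estimation feasible when only a small number of high-dimensional features is available?
RQ3: Can the geometry of the covariance matrix manifold be exploited effectively in deep learning without introducing instability or high computational cost?
RQ4: Does MPN-COV outperform first-order pooling and existing second-order methods (such as DeepO2P and B-CNN) in the large-scale setting?
RQ5: Can MPN-COV let shallower networks match or exceed deeper models such as ResNet-101 and ResNet-152?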

Key findings

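On the ImageNet 2012 validation set, MPN-COV with AlexNet is reported to cut top-1 error by 4.1% compared with first-order pooling, reaching 34.60% (versus 37.07%).
With VGG-M, MPN-COV is reported to lower top-1 error from 29.62% with first-order pooling to 26.55% and, under a different initialization scheme, from 37.07% to 34.60%.
With VGG-16, MPN-COV is reported at a top-1 error of 34.68% (10-crop) against 27.41% for the original VGG-16, and is described as matching or exceeding GoogleNet and PReLU-net B.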
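Adding MPN-COV to ResNet-50 lowers top-1 error from 24.95% to 22.73% (1-crop) and from 22.85% to 21.20% (10-crop), outperforming ResNet-101 and performing comparably to ResNet-152.
MPN-COV networks converge faster during training: by epoch 60 the top-1 error has fallen to 18.02%, against 25.98% for the baseline ResNet-50.
MPN-COV brings a 50-layer ResNet to a level comparable with the 152-layer ResNet, indicating that second-order statistics can compensate for limited network depth.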
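When reading error-rate comparisons like those above, it helps to separate the absolute reduction (in percentage points) from the relative reduction (as a share of the baseline error). The short sketch below does this arithmetic for the ResNet-50 figures quoted in the findings; the helper name `error_reduction` is hypothetical and introduced only for this illustration.

```python
def error_reduction(baseline, improved):
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = baseline - improved
    relative = 100.0 * absolute / baseline
    return absolute, relative

# ResNet-50 top-1 error rates quoted in the findings (baseline, with MPN-COV).
abs_1crop, rel_1crop = error_reduction(24.95, 22.73)
abs_10crop, rel_10crop = error_reduction(22.85, 21.20)

print(f"1-crop:  {abs_1crop:.2f} points absolute, {rel_1crop:.1f}% relative")
print(f"10-crop: {abs_10crop:.2f} points absolute, {rel_10crop:.1f}% relative")
```

For these figures the absolute reductions are 2.22 and 1.65 percentage points, which correspond to relative reductions of roughly 8.9% and 7.2%.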


This review was generated by AI and reviewed by a human editor.