QUICK REVIEW

[Paper Review] Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

Yunpeng Chen, Haoqi Fan|arXiv (Cornell University)|Apr 10, 2019

Advanced Neural Network Applications | References: 50 | Cited by: 149

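One-sentence summary

The paper proposes Octave Convolution (OctConv), a plug-and-play operation that factorizes feature maps into a high-frequency group and a low-frequency group one octave apart, reducing spatial redundancy and cutting memory and computation on image and video tasks while improving accuracy.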

ABSTRACT

In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy like group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks, while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.

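Motivation and objectives

Motivate and model the observation that natural images carry information at multiple spatial frequencies, which calls for processing those frequencies separately.
Propose a generic, plug-and-play Octave Convolution unit that replaces vanilla convolution without changing the network architecture.
Show that OctConv reduces memory and FLOPs while improving accuracy for 2D and 3D backbones on ImageNet and Kinetics.
Demonstrate compatibility with group and depth-wise convolutions, and analyze the receptive-field benefit and the alignment caveats.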

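Proposed method

Define the octave feature representation by splitting the input channels into a high-frequency group $X^H$ and a low-frequency group $X^L$, where $X^L$ has half the spatial resolution of $X^H$ (one octave lower).
Design Octave Convolution by splitting the convolution kernel into intra-frequency and inter-frequency components, so that $Y^H$ and $Y^L$ are updated through four computation paths.
Compute $Y^H = f(X^H; W^{H \to H}) + \mathrm{upsample}(f(X^L; W^{L \to H}), 2)$ and $Y^L = f(X^L; W^{L \to L}) + f(\mathrm{pool}(X^H, 2); W^{H \to L})$, which enables information exchange between the two frequency groups.
Implementation details include using average pooling for downsampling, which avoids misalignment between the two resolutions while staying efficient.
Provide variants for group and depth-wise convolutions so that OctConv can be integrated into existing architectures without a major redesign.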
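
The four computation paths can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes 1x1 kernels without bias, nearest-neighbour upsampling, and feature maps stored as nested lists, and every function name is hypothetical. The high-to-low path pools the high-frequency input before convolving; with 1x1 kernels and average pooling, pooling before or after the convolution is mathematically equivalent.

```python
# Sketch only: one Octave Convolution step restricted to 1x1 kernels, no bias.
# Feature maps are nested lists shaped [channels][height][width].
# All function names are hypothetical, not taken from the paper's code.

def conv1x1(x, w):
    """Pointwise convolution; w is [c_out][c_in], x is [c_in][H][W]."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w_row[c] * x[c][i][j] for c in range(len(x)))
          for j in range(width)]
         for i in range(height)]
        for w_row in w
    ]

def avg_pool2(x):
    """2x2 average pooling with stride 2 (halves height and width)."""
    return [
        [[(ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
           + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4.0
          for j in range(len(ch[0]) // 2)]
         for i in range(len(ch) // 2)]
        for ch in x
    ]

def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [
        [[ch[i // 2][j // 2] for j in range(2 * len(ch[0]))]
         for i in range(2 * len(ch))]
        for ch in x
    ]

def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [
        [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]

def oct_conv1x1(x_h, x_l, w_hh, w_lh, w_ll, w_hl):
    """Return (y_h, y_l) computed from the four OctConv paths."""
    # High-frequency output: H->H plus the upsampled L->H contribution.
    y_h = add_maps(conv1x1(x_h, w_hh), upsample2(conv1x1(x_l, w_lh)))
    # Low-frequency output: L->L plus H->L computed on the pooled input.
    y_l = add_maps(conv1x1(x_l, w_ll), conv1x1(avg_pool2(x_h), w_hl))
    return y_h, y_l

# Toy input: one high-frequency channel (4x4) and one low-frequency channel (2x2).
x_h = [[[1, 1, 3, 3], [1, 1, 3, 3], [5, 5, 7, 7], [5, 5, 7, 7]]]
x_l = [[[10, 20], [30, 40]]]
ones = [[1.0]]  # a single 1x1 weight per path, chosen only for illustration
y_h, y_l = oct_conv1x1(x_h, x_l, ones, ones, ones, ones)
print(len(y_h[0]), len(y_h[0][0]))  # spatial size of the high-frequency output
print(len(y_l[0]), len(y_l[0][0]))  # spatial size of the low-frequency output
```

The low-frequency tensors stay at half resolution throughout, which is where the memory and compute savings come from; only the L->H path is upsampled, and only after its convolution has run on the small grid.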

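Experimental results

Research questions

RQ1: Does replacing vanilla convolution with OctConv improve accuracy on image and video recognition tasks?
RQ2: What is the FLOPs/memory trade-off when OctConv is used in different backbones?
RQ3: How does the low-frequency channel ratio α affect performance and efficiency?
RQ4: Is OctConv compatible with group convolution, depth-wise convolution, and other efficiency-oriented CNN designs?
RQ5: How does OctConv affect the receptive field and the exchange of information between frequency groups?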

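Key findings

Networks equipped with OctConv consistently improve accuracy across a range of backbones on ImageNet and Kinetics while reducing FLOPs.
The FLOPs-accuracy trade-off with OctConv is concave, with a sweet spot around α ≈ 0.125 to 0.25 that delivers notable gains.
Measured speedups with OctConv come close to the theoretical FLOP reduction; for example, ResNet-50 shows a notable speedup on CPU.
Low-frequency maps have a doubled effective receptive field, which improves context modeling without extra memory.
Both the intra-frequency and the inter-frequency exchange paths matter for maximizing performance, and OctConv helps shallower networks more because the receptive-field gain is more pronounced there.
Compared with MG-Conv and related multi-scale methods, OctConv achieves a better FLOPs-accuracy trade-off with lower memory and computation.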
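
To make the theoretical FLOP and memory reduction concrete, the sketch below estimates both ratios relative to a vanilla convolution as a function of α. It assumes the same α for input and output channels, counts the cost of each path as (input channel fraction) × (output channel fraction) × (relative spatial area), and ignores the pooling, upsampling, and addition costs; the function name is hypothetical. Under these assumptions the FLOPs ratio simplifies to $1 - \frac{3}{4}\alpha(2 - \alpha)$ and the memory ratio to $1 - \frac{3}{4}\alpha$.

```python
def octconv_cost_ratios(alpha):
    """Return (flops_ratio, memory_ratio) relative to a vanilla convolution.

    Assumes the same low-frequency ratio alpha on input and output channels.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    high = 1.0 - alpha  # fraction of channels kept at full resolution
    flops = (
        high * high            # H -> H, full resolution
        + high * alpha / 4.0   # H -> L, convolved at half resolution
        + alpha * high / 4.0   # L -> H, convolved at half resolution, then upsampled
        + alpha * alpha / 4.0  # L -> L, half resolution
    )
    # Low-frequency maps hold a quarter of the spatial positions.
    memory = high + alpha / 4.0
    return flops, memory

for alpha in (0.0, 0.125, 0.25, 0.5, 0.75, 1.0):
    flops, memory = octconv_cost_ratios(alpha)
    print(f"alpha={alpha:<5} FLOPs x{flops:.3f}  memory x{memory:.3f}")
```

For example, α = 0.5 gives a FLOPs ratio of 0.4375 and a memory ratio of 0.625 under this model, and α = 1 bottoms out at 0.25 for both because every feature map then lives at half resolution.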

This review was generated by AI and reviewed by human editors.