QUICK REVIEW

[Paper Review] Why do deep convolutional networks generalize so poorly to small image transformations?

Aharon Azulay, Yair Weiss|arXiv (Cornell University)|May 30, 2018

Domain Adaptation and Few-Shot Learning | Cited by 299

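One-Sentence Summary

The paper quantifies how fragile modern CNNs are to tiny image transformations (such as a one-pixel translation or a small rescaling), analyzes why neither the convolutional architecture nor data augmentation guarantees invariance, and evaluates two partial remedies: antialiasing and more extensive augmentation.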

ABSTRACT

Convolutional Neural Networks (CNNs) are commonly assumed to be invariant to small image transformations: either because of the convolutional architecture or because they were trained using data augmentation. Recently, several authors have shown that this is not the case: small translations or rescalings of the input image can drastically change the network's prediction. In this paper, we quantify this phenomena and ask why neither the convolutional architecture nor data augmentation are sufficient to achieve the desired invariance. Specifically, we show that the convolutional architecture does not give invariance since architectures ignore the classical sampling theorem, and data augmentation does not give invariance because the CNNs learn to be invariant to transformations only for images that are very similar to typical images from the training set. We discuss two possible solutions to this problem: (1) antialiasing the intermediate representations and (2) increasing data augmentation and show that they provide only a partial solution at best. Taken together, our results indicate that the problem of insuring invariance to small image transformations in neural networks while preserving high accuracy remains unsolved.

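Motivation and Goals

Quantify the lack of invariance of modern CNNs to small image transformations.
Examine how architectural choices (convolution, subsampling) and data augmentation contribute to this fragility.
Explain why current CNN designs and training practices do not guarantee shift invariance.
Assess the proposed countermeasures, namely antialiasing the internal representations and increasing data augmentation, and measure how effective they are.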

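Proposed Method

Tests four perturbation protocols on 1000 ImageNet validation images, each producing a one-pixel difference: cropping, embedding on a black background, embedding with inpainting, and embedding with a change in size.
Measures invariance in two ways: (i) P(Top-1 change), the probability that the top-1 prediction changes, and (ii) the mean absolute change (MAC) in the probability assigned to the top-1 class.
Compares six pretrained CNNs: three from Keras (VGG16, ResNet50, InceptionResNetV2) and three from PyTorch (VGG16, ResNet50, DenseNet121).
Analyzes how layer depth affects shiftability by training readout classifiers on intermediate layers and evaluating the effect of a one-pixel translation.
Gives a theoretical discussion of sampling, shiftability, and the Shannon-Nyquist theorem as they apply to subsampling and nonlinearities in CNNs.
Evaluates the proposed solutions: antialiasing the internal representations and extended data augmentation.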
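
The two measures listed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `invariance_metrics` and the input layout (one row of class probabilities per image) are assumptions, and MAC is taken here to be the change in the probability of the class that was top-1 on the unperturbed image.

```python
import numpy as np

def invariance_metrics(probs_a, probs_b):
    """Compare predictions on images (probs_a) and their perturbed copies (probs_b).

    Both inputs have shape (n_images, n_classes) and hold class probabilities.
    The name and layout are hypothetical, chosen for illustration only.
    """
    probs_a = np.asarray(probs_a, dtype=float)
    probs_b = np.asarray(probs_b, dtype=float)

    top_a = probs_a.argmax(axis=1)  # top-1 class before the perturbation
    top_b = probs_b.argmax(axis=1)  # top-1 class after the perturbation

    # (i) fraction of images whose top-1 class changed
    p_top1_change = float(np.mean(top_a != top_b))

    # (ii) mean absolute change in the probability of the original top-1 class
    rows = np.arange(len(probs_a))
    mac = float(np.mean(np.abs(probs_a[rows, top_a] - probs_b[rows, top_a])))
    return p_top1_change, mac
```

In practice `probs_a` and `probs_b` would come from running a pretrained classifier on each image and on its one-pixel-shifted counterpart; that step is omitted here.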

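Experimental Results

Research Questions

RQ1 To what extent do small image transformations change CNN predictions, and how does this vary across architectures and perturbation protocols?
RQ2 Why do the convolutional architecture and data augmentation fail to guarantee invariance under small translations or rescalings?
RQ3 What role do subsampling (stride) and the sampling theorem play in producing or breaking shift invariance in CNNs?
RQ4 Do antialiasing and increased data augmentation substantially improve invariance, and by how much?
RQ5 How does an image's typicality with respect to the training data affect a CNN's sensitivity to small transformations?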

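Key Findings

A one-pixel perturbation can change a CNN's top-1 prediction with probability as high as about 30%.
The lack of invariance is observed across multiple architectures and in both the Keras and the PyTorch pretrained models.
Translation invariance does not follow from the convolutional architecture, because subsampling (stride) breaks literal shiftability; for global pooling to yield shift invariance, the feature maps must be shiftable and must be sampled appropriately according to the Shannon-Nyquist criterion.
CNNs learn invariance only for images that closely resemble the training data; sensitivity to small transformations increases when images depart from the photographer's bias.
Antialiasing the internal representations brings only a partial improvement; more data augmentation helps but does not fully solve the problem, especially for atypical images.
Deeper layers become less shiftable because of accumulated subsampling and nonlinearities, so sensitivity to small translations increases with depth.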
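
The role of subsampling can be illustrated with a one-dimensional toy example. This sketch is an assumption-laden illustration rather than anything taken from the paper: the signal is a single impulse, the "feature map" is the signal itself, the low-pass filter is a circular [1, 2, 1]/4 blur, and global pooling is a plain sum. The helper names are hypothetical.

```python
import numpy as np

def subsample(x, stride=2):
    # Keep every `stride`-th sample, as a strided layer would.
    return x[::stride]

def blur(x):
    # Circular [1, 2, 1] / 4 low-pass filter applied before subsampling.
    return (np.roll(x, 1) + 2.0 * x + np.roll(x, -1)) / 4.0

def pooled_change(x, antialias):
    """Absolute change in the globally summed, stride-2 output
    when the input is shifted by one sample."""
    shifted = np.roll(x, 1)
    if antialias:
        x, shifted = blur(x), blur(shifted)
    return abs(subsample(x).sum() - subsample(shifted).sum())

signal = np.zeros(16)
signal[4] = 1.0  # impulse sitting on the stride-2 sampling grid

print("no blur:  ", pooled_change(signal, antialias=False))
print("with blur:", pooled_change(signal, antialias=True))
```

Without the blur, a one-sample shift moves the impulse off the sampling grid and the pooled output changes completely. In this purely linear toy the blur removes the change entirely; real networks interleave nonlinearities with subsampling, which is consistent with the finding above that antialiasing gives only a partial improvement.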

This review was generated by AI and checked by a human editor.