QUICK REVIEW

[Paper Review] The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

Katherine L. Hermann, Ting Chen | arXiv (Cornell University) | Nov 20, 2019

Adversarial Robustness in Machine Learning | References: 87 | Cited by: 120

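ONE-SENTENCE SUMMARY

This paper investigates why ImageNet-trained convolutional neural networks prefer texture over shape, shows that data augmentation is a major driver of this bias, and demonstrates that naturalistic augmentation can promote shape-based classification and improve out-of-distribution performance.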

ABSTRACT

Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not primarily from differences in their internal workings, but from differences in the data that they see.

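MOTIVATION AND OBJECTIVES

Assess whether the texture bias of ImageNet-trained CNNs is inherent to the architecture and training procedure or is driven mainly by the training data.
Quantify how different data augmentations, training objectives, and architectures affect texture bias.
Identify practical augmentation strategies that reduce texture bias and improve shape-based classification across diverse distributions.
Test whether shape information is still present in hidden representations when decisions rely on texture.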

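PROPOSED METHOD

Train CNNs on ambiguous (cue-conflict) shape-texture datasets (GST, Navon, ImageNet-C) to compare how easily they learn shape versus texture classification.
Evaluate how training under different data augmentations (random vs. center crops, color distortion, blur, noise, Sobel filtering) affects texture bias.
Test a range of training objectives (supervised vs. self-supervised: Rotation, Exemplar, BigBiGAN, SimCLR) and base architectures (AlexNet, ResNet-50).
Measure shape bias with GST stimuli, and use linear classifiers to assess how much shape and texture information can be decoded from hidden layers.
Analyze the relationship between ImageNet accuracy and the observed shape/texture bias across models.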
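The shape-bias measurement on cue-conflict stimuli can be illustrated with a small sketch. It assumes the common definition in which shape bias is the fraction of shape-consistent predictions among all predictions that match either the shape or the texture label. The `Trial` record and the `shape_bias` function are hypothetical names introduced here for illustration, not code from the paper.

```python
from typing import Iterable, NamedTuple


class Trial(NamedTuple):
    """One cue-conflict image (hypothetical record layout)."""
    shape_label: str    # class of the object's outline
    texture_label: str  # class of the texture applied to it
    prediction: str     # the model's top-1 class


def shape_bias(trials: Iterable[Trial]) -> dict:
    """Summarize shape vs. texture decisions on cue-conflict trials."""
    shape_hits = texture_hits = total = 0
    for t in trials:
        if t.shape_label == t.texture_label:
            continue  # both cues agree, so the trial is uninformative
        total += 1
        if t.prediction == t.shape_label:
            shape_hits += 1
        elif t.prediction == t.texture_label:
            texture_hits += 1
    decided = shape_hits + texture_hits
    nan = float("nan")
    return {
        # Fraction of conflict trials classified by shape.
        "shape_match": shape_hits / total if total else nan,
        # Fraction of conflict trials classified by texture.
        "texture_match": texture_hits / total if total else nan,
        # Shape decisions among trials that matched either cue.
        "shape_bias": shape_hits / decided if decided else nan,
    }


if __name__ == "__main__":
    demo = [
        Trial("cat", "elephant", "cat"),
        Trial("cat", "elephant", "elephant"),
        Trial("dog", "clock", "clock"),
        Trial("car", "bottle", "chair"),  # matches neither cue
    ]
    print(shape_bias(demo))
```

Under this definition a value above 0.5 means the model sides with shape more often than with texture when the two cues conflict. Predictions that match neither cue lower both match rates but do not enter the bias ratio.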

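EXPERIMENTAL RESULTS

Research Questions

RQ1: Is the texture bias of ImageNet-trained CNNs caused mainly by inductive biases or by the training data itself?
RQ2: How do data augmentation, training objectives, and architecture affect texture bias?
RQ3: Can naturalistic augmentation reduce texture bias and improve shape-based classification on out-of-distribution data?
RQ4: Can shape information still be recovered from hidden representations even when the model's classifications favor texture?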

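Key Findings

CNNs trained on ambiguous datasets learn to classify by shape as easily as by texture, indicating that the bias is not purely an architectural property.
Random-crop augmentation increases texture bias, whereas center crops and naturalistic augmentation (color distortion, blur, noise, Sobel filtering) reduce it.
The bias-reducing augmentations have cumulative effects and can yield shape-based models even without unnatural style-transfer techniques.
Self-supervised objectives affect texture bias, but augmentation generally plays a larger role; some objectives (e.g., Rotation) reduce texture bias relative to the supervised baseline.
Architectures with higher ImageNet accuracy tend to show lower texture bias, although networks designed to match human vision or that use self-attention do not consistently differ from standard CNNs in texture bias.
Shape information can be decoded from the final layers of texture-biased models, and in early layers shape is sometimes even more decodable than texture, suggesting that information is lost in later layers.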
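To make the augmentation finding concrete, here is a minimal NumPy sketch of a pipeline combining a less aggressive random crop with color distortion, blur, and noise. All function names, the `min_area=0.64` crop floor, and the jitter, blur, and noise strengths are illustrative assumptions, not the settings used in the paper. Images are assumed to be float arrays of shape (H, W, C) with values in [0, 1].

```python
import numpy as np


def random_crop(img, rng, min_area=0.64):
    """Square-ratio crop covering at least `min_area` of the image.

    A high floor keeps most of the object visible; the value is an assumption.
    """
    h, w, _ = img.shape
    side = np.sqrt(rng.uniform(min_area, 1.0))
    ch, cw = max(1, int(round(h * side))), max(1, int(round(w * side)))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]


def color_distort(img, rng, strength=0.4):
    """Random contrast and brightness jitter (a simplified color distortion)."""
    contrast = rng.uniform(1 - strength, 1 + strength)
    brightness = rng.uniform(1 - strength, 1 + strength)
    mean = img.mean()
    return np.clip(((img - mean) * contrast + mean) * brightness, 0.0, 1.0)


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    pad = ((radius, radius), (radius, radius), (0, 0))
    out = np.pad(img, pad, mode="edge")
    for axis in (0, 1):  # filter columns, then rows
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, out)
    return out


def add_noise(img, rng, sigma=0.05):
    """Additive Gaussian pixel noise, clipped back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def augment(img, rng):
    """Apply crop, color distortion, blur, and noise in sequence."""
    out = random_crop(img, rng)
    out = color_distort(out, rng)
    out = gaussian_blur(out, sigma=rng.uniform(0.1, 2.0))
    return add_noise(out, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))
    print(augment(image, rng).shape)
```

A real training pipeline would also resize each crop to a fixed input resolution and would typically use a framework's built-in augmentation ops; both are omitted here to keep the sketch dependency-light.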

This review was generated by AI and reviewed by human editors.