QUICK REVIEW

[Paper Review] Intriguing generalization and simplicity of adversarially trained neural networks

Chirag Agarwal, Peijie Chen|arXiv (Cornell University)|Jun 16, 2020

Adversarial Robustness in Machine Learning | Cited by 3

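One-Sentence Summary

This paper studies how adversarially trained neural networks generalize to out-of-distribution data and what representations they learn. It finds that robust models generalize better to texture-less images (such as silhouettes and stylized images) and rely mainly on shape cues, and that adversarial training induces three key shifts: smoother feature detection, more neurons detecting low-level textures and colors, and lower neuron complexity.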

ABSTRACT

Adversarial training has been the topic of dozens of studies and a leading method for defending against adversarial attacks. Yet, it remains unknown (a) how adversarially-trained classifiers (a.k.a. robust classifiers) generalize to new types of out-of-distribution examples; and (b) what hidden representations were learned by robust networks. In this paper, we perform a thorough, systematic study to answer these two questions on AlexNet, GoogLeNet, and ResNet-50 trained on ImageNet. While robust models often perform on-par or worse than standard models on unseen distorted, texture-preserving images (e.g. blurred), they are consistently more accurate on texture-less images (i.e. silhouettes and stylized). That is, robust models rely heavily on shapes, in stark contrast to the strong texture bias in standard ImageNet classifiers (Geirhos et al. 2018). Remarkably, adversarial training causes three significant shifts in the functions of hidden neurons. That is, each convolutional neuron often changes to (1) detect pixel-wise smoother patterns; (2) detect more lower-level features, i.e. textures and colors (instead of objects); and (3) be simpler in terms of complexity, i.e. detecting more limited sets of concepts.

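Motivation and Objectives

Understand how adversarially trained classifiers generalize to out-of-distribution examples beyond standard ImageNet data.
Study how the hidden representations learned by robust models differ from those of standard models.
Identify the structural and functional changes in neurons after adversarial training.
Determine whether robustness is associated with shifts in feature hierarchy and representation complexity.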

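Proposed Method

Adversarially train AlexNet, GoogLeNet, and ResNet-50 on ImageNet using PGD attacks.
Evaluate generalization on diverse out-of-distribution datasets, including blurred, stylized, and silhouette images.
Analyze hidden-layer activations to assess feature complexity, spatial smoothness, and the concepts each neuron can detect.
Compare neuron behavior between standard and adversarially trained models across multiple architectures and layers.
Quantify the shifts in neuron function by measuring the spatial smoothness of activations, feature specificity, and concept diversity.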
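The first step above relies on projected gradient descent (PGD) to craft the adversarial examples used during training. As a rough illustration only, the sketch below runs an L-infinity PGD attack on a toy logistic-regression classifier rather than on an ImageNet network. The function name `pgd_linf`, the toy model, and the values of `eps`, `step`, and `iters` are assumptions made for this example and are not taken from the paper.

```python
import math


def pgd_linf(x, y, w, b, eps=0.1, step=0.02, iters=20):
    """L-infinity PGD against a toy logistic-regression classifier.

    x: input features in [0, 1]; y: true label in {0, 1};
    w, b: weights and bias of the (hypothetical) model.
    Returns a perturbed copy of x that stays within eps of x per feature.
    """
    x_adv = list(x)
    for _ in range(iters):
        z = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        # Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            xi = x_adv[i] + step * sign                  # ascend the loss
            xi = min(max(xi, x[i] - eps), x[i] + eps)    # project onto the eps-ball
            x_adv[i] = min(max(xi, 0.0), 1.0)            # keep a valid input range
    return x_adv


if __name__ == "__main__":
    clean = [0.5, 0.5]
    print(pgd_linf(clean, y=1, w=[2.0, -1.0], b=0.0))
```

In adversarial training, examples produced this way would replace or augment the clean inputs in each training batch; that outer training loop is omitted here.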

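Experimental Results

Research Questions

RQ1: How do adversarially trained models generalize to out-of-distribution images whose textures are altered or that are distorted?
RQ2: What inductive biases do robust models learn compared with standard models?
RQ3: How does adversarial training change the functional behavior of individual convolutional neurons?
RQ4: To what extent do robust models rely on shape rather than texture for classification?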
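One way to make RQ4 concrete is a shape-bias score on cue-conflict images, where each image has one label for its shape and a different label for its texture. The sketch below follows the convention commonly attributed to Geirhos et al.: among predictions that match either cue, report the fraction that match the shape. Whether the paper computes exactly this quantity is an assumption here, and the function name `shape_bias` and the sample labels are hypothetical.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of cue-matching predictions that follow shape, not texture.

    Each argument is a sequence of class names, one entry per image.
    Images whose shape and texture labels agree carry no conflict and are skipped.
    """
    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # not a cue-conflict image
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # Predictions matching neither cue are ignored.
    total = shape_hits + texture_hits
    return shape_hits / total if total else float("nan")


if __name__ == "__main__":
    preds = ["cat", "elephant", "dog", "car"]
    shapes = ["cat", "cat", "dog", "bird"]
    textures = ["elephant", "elephant", "cat", "boat"]
    print(shape_bias(preds, shapes, textures))
```

A score near 1 would indicate decisions driven mostly by shape, and a score near 0 would indicate decisions driven mostly by texture.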

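Key Findings

Adversarially trained models outperform standard models on texture-less images (such as silhouettes and stylized images), indicating a stronger shape bias.
Standard ImageNet classifiers show a strong texture bias, whereas robust models shift toward detecting shape-based features.
The activation patterns of each convolutional neuron in robust models are spatially smoother, indicating that the neurons detect coarser, smoother patterns.
Neurons in robust models increasingly detect low-level features (such as colors and textures) rather than high-level objects, indicating a shift in the representation hierarchy.
Neurons in adversarially trained networks are less complex and detect fewer, more limited sets of visual concepts.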
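The smoothness finding above can be illustrated with a simple proxy: the mean absolute difference between neighboring entries of a 2-D activation map, a form of total variation. This is a minimal sketch assuming total variation as the smoothness measure; the summary does not state the exact metric, and the function name `total_variation` and the sample maps are hypothetical.

```python
def total_variation(fmap):
    """Mean absolute difference between vertically and horizontally adjacent entries.

    fmap: a non-empty 2-D list of numbers (rows of equal length).
    Lower values mean a spatially smoother map.
    """
    height, width = len(fmap), len(fmap[0])
    diffs = []
    for i in range(height):
        for j in range(width):
            if i + 1 < height:
                diffs.append(abs(fmap[i + 1][j] - fmap[i][j]))
            if j + 1 < width:
                diffs.append(abs(fmap[i][j + 1] - fmap[i][j]))
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    smooth = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [0.2, 0.3, 0.4]]
    checkerboard = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(total_variation(smooth), total_variation(checkerboard))
```

Under this proxy, a gradually varying map scores lower than a checkerboard-like map, which matches the intuition that smoother detectors respond less to pixel-level noise.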

This review was generated by AI and reviewed by a human editor.