QUICK REVIEW

[Paper Digest] Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba|arXiv (Cornell University)|Dec 21, 2013

Neural Networks and Applications | References: 6 | Cited by: 5,706

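One-Sentence Summary

TL;DR: The paper shows that (1) semantic meaning in deep networks is distributed across the activation space rather than held by specific units; and (2) neural networks are vulnerable to adversarial examples: imperceptible input perturbations cause misclassification, and the same perturbations often transfer across models and training sets.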

ABSTRACT

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

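Motivation and Objectives

Question the notion that individual high-level units in deep networks play distinct semantic roles.
Show that random directions in activation space can be as semantically meaningful as the learned (natural basis) directions.
Show that small, carefully crafted input perturbations can reliably change a network's predictions (adversarial examples).
Study how adversarial examples generalize across models and across training sets.
Propose a framework that links adversarial perturbations to the local geometry of the space and to hard negatives during training.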

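Proposed Methods

Analyze semantic meaning by comparing activations along the natural basis coordinates of φ(x) with activations along random directions.
Formally define and compute adversarial perturbations by solving a box-constrained optimization problem that minimizes the perturbation norm subject to reaching a target misclassification.
Approximate D(x,l), the minimal perturbation that makes the network assign label l to input x, using box-constrained L-BFGS with line search.
Evaluate adversarial examples on MNIST networks, AlexNet, and QuocNet, and across different training sets.
Perform a spectral analysis of each layer to bound input-output stability.
Evaluate the transferability of adversarial examples across models and across training sets.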
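
The box-constrained search described above can be sketched on a toy model. The paper's penalized problem is: minimize c|r| + loss_f(x + r, l) subject to x + r ∈ [0, 1]^m. What follows is only an illustration under stated assumptions, not the authors' code: it uses a hypothetical two-class linear softmax classifier, a squared L2 penalty in place of |r|, and plain projected gradient descent swapped in for box-constrained L-BFGS. All function names and the toy weights are invented for the example.

```python
import math

def logits(W, b, x):
    """Linear scores z = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def perturb(W, b, x, target, c, lr=0.02, steps=2000):
    """Minimize c*||r||^2 + cross_entropy(x + r, target) with x + r kept in [0, 1]^m.

    Assumption: projected gradient descent stands in for box-constrained L-BFGS.
    """
    r = [0.0] * len(x)
    for _ in range(steps):
        xr = [xi + ri for xi, ri in zip(x, r)]
        p = softmax(logits(W, b, xr))
        # Gradient of cross-entropy w.r.t. the logits is p - onehot(target).
        g = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        # Chain rule through the linear layer (W^T g) plus the penalty gradient 2*c*r.
        grad = [sum(g[k] * W[k][i] for k in range(len(W))) + 2.0 * c * r[i]
                for i in range(len(x))]
        r = [ri - lr * gi for ri, gi in zip(r, grad)]
        # Projection step: clip x + r back into the box [0, 1].
        r = [min(1.0, max(0.0, xi + ri)) - xi for xi, ri in zip(x, r)]
    return r

def smallest_adversarial(W, b, x, target, cs=(10.0, 3.0, 1.0, 0.3, 0.1)):
    """Try several penalty weights; keep the smallest perturbation that reaches `target`."""
    best = None
    for c in cs:
        r = perturb(W, b, x, target, c)
        xr = [xi + ri for xi, ri in zip(x, r)]
        if predict(W, b, xr) == target:
            norm = math.sqrt(sum(ri * ri for ri in r))
            if best is None or norm < best[1]:
                best = (r, norm)
    return best  # None if no penalty weight produced the target label

# Hypothetical toy classifier and input (not from the paper).
W = [[2.0, -1.0], [-1.0, 2.0]]
b = [0.0, 0.0]
x = [0.7, 0.3]
r, norm = smallest_adversarial(W, b, x, target=1)
x_adv = [xi + ri for xi, ri in zip(x, r)]
```

In this toy setup, a penalty weight that is too large leaves the prediction unchanged, while smaller weights flip it at the cost of a larger perturbation, which is why the sketch scans several values of c.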

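Experimental Results

Research Questions

RQ1: Does semantic information in neural networks reside in individual units or in the activation space as a whole?
RQ2: Do random directions in activation space yield semantic visualizations similar to those obtained by maximizing individual units?
RQ3: Are deep networks vulnerable to adversarial examples produced by imperceptible input perturbations, and do these perturbations transfer across models and training data?
RQ4: How does the local geometry of the activation mapping relate to network stability and generalization?
RQ5: Can adversarial examples be exploited during training, through hard-negative mining or adversarial training, to improve generalization?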
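
RQ1 and RQ2 come down to comparing two retrieval rules over a set of held-out images: pick the images x that maximize ⟨φ(x), e_i⟩ for a natural basis vector e_i, or those that maximize ⟨φ(x), v⟩ for a random direction v. A minimal sketch of that comparison follows; the feature vectors are random stand-ins for φ(x), and every name here is hypothetical rather than taken from the paper.

```python
import math
import random

def top_k_along(features, direction, k=3):
    """Indices of the k inputs whose features have the largest dot product with `direction`."""
    scores = [sum(f * d for f, d in zip(feat, direction)) for feat in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]

def basis_direction(dim, i):
    """Natural basis vector e_i: selects the activation of a single unit."""
    return [1.0 if j == i else 0.0 for j in range(dim)]

def random_direction(dim, rng):
    """Random unit vector in activation space."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

# Stand-in features: 50 "images" with 8 activations each (assumption, not real data).
rng = random.Random(0)
features = [[rng.random() for _ in range(8)] for _ in range(50)]

unit_hits = top_k_along(features, basis_direction(8, 2))        # single-unit view
random_hits = top_k_along(features, random_direction(8, rng))   # random-direction view
```

With real activations, one would then inspect the two sets of retrieved images side by side; the paper's claim is that both look comparably coherent.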

Key Findings

Model Name	Description	Training Error	Test Error	Average Minimum Distortion
FC10(10^{-4})	Softmax with λ=10^{-4}	6.7%	7.4%	0.062
FC10(10^{-2})	Softmax with λ=10^{-2}	10%	9.4%	0.1
FC10(1)	Softmax with λ=1	21.2%	20%	0.14
FC100-100-10	Sigmoid network with λ=10^{-5},10^{-5},10^{-6}	0%	1.64%	0.058
FC200-200-10	Sigmoid network with λ=10^{-5},10^{-5},10^{-6}	0%	1.54%	0.065
AE400-10	Autoencoder with Softmax, λ=10^{-6}	0.57%	1.9%	0.086
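
For reading the last column: the paper measures distortion between an original image x and its perturbed version x' as sqrt(Σ (x'_i − x_i)² / n), where n = 784 is the number of MNIST pixels. The helper below is a small sketch of that formula; the function name is made up for illustration.

```python
import math

def average_distortion(x, x_adv):
    """Root-mean-square pixel difference: sqrt(sum((x'_i - x_i)^2) / n)."""
    if len(x) != len(x_adv):
        raise ValueError("x and x_adv must have the same number of pixels")
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_adv, x)) / n)

# Example: shifting every one of 784 pixels by 0.1 gives a distortion of about 0.1.
x = [0.0] * 784
x_adv = [0.1] * 784
d = average_distortion(x, x_adv)
```

On this scale, the values of roughly 0.06 to 0.14 in the table correspond to small average per-pixel changes for pixel intensities in [0, 1].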

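Random directions in activation space can yield semantically related images similar to those obtained by maximizing the activation of a single unit.
Semantic information is distributed across the activation space rather than confined to individual high-layer units.
Adversarial examples exist across multiple settings (MNIST networks, AlexNet, QuocNet); they are visually almost indistinguishable from the originals yet cause misclassification.
Adversarial examples transfer across models with different hyperparameters, and even across models trained on different subsets of the data.
Incorporating adversarial examples into training can improve the generalization of some MNIST models.
Spectral analysis shows that layer-wise Lipschitz bounds can constrain instability, which suggests regularization as a way to reduce adversarial susceptibility.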
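
The spectral analysis in the last finding can be illustrated as follows. For a fully connected layer followed by a contractive nonlinearity such as ReLU, the layer's Lipschitz constant is at most the operator norm (largest singular value) of its weight matrix, and the product of the per-layer bounds is an upper bound for the whole network. The sketch below estimates the operator norm by power iteration; it is an assumption-laden illustration with hypothetical names and toy weights, and it ignores convolutional and pooling layers.

```python
import math
import random

def spectral_norm(W, iters=200, seed=0):
    """Largest singular value of W (list of rows), via power iteration on W^T W."""
    rng = random.Random(seed)
    n_cols = len(W[0])
    # Random start vector so it is unlikely to be orthogonal to the top singular vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(w * vj for w, vj in zip(row, v)) for row in W]                   # u = W v
        z = [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(n_cols)]  # z = W^T u
        norm_z = math.sqrt(sum(a * a for a in z))
        if norm_z == 0.0:
            return 0.0  # v lies in the null space (e.g. W is the zero matrix)
        v = [a / norm_z for a in z]
    u = [sum(w * vj for w, vj in zip(row, v)) for row in W]
    return math.sqrt(sum(a * a for a in u))  # ||W v|| for the converged unit vector v

def lipschitz_upper_bound(weights):
    """Product of per-layer operator norms; assumes 1-Lipschitz nonlinearities between layers."""
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound

# Toy two-layer network (weights invented for the example).
W1 = [[3.0, 0.0], [0.0, 1.0]]
W2 = [[0.0, 2.0], [0.0, 0.0]]
bound = lipschitz_upper_bound([W1, W2])
```

Such a bound is conservative: a large value does not by itself imply that adversarial examples exist, whereas a small value rules out large output changes from small input perturbations.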

This digest was generated by AI and reviewed by human editors.