QUICK REVIEW

[Paper Review] Adversarial Spheres

Justin Gilmer, Luke Metz|arXiv (Cornell University)|Jan 9, 2018

Adversarial Robustness in Machine Learning | References: 8 | Cited by: 47

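One-Sentence Summary

This paper studies adversarial examples on a high-dimensional dataset of two concentric spheres, proves a bound relating the average distance to the nearest error to the test error, and shows that several models come close to this bound, which implies that robustness depends on reducing generalization error.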

ABSTRACT

State of the art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and are very close to a visually similar misclassified image. Despite substantial research interest, the cause of the phenomenon is still poorly understood and remains unsolved. We hypothesize that this counter intuitive behavior is a naturally occurring result of the high dimensional geometry of the data manifold. As a first step towards exploring this hypothesis, we study a simple synthetic dataset of classifying between two concentric high dimensional spheres. For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$. Surprisingly, when we train several different architectures on this dataset, all of their error sets naturally approach this theoretical bound. As a result of the theory, the vulnerability of neural networks to small adversarial perturbations is a logical consequence of the amount of test error observed. We hope that our theoretical analysis of this very simple case will point the way forward to explore how the geometry of complex real-world data sets leads to adversarial examples.

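Motivation and Objectives

Use a simple, well-defined high-dimensional dataset to motivate and understand adversarial examples.
Define and relate two basic measures of the error set E: the error rate mu(E) and the average distance to the nearest error d(E).
Prove that in high dimensions even a small classification error implies that most data points lie close to an error.
Give a theoretical bound on adversarial robustness that is independent of model architecture.
Show that real neural networks trained on this synthetic task agree with the theoretical bound.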
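
The two measures named above can be written out explicitly. The following is one plausible formalization, assuming a data distribution written here as \mathcal{D} (a symbol introduced for illustration, not taken from the summary) and an error set E of misclassified points on the data manifold:

```latex
\mu(E) = \Pr_{x \sim \mathcal{D}}\left[\, x \in E \,\right],
\qquad
d(E) = \mathbb{E}_{x \sim \mathcal{D}}\left[\, \inf_{e \in E} \lVert x - e \rVert_2 \,\right]
```

Under this reading, a small mu(E) means the model is rarely wrong on random samples, while a small d(E) means a typical sample still sits close to some misclassified point.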

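Proposed Method

Study a two-sphere data distribution in n dimensions: concentric spheres of radius 1 and R = 1.3 (the abstract writes the dimension as d).
Train a variety of neural network architectures on this dataset and evaluate both the test error and the distance to the nearest error using an on-manifold adversarial attack (manifold PGD, which keeps ||x||_2 fixed).
Present and analyze an analytically tractable quadratic network whose decision boundary is an ellipsoid, and derive the conditions under which adversarial examples exist.
Prove the bound d(E) <= O(Phi^{-1}(p) / sqrt(n)), where p is the accuracy on the inner sphere, E is the set of misclassified points on the inner sphere, and Phi^{-1} is the inverse of the standard normal CDF.
Show that neural networks of different architectures, trained with different training set sizes N, come close to this bound in practice.
Use an estimate based on the central limit theorem to relate the alpha_i parameters of the quadratic network to its estimated error rate.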
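
The setup above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `sample_spheres`, `quadratic_predict`, and `manifold_attack` are hypothetical, the quadratic model is reduced to the ellipsoidal rule "predict outer when sum_i alpha_i x_i^2 > 1", and the attack is plain gradient ascent on that quadratic score with re-projection onto the sphere, a simplified stand-in for manifold PGD.

```python
import math
import random


def sample_spheres(num, n=500, R=1.3, seed=0):
    """Draw points uniformly from two concentric spheres in n dimensions.

    Label 0 is the inner sphere (radius 1), label 1 the outer sphere (radius R).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(num):
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in z))
        label = rng.randrange(2)
        radius = R if label == 1 else 1.0
        points.append([radius * v / norm for v in z])
        labels.append(label)
    return points, labels


def quadratic_predict(x, alpha):
    """Ellipsoidal decision rule (assumed form): outer iff sum_i alpha_i x_i^2 > 1."""
    return 1 if sum(a * v * v for a, v in zip(alpha, x)) > 1.0 else 0


def manifold_attack(x, alpha, step=0.1, max_iters=500):
    """Search for a point on the same sphere as x that is classified as outer.

    Simplified on-manifold attack: gradient ascent on the quadratic score,
    followed by rescaling back to the original norm after every step.
    Returns the first such point found, or None if none is found.
    """
    radius = math.sqrt(sum(v * v for v in x))
    cur = list(x)
    for _ in range(max_iters):
        if quadratic_predict(cur, alpha) == 1:
            return cur
        # Gradient of sum_i alpha_i x_i^2 with respect to x_i is 2 * alpha_i * x_i.
        cur = [v + step * 2.0 * a * v for a, v in zip(alpha, cur)]
        norm = math.sqrt(sum(v * v for v in cur))
        cur = [radius * v / norm for v in cur]  # project back onto the sphere
    return cur if quadratic_predict(cur, alpha) == 1 else None
```

For instance, setting every alpha_i to 1/R keeps all parameters inside [1/R^2, 1] and classifies both spheres perfectly. Raising a single alpha_i to 2 (a value chosen here purely for illustration) still classifies random samples correctly with overwhelming probability, yet `manifold_attack` started from an inner-sphere point finds a misclassified point on the same sphere.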

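Experimental Results

Research Questions

RQ1: In high dimensions, what is the relationship between the error rate mu(E) on the data manifold and the average distance to the nearest error d(E)?
RQ2: Can a simple high-dimensional synthetic task reveal fundamental bounds on adversarial robustness that are independent of model architecture?
RQ3: Do different neural network architectures learn decision boundaries whose d(E), for a given mu(E), matches the theoretical bound?
RQ4: Is it possible to improve adversarial robustness on this dataset without reducing the test error?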

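Key Findings

There exist models that correctly classify most randomly chosen points yet have misclassified points nearby on the data manifold (adversarial examples).
For this dataset, any model that misclassifies a small constant fraction of the inner sphere is vulnerable to adversarial perturbations of size O(1/√n).
Neural networks of various architectures trained on this dataset approach the theoretical bound linking mu(E) and d(E).
A quadratic network with an analytically tractable form shows that imperfectly set parameters (some alpha_i falling outside [1/R^2, 1]) produce adversarial examples even when the empirical test error is very small.
The bound on d(E) can be estimated and is tight relative to the observed mu(E); improving robustness requires substantially reducing mu(E).
The observed relationship between mu(E) and d(E) is similar across architectures, indicating a geometry-driven bound rather than an architecture-specific phenomenon.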
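
To get a feel for why robustness requires a large drop in error rate, the quantity inside the O(.) bound can be evaluated directly. The sketch below assumes mu(E) = 1 - p on the inner sphere and ignores the hidden constant in the bound, so it shows only how the bound scales; `bound_scale` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def bound_scale(error_rate, n):
    """Return Phi^{-1}(1 - error_rate) / sqrt(n).

    This is the expression inside the O(.) bound on d(E); the constant
    factor hidden by the O(.) is not modeled here.
    """
    if not 0.0 < error_rate <= 0.5:
        raise ValueError("error_rate must lie in (0, 0.5]")
    p = 1.0 - error_rate  # accuracy on the inner sphere
    return NormalDist().inv_cdf(p) / sqrt(n)


if __name__ == "__main__":
    # Compare a few error rates at the same dimension.
    for err in (1e-2, 1e-4, 1e-6):
        print(err, bound_scale(err, n=500))
```

Because Phi^{-1} grows slowly near 1 (roughly 2.33 at p = 0.99 and 3.72 at p = 0.9999), cutting the error rate by two orders of magnitude increases this scale by well under a factor of two, while increasing n shrinks it as 1/sqrt(n).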

This review was generated by AI and reviewed by a human editor.