QUICK REVIEW

[Paper Review] The Space of Transferable Adversarial Examples

Florian Tramèr, Nicolas Papernot | arXiv (Cornell University) | Apr 11, 2017

Adversarial Robustness in Machine Learning | References: 20 | Cited by: 435

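One-Sentence Summary

This paper estimates the dimensionality of adversarial subspaces and shows that transferable adversarial examples occupy high-dimensional, mutually overlapping subspaces across models, whose decision boundaries lie close together even across diverse architectures.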

ABSTRACT

Adversarial examples are maliciously perturbed inputs designed to mislead machine learning (ML) models at test-time. They often transfer: the same adversarial example fools more than one model. In this work, we propose novel methods for estimating the previously unknown dimensionality of the space of adversarial inputs. We find that adversarial examples span a contiguous subspace of large (~25) dimensionality. Adversarial subspaces with higher dimensionality are more likely to intersect. We find that for two different models, a significant fraction of their subspaces is shared, thus enabling transferability. In the first quantitative analysis of the similarity of different models' decision boundaries, we show that these boundaries are actually close in arbitrary directions, whether adversarial or benign. We conclude by formally studying the limits of transferability. We derive (1) sufficient conditions on the data distribution that imply transferability for simple model classes and (2) examples of scenarios in which transfer does not occur. These findings indicate that it may be possible to design defenses against transfer-based attacks, even for models that are vulnerable to direct attacks.

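Research Motivation and Goals

Quantify the dimensionality of adversarial subspaces and how well they transfer across models.
Measure how close the decision boundaries of different models are in both adversarial and benign directions.
Identify the conditions under which transferability holds or fails, and analyze the effect of defenses such as adversarial training.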

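Proposed Methods

Introduce the Gradient Aligned Adversarial Subspace (GAAS) method to find multiple orthogonal adversarial directions.
Use a first-order approximation of the loss to generate and count orthogonal perturbations within a given norm bound.
Measure transferability by testing the perturbations on source and target models on the MNIST and DREBIN datasets.
Compare model boundaries by analyzing inter-boundary distances and minimum distances to the boundary along legitimate, adversarial, and random directions.
Study model-agnostic perturbations based on the difference of class means, and derive theoretical conditions for transferability.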
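The idea of finding several orthogonal directions that all stay aligned with the loss gradient can be sketched in a few lines of NumPy. This is an illustrative construction under stated assumptions, not necessarily the authors' implementation: the function name `gaas_directions` is hypothetical, and the Householder reflection is one convenient way to obtain k orthonormal vectors that each have cosine 1/sqrt(k) with the gradient. Under a first-order approximation of the loss, a perturbation of norm eps along any of these directions raises the loss by roughly eps * ||g|| / sqrt(k), so requiring a minimum alignment alpha with the gradient caps the number of orthogonal directions at about 1/alpha^2.

```python
import numpy as np


def gaas_directions(grad, k):
    """Return k orthonormal rows, each with cosine 1/sqrt(k) to `grad`.

    Hypothetical helper for illustration only; `grad` is the loss gradient
    with respect to the (flattened) input.
    """
    g = np.asarray(grad, dtype=float).ravel()
    d = g.size
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(grad)")
    norm = np.linalg.norm(g)
    if norm == 0.0:
        raise ValueError("gradient must be non-zero")
    g_hat = g / norm

    # u is the normalized sum of the first k standard basis vectors,
    # so each of those basis vectors has cosine 1/sqrt(k) with u.
    u = np.zeros(d)
    u[:k] = 1.0 / np.sqrt(k)

    basis = np.eye(d)[:k]  # rows e_1 .. e_k
    v = u - g_hat
    vv = v @ v
    if vv < 1e-24:  # u already equals g_hat, no reflection needed
        return basis

    # Householder reflection H = I - 2 v v^T / (v^T v) maps u onto g_hat.
    # It is orthogonal, so the rows H e_i stay orthonormal and keep
    # cosine 1/sqrt(k) with H u = g_hat.
    return basis - 2.0 * np.outer(basis @ v, v) / vv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.normal(size=784)        # stand-in for a real input gradient
    eps = 1.0                          # assumed L2 perturbation budget
    directions = gaas_directions(grad, k=25)
    perturbations = eps * directions   # one candidate perturbation per row
    print(perturbations.shape)
```

Each row of `perturbations` would then be added to the input and checked against the source and target models; counting how many rows cause misclassification gives an estimate of the subspace dimensionality.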

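Experimental Results

Research Questions

RQ1: What is the effective dimensionality of the adversarial subspace that fools neural networks and other models?
RQ2: How similar are the decision boundaries of different models, especially in adversarial directions, and how does this relate to transferability?
RQ3: Under which data distributions and model classes is transferability guaranteed or not guaranteed, and what are sufficient conditions for it?
RQ4: How do defenses such as adversarial training affect the proximity of decision boundaries and the feasibility of black-box attacks?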

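Key Findings

Adversarial examples span a contiguous, multi-dimensional subspace. For example, for two fully connected networks on MNIST, about 24.87 of the adversarial directions found transfer to the target model on average, giving a transferable subspace of roughly 25 dimensions.
For the CNN/FC setting on MNIST, random samples drawn from the spanned subspace are misclassified by the source model 99% of the time and by the target model 89% of the time. Transfer rates vary across model pairs (for example, 68% for MNIST CNNs).
The decision boundaries of different models are very close in both adversarial and benign directions, indicating that boundaries are highly similar across model classes.
Adversarial training increases the distance between boundaries but does not fully prevent transferability: transferred perturbations can still cross the source boundary and fool the defended model.
Model-agnostic perturbations based on the difference of class means transfer to linear and quadratic models under certain alignment conditions. When that alignment and the feature mapping are not preserved, transfer can fail (the XOR artifacts example).
The paper gives sufficient conditions for transferability in simple model classes, together with a counterexample in which transfer does not occur, showing that transferability is not universal across settings.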
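The class-mean-difference perturbation mentioned above can be illustrated on synthetic data. The sketch below is a minimal example under stated assumptions: binary labels in {-1, +1}, an L2 budget `eps`, and hypothetical helper names (`mean_difference_perturbation`, `linear_accuracy`) that do not come from the paper. It shifts every point by `eps` along the normalized difference of class means, toward the opposite class, without consulting any model.

```python
import numpy as np


def mean_difference_perturbation(X, y, eps):
    """Shift each point by `eps` (L2 norm) toward the opposite class mean.

    Hypothetical helper for illustration; labels `y` must be in {-1, +1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    delta = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    norm = np.linalg.norm(delta)
    if norm == 0.0:
        raise ValueError("class means coincide; no direction is defined")
    direction = delta / norm
    # Points labeled +1 move against the direction, points labeled -1 along it.
    return -eps * y[:, None] * direction


def linear_accuracy(w, b, X, y):
    """Accuracy of the linear classifier sign(X @ w + b)."""
    return float(np.mean(np.sign(X @ w + b) == y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1000, 20
    y = np.where(rng.random(n) < 0.5, 1, -1)
    X = rng.normal(size=(n, d)) + 0.5 * y[:, None]  # two shifted Gaussians

    # Model A: nearest-class-mean linear classifier.
    mu_pos, mu_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    w_a = mu_pos - mu_neg
    b_a = -w_a @ (mu_pos + mu_neg) / 2.0

    # Model B: least-squares linear classifier, fitted independently of A.
    A = np.hstack([X, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    w_b, b_b = coef[:-1], coef[-1]

    R = mean_difference_perturbation(X, y, eps=3.0)
    for name, w, b in [("nearest-mean", w_a, b_a), ("least-squares", w_b, b_b)]:
        print(name, linear_accuracy(w, b, X, y), linear_accuracy(w, b, X + R, y))
```

For the nearest-mean model the perturbation lowers every point's margin by exactly eps times the norm of the mean difference, so its accuracy cannot increase. Whether the same shift hurts a second model depends on how well that model's weights align with the mean difference, which is the kind of alignment condition the findings above refer to.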

This review was generated by AI and reviewed by human editors.