QUICK REVIEW

[Paper Review] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Anish Athalye, Nicholas Carlini|arXiv (Cornell University)|Feb 1, 2018

Adversarial Robustness in Machine Learning | References: 34 | Cited by: 1,165

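One-Sentence Summary

The paper defines obfuscated gradients, classifies them into three types, and proposes attack techniques that circumvent defenses relying on this form of gradient masking, evaluated on the non-certified defenses from ICLR 2018.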

ABSTRACT

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.

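Motivation and Goals

Identify obfuscated gradients as one cause of the false sense of robustness in defenses against adversarial examples.
Characterize the three types of obfuscated gradients and develop an attack that overcomes each.
Empirically evaluate a set of ICLR 2018 defenses to measure how prevalent the effect is and how often it can be circumvented.
Provide reproducible baselines and attack implementations so that defenses can be evaluated robustly.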

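Proposed Method

Define obfuscated gradients and their three types: shattered gradients, stochastic gradients, and vanishing/exploding gradients.
Develop Backward Pass Differentiable Approximation (BPDA) to approximate gradients through layers that are non-differentiable or not usefully differentiable.
Use Expectation Over Transformation (EOT) to compute gradients through randomized defenses.
Apply reparameterization to avoid vanishing or exploding gradients.
Attack randomized defenses by estimating gradients with EOT combined with BPDA.
Reimplement the defenses and attacks to assess reproducibility and evaluation pitfalls.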
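
The BPDA step listed above can be illustrated with a minimal sketch. It assumes a toy setting that is not from the paper: a linear classifier with logistic loss, and a quantization "defense" whose gradient is zero almost everywhere. All names (`quantize`, `bpda_grad`, `bpda_attack`, `LEVELS`) and all numeric values are hypothetical, chosen only to show the idea of running the defense on the forward pass and treating it as the identity on the backward pass.

```python
import math

LEVELS = 32  # hypothetical grid size for the toy quantization defense


def quantize(x):
    """Toy defense g(x): snap every pixel to a grid (zero gradient almost everywhere)."""
    return [round(v * LEVELS) / LEVELS for v in x]


def margin(w, b, z):
    """Score of a linear classifier; a positive value means class +1."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def loss(w, b, z, y):
    """Logistic loss log(1 + exp(-y * margin)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * margin(w, b, z)))


def loss_grad(w, b, z, y):
    """Analytic gradient of the logistic loss with respect to the classifier input z."""
    coeff = -y / (1.0 + math.exp(y * margin(w, b, z)))
    return [coeff * wi for wi in w]


def naive_grad(w, b, x, y, h=1e-4):
    """Finite-difference gradient through the defense; small bumps stay in one grid cell."""
    base = loss(w, b, quantize(x), y)
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += h
        grads.append((loss(w, b, quantize(bumped), y) - base) / h)
    return grads


def bpda_grad(w, b, x, y):
    """BPDA: run g on the forward pass, treat g as the identity on the backward pass."""
    return loss_grad(w, b, quantize(x), y)


def bpda_attack(w, b, x, y, eps=0.031, step=0.01, iters=20):
    """Signed-gradient ascent on the loss, kept inside the l_inf ball and in [0, 1]."""
    x_adv = list(x)
    for _ in range(iters):
        grad = bpda_grad(w, b, x_adv, y)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            v = x_adv[i] + step * sign
            v = min(max(v, x[i] - eps), x[i] + eps)  # project onto the eps-ball
            x_adv[i] = min(max(v, 0.0), 1.0)         # keep a valid pixel value
    return x_adv
```

With the toy values `w = [4.0, -4.0, 2.0]`, `b = -0.9`, `x = [0.5, 0.5, 0.5]`, `y = 1`, `naive_grad` returns all zeros while `bpda_grad` does not, and `bpda_attack` moves the quantized input across the decision boundary within the 0.031 budget. A real attack would replace the hand-written gradient with automatic differentiation.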

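Experimental Results

Research Questions

RQ1: Do defenses commonly rely on obfuscated gradients to appear robust against iterative attacks?
RQ2: Under each defense's original threat model, can the new attack techniques (BPDA, EOT, reparameterization) circumvent these defenses?
RQ3: How prevalent are obfuscated gradients among contemporary defenses such as those at ICLR 2018?
RQ4: What best practices should researchers adopt to evaluate adversarial robustness honestly and reproducibly?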

Key Findings

Defense	Dataset	Distance	Accuracy
Buckman et al. (2018)	CIFAR	0.031 (ℓ∞)	0%*
Ma et al. (2018)	CIFAR	0.031 (ℓ∞)	5%
Guo et al. (2018)	ImageNet	0.005 (ℓ2)	0%*
Dhillon et al. (2018)	CIFAR	0.031 (ℓ∞)	0%
Xie et al. (2018)	ImageNet	0.031 (ℓ∞)	0%*
Song et al. (2018)	CIFAR	0.031 (ℓ∞)	9%*
Samangouei et al. (2018)	MNIST	0.005 (ℓ2)	55%**
Madry et al. (2018)	CIFAR	0.031 (ℓ∞)	47%
Na et al. (2018)	CIFAR	0.015 (ℓ∞)	15%
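
The Distance column gives the perturbation budget each attack must respect. As a rough illustration of what such a budget means, the sketch below projects a perturbation back into an ℓ∞ ball or an ℓ2 ball. It assumes pixel values scaled to [0, 1] and a plain (unnormalized) Euclidean norm for ℓ2; the summary does not state either convention, and the function names are hypothetical.

```python
import math


def project_linf(delta, eps):
    """Clip every coordinate of the perturbation to [-eps, eps]."""
    return [min(max(d, -eps), eps) for d in delta]


def project_l2(delta, eps):
    """Rescale the perturbation so its Euclidean norm is at most eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)  # already inside the ball
    scale = eps / norm
    return [d * scale for d in delta]
```

For example, `project_linf([0.1, -0.2, 0.01], 0.031)` clips the first two coordinates to the budget and leaves the third unchanged, which is the operation an iterative attack repeats after every step.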

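Obfuscated gradients are common: 7 of the 9 ICLR 2018 defenses rely on them.
The proposed attacks completely circumvent 6 of these defenses and partially circumvent 1, each under the threat model of its original paper.
BPDA, EOT, and reparameterization effectively produce adversarial examples against non-differentiable, randomized, and deeply unrolled defenses, respectively.
Adversarially trained defenses are not immune either: in the table above, Madry et al. (2018) retains 47% accuracy and Na et al. (2018) only 15%. In addition, many evaluations lack a realistic threat model.
The authors provide reproducible implementations of the defenses and attacks to support reliable evaluation.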
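
The EOT idea named in the findings can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the randomized "defense" is a random binary mask applied to the input, the model is a linear classifier with logistic loss, and every name and default value (`sample_mask`, `eot_grad`, `keep_prob`, `samples`, `seed`) is hypothetical. A single sampled gradient is noisy and zero on masked coordinates; averaging over many samples estimates the gradient of the expected loss.

```python
import math
import random


def loss_grad(w, b, z, y):
    """Gradient of the logistic loss log(1 + exp(-y * (w.z + b))) with respect to z."""
    m = y * (sum(wi * zi for wi, zi in zip(w, z)) + b)
    coeff = -y / (1.0 + math.exp(m))
    return [coeff * wi for wi in w]


def sample_mask(n, keep_prob, rng):
    """Toy randomized defense: keep each input coordinate with probability keep_prob."""
    return [1.0 if rng.random() < keep_prob else 0.0 for _ in range(n)]


def single_sample_grad(w, b, x, y, keep_prob, rng):
    """Gradient through one random draw of the defense (zero on dropped coordinates)."""
    mask = sample_mask(len(x), keep_prob, rng)
    z = [m * v for m, v in zip(mask, x)]
    grad_z = loss_grad(w, b, z, y)
    # Chain rule: z_i = mask_i * x_i, so dz_i/dx_i = mask_i.
    return [m * g for m, g in zip(mask, grad_z)]


def eot_grad(w, b, x, y, keep_prob=0.5, samples=200, seed=0):
    """EOT: average the gradient over many sampled transformations."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    total = [0.0] * len(x)
    for _ in range(samples):
        grad = single_sample_grad(w, b, x, y, keep_prob, rng)
        total = [t + g for t, g in zip(total, grad)]
    return [t / samples for t in total]
```

The averaged gradient can then drive the same kind of iterative attack used against a deterministic model. The number of samples trades attack cost against the variance of the estimate.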

This review was generated by AI and reviewed by a human editor.