QUICK REVIEW

[Paper Review] On Adaptive Attacks to Adversarial Example Defenses

Florian Tramèr, Nicholas Carlini|arXiv (Cornell University)|Feb 19, 2020

Adversarial Robustness in Machine Learning|References: 58|Cited by: 141

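One-Sentence Summary

This paper shows that thirteen recently published adversarial example defenses can be circumvented by carefully tuned adaptive attacks, and it lays out a detailed methodology for performing such evaluations.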

ABSTRACT

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS (and chosen for illustrative and pedagogical purposes) can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.

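Motivation and Objectives

Show that existing adaptive evaluations are often insufficient to establish a defense's robustness.
Develop a reproducible, step-by-step methodology for constructing strong adaptive attacks against a given defense.
Highlight common weaknesses in defense evaluations and offer guidance for more rigorous testing.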

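Proposed Method

Survey and characterize a set of defenses published at ICLR, ICML, and NeurIPS.
Construct improved, defense-specific adaptive attacks using loss functions that are easy to optimize and consistent (that is, a higher loss value corresponds to a stronger attack).
Use standard attack tools (PGD, C&W, BPDA, EOT) and tailor them to each defense.
Read each original paper and its code, iteratively hypothesize failure modes, and then implement stronger adaptive attacks.
Document the full attack development process so that it can serve as a tutorial for future evaluations.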
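
To make the attack tooling above concrete, here is a minimal sketch of an L-infinity PGD loop driven by a margin loss, which is one example of a loss that is easy to optimize and rises as the attack gets stronger. It is not the paper's code: the toy linear classifier and the names `logits`, `margin_loss_and_grad`, and `pgd_linf` are assumptions made for illustration, and a real evaluation would differentiate through the defended model with an autodiff framework.

```python
def logits(W, b, x):
    """Toy linear classifier (hypothetical stand-in for a defended model)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]


def margin_loss_and_grad(W, b, x, y):
    """Margin loss max_{k != y} z_k - z_y and its gradient w.r.t. x.

    The loss is positive exactly when x is misclassified.
    """
    z = logits(W, b, x)
    k = max((j for j in range(len(z)) if j != y), key=lambda j: z[j])
    grad = [W[k][i] - W[y][i] for i in range(len(x))]
    return z[k] - z[y], grad


def pgd_linf(W, b, x0, y, eps=0.3, step=0.05, iters=40):
    """Signed-gradient ascent on the margin loss, projected onto the eps-ball."""
    x = list(x0)
    for _ in range(iters):
        loss, g = margin_loss_and_grad(W, b, x, y)
        if loss > 0:  # already misclassified, stop early
            break
        # Step in the sign of the gradient (L-infinity steepest ascent).
        x = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Project back into the L-infinity ball of radius eps around x0.
        x = [min(max(xi, c - eps), c + eps) for xi, c in zip(x, x0)]
    return x


if __name__ == "__main__":
    W = [[1.0, -1.0], [-1.0, 1.0]]
    b = [0.0, 0.0]
    x0 = [0.6, 0.4]  # classified as class 0 by the toy model
    x_adv = pgd_linf(W, b, x0, y=0)
    print(x_adv, margin_loss_and_grad(W, b, x_adv, 0)[0])
```

Tailoring such a loop to a specific defense usually means changing what the loss and gradient are computed through (for example, averaging over a defense's randomness, or substituting a differentiable approximation for a non-differentiable step) rather than changing the loop itself.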

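Experimental Results

Research Questions

RQ1: Can existing defenses against adversarial examples withstand adaptive attacks carefully tuned to their specific mechanisms?
RQ2: What themes recur across successful adaptive attacks on diverse defense strategies?
RQ3: How should an adaptive attack methodology be structured to avoid relying on a single, possibly exploitable technique?
RQ4: Do current defense evaluations rely on loss functions or optimization methods that fail to serve as a proxy for attack success, and how can this be corrected?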

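Key Findings

Adaptive attacks can substantially reduce the robustness claimed for the thirteen defenses.
Simple, well-tuned adaptive attacks often outperform more complex or indirect strategies.
Score-based, decision-based, and transfer attacks can still succeed when gradient masking causes gradient-based methods to fail.
Under stronger adaptive evaluation, many defenses' robustness claims do not hold, although the authors do not claim that every method is ineffective in every setting.
Attack strategies are not fully automated; they require careful, defense-specific tuning.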
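
The point about gradient masking can be illustrated with a small score-based attack. The sketch below assumes a hypothetical input-quantization defense (`quantize`) in front of a toy linear classifier. The quantized model is piecewise constant, so its input gradient is zero almost everywhere, yet a random search that only queries the model's scores still finds a misclassified point. All names and parameters here are illustrative assumptions, not any of the attacks evaluated in the paper.

```python
import random


def quantize(x, levels=10):
    """Hypothetical defense: snap inputs to a grid (zero gradient almost everywhere)."""
    return [round(xi * levels) / levels for xi in x]


def margin(W, b, x, y):
    """Attacker-visible score: best wrong-class logit minus true-class logit."""
    xq = quantize(x)
    z = [sum(w * v for w, v in zip(row, xq)) + bk for row, bk in zip(W, b)]
    return max(z[j] for j in range(len(z)) if j != y) - z[y]


def random_search_linf(W, b, x0, y, eps=0.3, step=0.1, queries=200, seed=0):
    """Gradient-free search: keep a random +/-step proposal only if the score improves."""
    rng = random.Random(seed)
    x, best = list(x0), margin(W, b, x0, y)
    for _ in range(queries):
        if best > 0:  # misclassified, stop querying
            break
        cand = [xi + step * rng.choice((-1.0, 1.0)) for xi in x]
        # Stay inside the L-infinity ball of radius eps around x0.
        cand = [min(max(c, c0 - eps), c0 + eps) for c, c0 in zip(cand, x0)]
        score = margin(W, b, cand, y)
        if score > best:
            x, best = cand, score
    return x, best


if __name__ == "__main__":
    W = [[1.0, -1.0], [-1.0, 1.0]]
    b = [0.0, 0.0]
    x_adv, score = random_search_linf(W, b, [0.6, 0.4], y=0)
    print(x_adv, score)
```

The step size is chosen to match the quantization grid so that proposals actually change the model's output; this kind of defense-specific tuning is what the last finding above refers to.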


This review was generated by AI and reviewed by human editors.