QUICK REVIEW

[Paper Review] Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

Nicholas Carlini, David Wagner|arXiv (Cornell University)|May 20, 2017

Adversarial Robustness in Machine Learning | Cited by 330

One-Sentence Summary

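The paper surveys ten adversarial-example detection methods, shows that every one of them can be defeated by attacks tailored to the specific defense, concludes that adversarial examples are not easily detected, and outlines guidelines for evaluating future defenses.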

ABSTRACT

Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.

Motivation and Objectives

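Evaluate how effective ten recent adversarial-example detection methods are under several threat models.
Determine whether the detectors remain robust against adaptive, white-box, and transferability-based attacks.
Test whether the supposed intrinsic differences between adversarial examples and natural images hold up under strong evaluation.
Provide practical recommendations for evaluating future defenses.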

Proposed Approach

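Re-implement ten detection schemes drawn from seven papers.
Generate adversarial examples with Carlini and Wagner's L2 targeted attack.
Design adaptive, white-box attacker loss functions that evade each detector.
Use transferability to evaluate black-box (limited-knowledge) attacks.
Define a unified attack framework that combines the classifier and the detector into a single model, so that one attack bypasses both.
Evaluate the defenses under zero-knowledge, perfect-knowledge, and limited-knowledge threat models.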
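The detector-aware attack described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code. The classifier is assumed to expose logits Z_F(x) and the detector a single logit Z_D(x) that is positive when it flags an input. The helper names (combined_logits, cw_margin, cw_l2_objective) and the toy models are hypothetical. The detector is folded in as an extra "adversarial" class with logit (Z_D(x) + 1) * max_j Z_F(x)_j, which is assumed here as one way to realize the combined model. The resulting logits G feed a Carlini-Wagner-style L2 objective ||x' - x||_2^2 + c * max(max_{i != t} G(x')_i - G(x')_t, -kappa), where x' = (tanh(w) + 1) / 2 keeps pixels in range.

```python
import numpy as np


def combined_logits(z_f, z_d):
    """Fold a detector into the classifier as an extra (N+1)-th class.

    z_f: classifier logits Z_F(x), shape (N,)
    z_d: scalar detector logit Z_D(x); positive means "flagged as adversarial"
    Assumes max(z_f) > 0, so the extra logit beats every real class
    exactly when z_d > 0.
    """
    z_f = np.asarray(z_f, dtype=float)
    extra = (z_d + 1.0) * np.max(z_f)
    return np.append(z_f, extra)


def cw_margin(logits, target, kappa=0.0):
    """C&W margin term max(max_{i != t} Z_i - Z_t, -kappa).

    It stops decreasing once the target logit leads every other logit
    (including the extra "detected" class) by at least kappa.
    """
    logits = np.asarray(logits, dtype=float)
    others = np.delete(logits, target)
    return max(float(np.max(others) - logits[target]), -kappa)


def cw_l2_objective(w, x, z_f_fn, z_d_fn, target, c=1.0, kappa=0.0):
    """L2 objective in tanh space: x' = (tanh(w) + 1) / 2 stays in (0, 1)."""
    x_adv = 0.5 * (np.tanh(w) + 1.0)
    dist = float(np.sum((x_adv - x) ** 2))
    g = combined_logits(z_f_fn(x_adv), z_d_fn(x_adv))
    return dist + c * cw_margin(g, target, kappa)


# Hypothetical fixed-output "models" for illustration only.
def toy_z_f(_x):
    return np.array([1.0, 3.0, 2.0])


def toy_z_d(_x):
    return -0.5  # the detector does not flag this input


x = np.full(4, 0.5)
w = np.zeros(4)  # tanh(0) = 0, so x' equals x and the distortion is zero
print(cw_l2_objective(w, x, toy_z_f, toy_z_d, target=1))
```

Minimizing this objective over w (for example with gradient descent in an autodiff framework, which the sketch omits) pushes the input toward the target class while keeping the extra "detected" logit below the target's, so a single loss addresses both the classifier and the detector.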

Experimental Results

Research Questions

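RQ1: Under strong adaptive attacks, can existing detection methods reliably distinguish adversarial examples from natural images?
RQ2: Do the detectors remain robust when the attacker has full knowledge of the defense (white-box) or only black-box access?
RQ3: Does transferability help an attacker evade detectors in limited-knowledge settings?
RQ4: Do detection results on MNIST generalize to more complex datasets such as CIFAR-10?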

Key Findings

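All ten detection methods can be defeated by attacks tailored to the specific defense.
On simpler datasets, evading the detectors requires only a limited increase in distortion; on CIFAR-10, the adversarial examples remain hard to distinguish from natural images.
Some defenses perform well against zero-knowledge or simple attacks but fail against a perfect-knowledge adversary who is aware of the defense.
Adaptive attacks substantially reduce or eliminate the detectors' apparent robustness, sometimes with only about a 10% increase in distortion, and the resulting inputs still go undetected.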
Neither misselection-based defenses nor methods that rely on layer-wise statistics or PCA actually remain robust against white-box evasion.
The study warns against drawing conclusions from MNIST results alone and calls for standardized evaluation methods.
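The findings above judge an adaptive attack on two numbers: how much more distortion it needs than a detector-unaware attack, and whether the detector still flags its outputs. A minimal NumPy sketch of that bookkeeping follows. The helper names and the brightness-based toy_detector are hypothetical, and the data is synthetic, chosen only so the arithmetic yields a 10% distortion increase; none of it is taken from the paper.

```python
import numpy as np


def mean_l2_distortion(x_nat, x_adv):
    """Mean per-example L2 distance; one example per row (first axis)."""
    diff = (np.asarray(x_adv) - np.asarray(x_nat)).reshape(len(x_nat), -1)
    return float(np.mean(np.linalg.norm(diff, axis=1)))


def detection_rate(x_adv, detector_logit):
    """Fraction of adversarial inputs flagged, i.e. with detector logit > 0."""
    scores = np.array([detector_logit(x) for x in x_adv])
    return float(np.mean(scores > 0))


def relative_increase(baseline, adaptive):
    """Relative growth in distortion; 0.10 means 10% more than the baseline."""
    return (adaptive - baseline) / baseline


def toy_detector(x):
    # Hypothetical detector keyed on mean brightness: flags mean pixel > 0.55.
    return float(np.mean(x) - 0.55)


# Synthetic data: two natural inputs of four pixels each, all at 0.5.
x_nat = np.full((2, 4), 0.5)
# Detector-unaware attack: uniform +0.1 shift, which raises mean brightness.
x_base = x_nat + 0.1
# Adaptive attack: slightly larger L2 perturbation, but mean-preserving.
x_adapt = x_nat + np.array([0.11, -0.11, 0.11, -0.11])

d_base = mean_l2_distortion(x_nat, x_base)    # about 0.20
d_adapt = mean_l2_distortion(x_nat, x_adapt)  # about 0.22
print(f"distortion increase: {relative_increase(d_base, d_adapt):.0%}")
print("detected (baseline, adaptive):",
      detection_rate(x_base, toy_detector), detection_rate(x_adapt, toy_detector))
```

The toy detector mirrors the paper's broader point in miniature: a detector keyed to a statistic the attacker can control is evaded by an attack that leaves that statistic unchanged, at a small cost in distortion.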

This review was generated by AI and checked by human editors.