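[Paper Walkthrough] Motivating the Rules of the Game for Adversarial Example Research
This paper proposes a taxonomy of attacker/defender rules for adversarial examples so that research aligns with real-world security threats, and it criticizes current perturbation defense practice for relying too heavily on simple, abstract threat models.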
Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such examples to abuse deployed systems. As a result of these security concerns, there has been a flurry of recent papers proposing algorithms to defend against such malicious perturbations of correctly handled examples. It is unclear how such misclassifications represent a different kind of security problem than other errors, or even other attacker-produced examples that have no specific relationship to an uncorrupted input. In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security. Towards this end, we establish a taxonomy of motivations, constraints, and abilities for more plausible adversaries. Finally, we provide a series of recommendations outlining a path forward for future work to more clearly articulate the threat model and perform more meaningful evaluation.
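Motivation and Goals
- Clarify what counts as a meaningful security threat in adversarial example research.
- Introduce a taxonomy of attacker motivations, constraints, and abilities that matches realistic scenarios.
- Assess how the perturbation defense literature maps onto real security problems, and identify the gaps.
- Offer recommendations on threat modeling and evaluation to improve relevance and rigor.
Proposed Approach
- Develop a two-player attacker-defender game framework for defining adversarial examples.
- Classify attacker abilities along several axes: goal (targeted vs. untargeted), knowledge (white-box vs. black-box), and action space (indistinguishable perturbation, content-preserving perturbation, non-suspicious input, content-constrained input, and unconstrained input).
- Distinguish starting-point considerations (drawn from the data distribution vs. a fixed input) and the sequence of play (who moves first, and whether the game is repeated).
- Survey the rules commonly used in the perturbation defense literature (for example, l_p-bounded perturbations of data points) and critique how realistic they are.
- Discuss evaluation metrics (adversarial robustness as an expectation over the data), along with issues such as hardness inversion and the NP-hardness of robustification.
- Provide concrete real-world example scenarios to motivate the choice of rules and their security relevance.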
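The axes listed above can be written down as a small data structure, which makes it easier to see what a paper has and has not specified. The following is a minimal sketch, not code from the paper: the class names, field names, and the helper `is_allowed_lp` are all hypothetical, and inputs are assumed to be flat lists of floats.

```python
from dataclasses import dataclass
from enum import Enum
import math


class Goal(Enum):
    TARGETED = "targeted"
    UNTARGETED = "untargeted"


class Knowledge(Enum):
    WHITE_BOX = "white-box"
    BLACK_BOX = "black-box"


class ActionSpace(Enum):
    INDISTINGUISHABLE = "indistinguishable perturbation"
    CONTENT_PRESERVING = "content-preserving perturbation"
    NON_SUSPICIOUS = "non-suspicious input"
    CONTENT_CONSTRAINED = "content-constrained input"
    UNCONSTRAINED = "unconstrained input"


@dataclass(frozen=True)
class ThreatModel:
    """Hypothetical record of the 'rules of the game' a study assumes."""
    goal: Goal
    knowledge: Knowledge
    action_space: ActionSpace
    start_drawn_from_data: bool  # True: starting point sampled from the data
    attacker_moves_last: bool    # sequence of play
    repeated_game: bool          # can the players adapt over several rounds?


def lp_distance(x, x_adv, p):
    """l_p distance between two equal-length sequences of floats."""
    if len(x) != len(x_adv):
        raise ValueError("inputs must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, x_adv)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def is_allowed_lp(x, x_adv, eps, p=math.inf):
    """Check the rule most perturbation defense papers use: ||x_adv - x||_p <= eps."""
    return lp_distance(x, x_adv, p) <= eps


# The setting the summary describes as typical of the literature.
typical = ThreatModel(
    goal=Goal.UNTARGETED,
    knowledge=Knowledge.WHITE_BOX,
    action_space=ActionSpace.INDISTINGUISHABLE,
    start_drawn_from_data=True,
    attacker_moves_last=True,
    repeated_game=False,
)
print(typical.action_space.value)
print(is_allowed_lp([0.0, 0.5], [0.03, 0.5], eps=0.05))
```

Note that `is_allowed_lp` only captures the l_p rule. The other action spaces, such as content-preserving or non-suspicious inputs, have no comparably simple check, which is part of the critique summarized here.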
![Figure 1: An example of image spam shown in [77]. Note the notion of a “starting point” does not apply here, instead the entire image is crafted from scratch by the attacker to avoid statistical detection. It is not of the form of applying a small or imperceptible perturbation to random image fr](https://ar5iv.labs.arxiv.org/html/1807.06732/assets/image_spam2.png)
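Experimental Results
Research Questions
- RQ1: In deployed ML systems, which attacker goals and success criteria are realistic?
- RQ2: Which assumptions about attacker knowledge and action space are credible, and how do they constrain defense design?
- RQ3: How well do the rules used in perturbation defense work map onto real security threats, and where do they diverge?
- RQ4: Which evaluation practices yield meaningful security insight rather than artifacts of the evaluation setup?
Main Findings
- Many perturbation defense studies assume that the starting point is drawn from the data distribution and that the perturbation is bounded in an l_p norm, which often does not match realistic security threats.
- The literature frequently reports robustness against one specific attack strategy. This can produce hardness inversion, where a defense appears more robust against a stronger attacker than against a weaker one.
- Explicit threat models and a broader taxonomy of attacker abilities are needed to avoid uncertain or overstated security claims.
- Evaluations dominated by a single robustness score can be misleading, because the underlying optimization is NP-hard and the attack strategy used in the evaluation is not controlled.
- Real-world attack scenarios (content-preserving, non-suspicious, content-constrained or payload-carrying, and unconstrained) expose gaps in the standard perturbation defense framework.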
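The relationship between a single reported robustness score and the worst case can be made explicit. The following is a common formalization rather than the paper's own notation: the classifier $f$, data distribution $p$, budget $\epsilon$, and attack procedure $A$ are symbols assumed here for illustration.

$$
R_{\mathrm{adv}}(f) = \mathbb{E}_{(x,y)\sim p}\Big[\max_{\|\delta\|_{p}\le\epsilon} \mathbb{1}\big[f(x+\delta)\neq y\big]\Big]
$$

$$
\hat{R}_{A}(f) = \mathbb{E}_{(x,y)\sim p}\Big[\mathbb{1}\big[f(A(x,y))\neq y\big]\Big] \le R_{\mathrm{adv}}(f)
\quad\text{whenever } \|A(x,y)-x\|_{p}\le\epsilon .
$$

Because any particular attack $A$ can only find some of the errors inside the $\epsilon$-ball, the measured error rate $\hat{R}_{A}$ is a lower bound on the worst-case rate, so the reported robustness $1-\hat{R}_{A}$ is an upper bound on the true robustness. A hardness inversion is a sign that this bound is loose: if a nominally stronger attacker achieves a lower $\hat{R}_{A}$ than a weaker one, the stronger attack was probably not executed well.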
![Figure 2: Images equally far away from a reference image in the $l_{2}$ sense can be dramatically different in perceived distance. Figure due to [97].](https://ar5iv.labs.arxiv.org/html/1807.06732/assets/x1.png)
This walkthrough was generated by AI and reviewed by a human editor.