QUICK REVIEW

[Paper Review] Adversarial Training and Robustness for Multiple Perturbations

Florian Tramèr, Dan Boneh|arXiv (Cornell University)|Apr 30, 2019
Adversarial Robustness in Machine Learning | References: 39 | Cited by: 84
One-Sentence Summary

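The paper analyzes robustness trade-offs across multiple perturbation types and proposes multi-perturbation adversarial training schemes together with a new attack. It shows that robustness to several perturbations at once does not reach the level achieved by training against each perturbation individually, and it observes gradient masking on MNIST.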

ABSTRACT

Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise). For other perturbations, these defenses offer no guarantees and, at times, even increase the model's vulnerability. Our aim is to understand the reasons underlying this robustness trade-off, and to train models that are simultaneously robust to multiple perturbation types. We prove that a trade-off in robustness to different types of $\ell_p$-bounded and spatial perturbations must exist in a natural and simple statistical setting. We corroborate our formal analysis by demonstrating similar robustness trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation adversarial training schemes, and a novel efficient attack for finding $\ell_1$-bounded adversarial examples, we show that no model trained against multiple attacks achieves robustness competitive with that of models trained on each attack individually. In particular, we uncover a pernicious gradient-masking phenomenon on MNIST, which causes adversarial training with first-order $\ell_\infty, \ell_1$ and $\ell_2$ adversaries to achieve merely $50\%$ accuracy. Our results question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.

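Motivation and Goals

  • Understand why robustness to one perturbation type often reduces robustness to other types (mutually exclusive perturbations, MEPs)
  • Develop training schemes that achieve simultaneous robustness to multiple perturbation types
  • Design efficient attacks (including an $\ell_1$ attack) for evaluating multi-perturbation defenses
  • Demonstrate the trade-offs on MNIST and CIFAR-10 and analyze the gradient-masking effect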

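Proposed Methods

  • Define adversarial risk over multiple perturbation sets $S_1, \dots, S_n$ under two natural measures: the Avg and Max adversarial risks
  • Prove theoretical trade-offs (MEPs) between $\ell_\infty$, $\ell_1$, $\ell_2$, and spatial perturbations
  • Propose multi-perturbation adversarial training strategies (Max and Avg) that use adversarial examples from several perturbation types
  • Introduce Sparse L1 Descent (SLIDE), an efficient $\ell_1$ attack suitable for adversarial training
  • Develop and evaluate an analysis of affine perturbations to understand composite perturbations
  • Evaluate empirically on a CNN for MNIST and a Wide-ResNet for CIFAR-10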
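The difference between the Max and Avg strategies can be illustrated with a small sketch. It assumes the adversarial examples have already been generated and that only their per-example losses are available; the function name `multi_perturbation_loss` and the loss values are hypothetical and are not taken from the paper's code. Max keeps, for each input, the loss of the strongest attack, while Avg averages over all attack types.

```python
import numpy as np

def multi_perturbation_loss(per_attack_losses, strategy="max"):
    """Aggregate losses over several attack types (illustrative helper).

    per_attack_losses has shape (n_attacks, batch_size); entry [i, j] is the
    loss on example j for the adversarial example produced by attack i.
    """
    losses = np.asarray(per_attack_losses, dtype=float)
    if strategy == "max":
        # Worst case over attack types, taken separately for each example.
        per_example = losses.max(axis=0)
    elif strategy == "avg":
        # Every attack type contributes equally to each example's loss.
        per_example = losses.mean(axis=0)
    else:
        raise ValueError("strategy must be 'max' or 'avg'")
    return float(per_example.mean())

# Made-up losses for 3 attack types (rows) on a batch of 3 examples (columns).
example_losses = np.array([
    [0.2, 1.5, 0.1],
    [0.9, 0.3, 0.4],
    [0.5, 0.7, 2.0],
])
max_loss = multi_perturbation_loss(example_losses, "max")
avg_loss = multi_perturbation_loss(example_losses, "avg")
```

Because a maximum is never smaller than a mean, the Max objective always upper-bounds the Avg objective on the same batch.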

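Experimental Results

Research Questions

  • RQ1: Can a model be simultaneously robust to multiple perturbation types (e.g., $\ell_\infty$, $\ell_1$, $\ell_2$, and spatial perturbations)?
  • RQ2: What are the theoretical limits of multi-perturbation robustness in a natural statistical model?
  • RQ3: Do multi-perturbation training strategies (Max/Avg) improve robustness across perturbation types, and at what cost?
  • RQ4: How does the effect of affine combinations of perturbations on robustness differ from that of the union of perturbations?
  • RQ5: Is current adversarial training affected by gradient masking when extended to multiple perturbations?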
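For RQ4, it may help to write the two adversaries side by side. The following is a sketch of one common formalization for two additive perturbation sets $S_1$ and $S_2$; the mixing weight $\beta$ is notation assumed here, and spatial perturbations, which are not additive, would need a composition rather than a sum.

```latex
S_{\cup} = S_1 \cup S_2,
\qquad
S_{\text{affine}}(\beta) = \{\, \beta r_1 + (1 - \beta)\, r_2 \;:\; r_1 \in S_1,\ r_2 \in S_2 \,\},
\quad \beta \in [0, 1].
```

At $\beta = 0$ or $\beta = 1$ the affine set reduces to a single perturbation type; for intermediate $\beta$ it contains perturbations that in general belong to neither $S_1$ nor $S_2$.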

Main Findings

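| Model | Acc. | $\ell_\infty$ | $\ell_1$ | $\ell_2$ | $1 - R_{adv}^{max}$ | $1 - R_{adv}^{avg}$ |
|---|---|---|---|---|---|---|
| Nat | 99.4 | 0.0 | 12.4 | 8.5 | 0.0 | 7.0 |
| Adv ∞ | 99.1 | 91.1 | 12.1 | 11.3 | 6.8 | 38.2 |
| Adv 1 | 98.9 | 0.0 | 78.5 | 50.6 | 0.0 | 43.0 |
| Adv 2 | 98.5 | 0.4 | 68.0 | 71.8 | 0.4 | 46.7 |
| Adv_avg | 97.3 | 76.7 | 53.9 | 58.3 | 49.9 | 63.0 |
| Adv_max | 97.2 | 71.7 | 62.6 | 56.0 | 52.4 | 63.4 |
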
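The last two columns of the table can be sanity-checked from the three per-attack columns. The sketch below copies the reported accuracies and checks two relations: the Avg column should equal the mean of the three per-attack accuracies up to rounding, and the Max column, which measures accuracy against the union of the attacks, cannot exceed the weakest per-attack accuracy. The dictionary keys and the helper name `check_row` are illustrative.

```python
# Reported accuracies (%): (l_inf, l_1, l_2, 1 - R_adv_max, 1 - R_adv_avg).
rows = {
    "Nat":     (0.0, 12.4, 8.5, 0.0, 7.0),
    "Adv_inf": (91.1, 12.1, 11.3, 6.8, 38.2),
    "Adv_1":   (0.0, 78.5, 50.6, 0.0, 43.0),
    "Adv_2":   (0.4, 68.0, 71.8, 0.4, 46.7),
    "Adv_avg": (76.7, 53.9, 58.3, 49.9, 63.0),
    "Adv_max": (71.7, 62.6, 56.0, 52.4, 63.4),
}

def check_row(linf, l1, l2, acc_max, acc_avg):
    """Return (avg_ok, max_ok) for one row of the table."""
    mean_acc = (linf + l1 + l2) / 3.0
    # Values are reported to one decimal place, so allow rounding error.
    avg_ok = abs(mean_acc - acc_avg) <= 0.05
    # An example robust to all attacks is robust to each single attack.
    max_ok = acc_max <= min(linf, l1, l2)
    return avg_ok, max_ok

results = {name: check_row(*values) for name, values in rows.items()}
for name, (avg_ok, max_ok) in results.items():
    print(f"{name}: avg column consistent={avg_ok}, max column consistent={max_ok}")
```

The gap between the Max column and the weakest per-attack column (for example 52.4 versus 56.0 for Adv_max) reflects inputs that survive each attack individually but not all of them at once.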
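  • Robustness to multiple perturbations comes at a cost in accuracy (typically 5-10 points lower than training against a single perturbation)
  • On MNIST, robustness to $\ell_1$, $\ell_2$, and $\ell_\infty$ perturbations can be accompanied by gradient masking, which reduces the effectiveness of first-order attacks
  • Training on multiple perturbations (Avg/Max strategies) improves multi-perturbation robustness but falls short of the optimal multi-perturbation performance (OPT) and exhibits trade-offs
  • Affine combinations of perturbations can be stronger than either perturbation alone, so robustness to a single perturbation type may not be sufficient against an affine adversary
  • The SLIDE attack provides an efficient $\ell_1$ adversary that is competitive with stronger attacks, making multi-perturbation training feasible
  • On CIFAR-10, Adv_avg and Adv_max improve multi-perturbation robustness but still fall short of optimal robustness to composite perturbations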
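The idea behind a sparse $\ell_1$ steepest-descent attack such as SLIDE can be sketched as follows: at each step, keep only the gradient coordinates with the largest magnitudes, move along their signs with a step of fixed $\ell_1$ length, and project back onto the $\ell_1$ ball. This is a simplified NumPy illustration and not the authors' implementation. The toy linear model, the `grad_fn` interface, and the hyperparameter values are assumptions, and clipping to the valid pixel range is omitted.

```python
import numpy as np

def project_l1_ball(v, eps):
    """Euclidean projection of v onto {r : ||r||_1 <= eps} (sort-based)."""
    if np.abs(v).sum() <= eps:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]            # magnitudes, descending
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > css - eps)[0][-1]
    theta = (css[rho] - eps) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def sparse_l1_attack(grad_fn, x, eps, steps=20, step_size=1.0, q=90):
    """Sparse l1 ascent on the loss; grad_fn(x) returns dLoss/dx."""
    r = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + r)
        # Keep only coordinates at or above the q-th percentile of |g|.
        thresh = np.percentile(np.abs(g), q)
        e = np.where(np.abs(g) >= thresh, np.sign(g), 0.0)
        norm = np.abs(e).sum()
        if norm == 0:
            break
        # Step of l1 length step_size, then project back onto the ball.
        r = project_l1_ball(r + step_size * e / norm, eps)
    return r

# Toy setting (assumed): logistic loss of a fixed linear classifier.
rng = np.random.default_rng(0)
w = rng.normal(size=50)
x0 = rng.normal(size=50)
y = 1.0

def loss(x):
    return float(np.logaddexp(0.0, -y * (w @ x)))

def grad(x):
    # d/dx log(1 + exp(-y w.x)) = -y * w * sigmoid(-y w.x)
    return -y * w / (1.0 + np.exp(y * (w @ x)))

eps = 2.0
r_adv = sparse_l1_attack(grad, x0, eps, steps=20, step_size=0.5, q=90)
```

With `q=90`, each step touches only about a tenth of the coordinates, which keeps the update sparse while still spending the full $\ell_1$ step budget on the most influential inputs.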

This review was generated by AI and reviewed by human editors.