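[Paper Explained] Ensemble Adversarial Training: Attacks and Defenses
This paper analyzes why single-step adversarial training fails because of gradient masking, and introduces Ensemble Adversarial Training, which improves black-box robustness by training on adversarial examples crafted on static pre-trained models.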
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
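Research Motivation and Goals
- Explain why single-step adversarial training converges to a degenerate minimum and remains vulnerable to black-box attacks.
- Propose Ensemble Adversarial Training to diversify the adversarial perturbations seen during training.
- Demonstrate improved robustness on ImageNet and analyze how attacks transfer between models.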
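Proposed Method
- Formalize adversarial training as a problem over perturbations bounded in the l_infinity norm.
- Demonstrate gradient masking and degenerate minima under single-step attacks.
- Introduce R+FGSM: a small random perturbation step applied before the single-step attack.
- Propose Ensemble Adversarial Training, which adds adversarial examples crafted on static pre-trained models to the training data.
- Evaluate on ImageNet with Inception v3 and Inception ResNet v2 against a range of white-box and black-box attacks.
- Discuss convergence and the trade-off between white-box and black-box robustness.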
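The single-step attacks listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy logistic-regression model, the helper names (`loss_grad_x`, `fgsm`, `rand_fgsm`) and the choice `alpha = eps / 2` are assumptions made for illustration. The point it shows is that a random sign step of size `alpha` followed by a gradient sign step of size `eps - alpha` still keeps the total perturbation within the l_infinity budget `eps`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the inputs x.

    Toy stand-in for a network's input gradient (hypothetical model:
    logistic regression with weights w and bias b).
    """
    p = sigmoid(x @ w + b)                 # predicted probabilities, shape (n,)
    return (p - y)[:, None] * w[None, :]   # shape (n, d)

def fgsm(x, y, w, b, eps):
    """Single-step attack: move eps along the sign of the input gradient."""
    return x + eps * np.sign(loss_grad_x(x, y, w, b))

def rand_fgsm(x, y, w, b, eps, alpha, rng):
    """R+FGSM sketch: random sign step of size alpha, then a gradient
    sign step of size (eps - alpha) taken from the randomized point."""
    x_rand = x + alpha * np.sign(rng.standard_normal(x.shape))
    return x_rand + (eps - alpha) * np.sign(loss_grad_x(x_rand, y, w, b))

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])             # illustrative weights
b = 0.1
x = rng.standard_normal((4, 3))
y = np.array([1.0, 0.0, 1.0, 0.0])
eps = 0.1

x_fgsm = fgsm(x, y, w, b, eps)
x_rfgsm = rand_fgsm(x, y, w, b, eps, alpha=eps / 2, rng=rng)  # alpha is an assumption
```

On a real network the gradient would come from automatic differentiation rather than a closed form; the random pre-step is what moves the attack away from the non-smooth region around the data point before the linear approximation is taken.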
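Experimental Results
Research Questions
- RQ1: Does single-step adversarial training create a degenerate minimum that obscures the true loss landscape?
- RQ2: Does transferring adversarial perturbations from static models improve robustness to black-box attacks?
- RQ3: How does Ensemble Adversarial Training affect robustness to different attack types on large-scale datasets?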
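Key Findings
- Single-step adversarial training exhibits gradient masking: small curvature artifacts near the data points make a linear approximation of the loss unreliable there.
- Adversarial training with single-step methods appears robust to white-box single-step attacks, but the model remains vulnerable to black-box attacks transferred from undefended models.
- A new attack, R+FGSM (a random step followed by FGSM), strengthens single-step attacks across models.
- Ensemble Adversarial Training, which trains on perturbations from static pre-trained models, improves robustness to black-box attacks on ImageNet.
- Ensemble-trained models show reduced transferability of adversarial perturbations, though white-box robustness may suffer.
- The most robust ensemble model (IRv2_adv-ens) won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks and showed notable robustness to black-box attacks at the time; later work found that more elaborate black-box attacks reduce its accuracy.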
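The training idea behind these findings can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's ImageNet setup: the models are logistic regressions, the static "pre-trained" models are fixed weight vectors chosen by hand, each step trains on an equal mix of clean and adversarial inputs, and the function name `ensemble_adv_train` is hypothetical. What it shows is the source selection: at each step the adversarial examples are crafted either on the model being trained or on one of the static models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the inputs."""
    p = sigmoid(x @ w + b)
    return (p - y)[:, None] * w[None, :]

def weight_grad(x, y, w, b):
    """Gradient of the mean binary cross-entropy loss w.r.t. (w, b)."""
    err = sigmoid(x @ w + b) - y
    return x.T @ err / len(y), err.mean()

def ensemble_adv_train(x, y, static_models, eps, lr, steps, rng):
    w, b = np.zeros(x.shape[1]), 0.0
    sources_used = []
    for _ in range(steps):
        # Pick the source of adversarial examples at random; the last
        # index stands for the model currently being trained.
        k = int(rng.integers(len(static_models) + 1))
        src_w, src_b = static_models[k] if k < len(static_models) else (w, b)
        x_adv = x + eps * np.sign(input_grad(x, y, src_w, src_b))
        # Assumption: train on an equal mix of clean and adversarial inputs.
        xb, yb = np.concatenate([x, x_adv]), np.concatenate([y, y])
        gw, gb = weight_grad(xb, yb, w, b)
        w, b = w - lr * gw, b - lr * gb
        sources_used.append(k)
    return w, b, sources_used

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(float)
# Hypothetical static models standing in for pre-trained networks.
static_models = [(np.array([1.0, 1.0]), 0.0), (np.array([2.0, 0.5]), 0.0)]

w, b, sources = ensemble_adv_train(x, y, static_models, eps=0.1, lr=0.5,
                                   steps=200, rng=rng)
acc = np.mean((sigmoid(x @ w + b) > 0.5) == (y == 1.0))
```

Because the static models do not change during training, the perturbations they produce are decoupled from the parameters being learned, so the trained model cannot weaken them simply by masking its own gradients.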
This explainer was generated by AI and reviewed by a human editor.