QUICK REVIEW

[Paper Review] Adversarial Weight Perturbation Helps Robust Generalization

Dongxian Wu, Shu-Tao Xia | arXiv (Cornell University) | Apr 13, 2020
Adversarial Robustness in Machine Learning | References: 74 | Cited by: 206
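One-sentence summary

This paper proposes Adversarial Weight Perturbation (AWP), a regularizer that adversarially perturbs the model weights to flatten the weight loss landscape and improve robustness in adversarial training.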

ABSTRACT

The study on improving the robustness of deep neural networks against adversarial examples grows rapidly in recent years. Among them, adversarial training is the most promising one, which flattens the input loss landscape (loss change with respect to input) via training on adversarially perturbed examples. However, how the widely used weight loss landscape (loss change with respect to weight) performs in adversarial training is rarely explored. In this paper, we investigate the weight loss landscape from a new perspective, and identify a clear correlation between the flatness of weight loss landscape and robust generalization gap. Several well-recognized adversarial training improvements, such as early stopping, designing new objective functions, or leveraging unlabeled data, all implicitly flatten the weight loss landscape. Based on these observations, we propose a simple yet effective Adversarial Weight Perturbation (AWP) to explicitly regularize the flatness of weight loss landscape, forming a double-perturbation mechanism in the adversarial training framework that adversarially perturbs both inputs and weights. Extensive experiments demonstrate that AWP indeed brings flatter weight loss landscape and can be easily incorporated into various existing adversarial training methods to further boost their adversarial robustness.

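Motivation and objectives

  • Look beyond the smoothness of the input loss landscape to explain robust generalization in adversarial training.
  • Characterize the relationship between the flatness of the weight loss landscape and the robust generalization gap.
  • Propose and validate Adversarial Weight Perturbation (AWP), which explicitly regularizes the weight loss landscape through a double perturbation of inputs and weights.
  • Show that AWP is compatible with existing adversarial training methods and improves their robustness when integrated.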
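The double perturbation named in the abstract can be written as a single min-max problem. The following is a sketch in notation assumed for this review rather than quoted from it: n training pairs (x_i, y_i), a loss ℓ, an input budget ε in an L_p norm, and a feasible set 𝒱 for the weight perturbation v. The perturbed model f_{w+v} and the per-layer bound γ||w_l|| follow the notation used elsewhere in this review.

```latex
\min_{w} \; \max_{v \in \mathcal{V}} \; \frac{1}{n} \sum_{i=1}^{n}
  \max_{\|x_i' - x_i\|_p \le \epsilon} \ell\bigl(f_{w+v}(x_i'),\, y_i\bigr),
\qquad
\mathcal{V} = \bigl\{ v : \|v_l\| \le \gamma \,\|w_l\| \ \text{for every layer } l \bigr\}
```

The inner maximization over x_i' is ordinary adversarial training. The added maximization over v asks the loss to stay low in a small neighborhood of w, which is what flattening the weight loss landscape means here.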

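Proposed method

  • Characterize the weight loss landscape using adversarial examples generated on the fly with PGD.
  • Show that a flatter weight loss landscape correlates with a smaller robust generalization gap.
  • Propose Adversarial Weight Perturbation (AWP), which maximizes the loss over weight perturbations restricted to a controlled region: the perturbation of each layer l is bounded by γ||w_l||.
  • Optimize a double-perturbation objective that alternates between adversarial input perturbation and adversarial weight perturbation.
  • Provide an algorithm (AT-AWP) that first computes the update on the perturbed model f_{w+v} and then applies it to the center weights w.
  • Extend AWP to other adversarial training frameworks (TRADES, MART, RST) with minimal overhead.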
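The steps listed above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a single linear layer with logistic loss, one signed-gradient step stands in for multi-step PGD, the weight perturbation is computed with a single ascent step, and every function name and hyperparameter value (`attack`, `weight_perturbation`, `at_awp_step`, `eps`, `gamma`, `lr`) is hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sigmoid(m):
    # Numerically stable logistic function.
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)

def loss(w, x, y):
    # Logistic loss log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}.
    m = -y * dot(w, x)
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def attack(w, x, y, eps):
    # Assumption: one signed-gradient step of size eps stands in for PGD.
    s = sigmoid(-y * dot(w, x))
    grad_x = [-y * s * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]

def mean_grad(w, xs, ys):
    # Gradient of the mean logistic loss with respect to the weights.
    total = [0.0] * len(w)
    for x, y in zip(xs, ys):
        s = sigmoid(-y * dot(w, x))
        total = [t - y * s * xi for t, xi in zip(total, x)]
    return [t / len(xs) for t in total]

def weight_perturbation(w, adv_xs, ys, gamma):
    # One ascent step on the weights, rescaled so that ||v|| = gamma * ||w||.
    g = mean_grad(w, adv_xs, ys)
    g_norm = norm(g)
    if g_norm == 0.0:
        return [0.0] * len(w)
    return [gamma * norm(w) * gi / g_norm for gi in g]

def at_awp_step(w, xs, ys, eps, gamma, lr):
    adv_xs = [attack(w, x, y, eps) for x, y in zip(xs, ys)]  # perturb inputs
    v = weight_perturbation(w, adv_xs, ys, gamma)            # perturb weights
    w_pert = [wi + vi for wi, vi in zip(w, v)]
    g = mean_grad(w_pert, adv_xs, ys)                        # gradient at w + v
    # Equivalent to (w + v) - lr * g - v: the step lands on the center weights.
    return [wi - lr * gi for wi, gi in zip(w, g)]

def robust_loss(w, xs, ys, eps):
    # Mean loss on inputs attacked against the given weights.
    return sum(loss(w, attack(w, x, y, eps), y) for x, y in zip(xs, ys)) / len(xs)

# Toy, linearly separable data (illustrative values only).
xs = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]
ys = [1, 1, -1, -1]
w0 = [0.5, -0.5]
w = list(w0)
for _ in range(50):
    w = at_awp_step(w, xs, ys, eps=0.1, gamma=0.01, lr=0.5)
print("robust loss before:", robust_loss(w0, xs, ys, 0.1))
print("robust loss after: ", robust_loss(w, xs, ys, 0.1))
```

With a single weight vector the "layer" is the whole of w, so the bound γ||w_l|| reduces to one global norm constraint. In a deep network the same rescaling would be applied layer by layer.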

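Experimental results

Research questions

  • RQ1: Under adversarial training, does the flatness of the weight loss landscape correlate with the robust generalization gap?
  • RQ2: Does explicitly regularizing the weight loss landscape through adversarial weight perturbation improve robustness when combined with existing AT methods?
  • RQ3: Is AWP compatible with, and beneficial across, multiple datasets, architectures, and threat models?
  • RQ4: How does AWP compare with random weight perturbation and other regularization methods at improving adversarial robustness?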

Key findings

Test robustness (%) at the best and last checkpoints

Threat model  Method   SVHN best  SVHN last  CIFAR-10 best  CIFAR-10 last  CIFAR-100 best  CIFAR-100 last
L_infinity    AT       53.36      44.49      52.79          44.44          27.22           20.82
L_infinity    AT-AWP   59.12      55.87      55.39          54.73          30.71           30.28
L2            AT       66.87      65.03      69.15          65.93          41.33           35.27
L2            AT-AWP   72.57      67.73      72.69          72.08          45.60           44.66

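One quick way to read these numbers is to subtract each "last" value from its "best" value. The sketch below does that arithmetic on the figures copied from the rows above. Treating this best-to-last drop as a rough indicator of robust overfitting is an assumption made here for illustration. It is not the robust generalization gap itself, and the names `RESULTS`, `best_to_last_drop`, and `DROPS` are hypothetical.

```python
# (best, last) pairs copied from the results rows above.
RESULTS = {
    ("L_infinity", "AT"): {
        "SVHN": (53.36, 44.49), "CIFAR-10": (52.79, 44.44), "CIFAR-100": (27.22, 20.82)},
    ("L_infinity", "AT-AWP"): {
        "SVHN": (59.12, 55.87), "CIFAR-10": (55.39, 54.73), "CIFAR-100": (30.71, 30.28)},
    ("L2", "AT"): {
        "SVHN": (66.87, 65.03), "CIFAR-10": (69.15, 65.93), "CIFAR-100": (41.33, 35.27)},
    ("L2", "AT-AWP"): {
        "SVHN": (72.57, 67.73), "CIFAR-10": (72.69, 72.08), "CIFAR-100": (45.60, 44.66)},
}

def best_to_last_drop(best, last):
    # Points lost between the best checkpoint and the final one.
    return round(best - last, 2)

DROPS = {
    key: {dataset: best_to_last_drop(*pair) for dataset, pair in row.items()}
    for key, row in RESULTS.items()
}

for (threat_model, method), row in DROPS.items():
    print(threat_model, method, row)
```

On CIFAR-10 under L_infinity the drop shrinks from 8.35 points for AT to 0.66 for AT-AWP. The pattern is not uniform: under L2 on SVHN the drop is 1.84 for AT and 4.84 for AT-AWP, even though AT-AWP is higher at both checkpoints.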
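  • A flatter weight loss landscape correlates with a smaller robust generalization gap, both within a single adversarial training method and across methods.
  • Adversarial Weight Perturbation (AWP) explicitly flattens the weight loss landscape and improves robustness when integrated with AT, TRADES, MART, and RST.
  • AWP consistently improves test robustness on CIFAR-10, SVHN, and CIFAR-100 under both the L_infinity and L2 threat models.
  • AWP improves on baseline AT and comparable methods under both white-box and black-box attacks, including AutoAttack.
  • Ablation studies show that a small relative weight perturbation (γ of roughly 1e-3 to 5e-3) is enough to flatten the landscape and reduce the robust generalization gap.
  • Compared with random weight perturbation, AWP produces a larger increase in adversarial loss and better robustness at a smaller perturbation magnitude.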
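The last finding can be illustrated with a toy comparison at a fixed relative radius γ||w||. This is a sketch under strong assumptions, not a reproduction of the paper's experiment: a ten-dimensional quadratic stands in for the adversarial training loss, the adversarial perturbation is a single rescaled gradient step, and all names (`toy_loss`, `adversarial_perturbation`, `random_perturbation`) are hypothetical.

```python
import math
import random

# Assumption: a quadratic with curvatures 1..10 stands in for the training loss.
CURVATURES = [float(k) for k in range(1, 11)]

def toy_loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURES, w))

def toy_grad(w):
    return [a * wi for a, wi in zip(CURVATURES, w)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def scale_to(direction, radius):
    # Rescale a direction so that its norm equals the given radius.
    n = norm(direction)
    return [radius * d / n for d in direction]

def adversarial_perturbation(w, gamma):
    # Move along the gradient, the direction of steepest loss increase.
    return scale_to(toy_grad(w), gamma * norm(w))

def random_perturbation(w, gamma, rng):
    # Move along a Gaussian direction with the same norm budget.
    direction = [rng.gauss(0.0, 1.0) for _ in w]
    return scale_to(direction, gamma * norm(w))

def loss_increase(w, v):
    return toy_loss([wi + vi for wi, vi in zip(w, v)]) - toy_loss(w)

w = [1.0] * 10
gamma = 5e-3
rng = random.Random(0)  # fixed seed so the comparison is repeatable

adv_gain = loss_increase(w, adversarial_perturbation(w, gamma))
rand_gains = [loss_increase(w, random_perturbation(w, gamma, rng)) for _ in range(200)]

print("adversarial increase:     ", adv_gain)
print("largest random increase:  ", max(rand_gains))
print("mean random increase:     ", sum(rand_gains) / len(rand_gains))
```

At the same norm budget the gradient-aligned perturbation raises the loss far more than a typical random direction, because a random direction in many dimensions is nearly orthogonal to the gradient. That is consistent with the finding that a random perturbation needs a larger magnitude to have a comparable effect.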


This review was generated by AI and checked by a human editor.