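[Paper Summary] Adversarial Distributional Training for Robust Deep Learning
Proposes adversarial distributional training (ADT), a minimax framework that learns an adversarial distribution around each input to improve robustness against unseen attacks, with three parameterizations and an empirical evaluation on CIFAR-10, CIFAR-100, and SVHN.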
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples. However, most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks. Besides, a single attack algorithm could be insufficient to explore the space of perturbations. In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models. ADT is formulated as a minimax optimization problem, where the inner maximization aims to learn an adversarial distribution to characterize the potential adversarial examples around a natural one under an entropic regularizer, and the outer minimization aims to train robust models by minimizing the expected loss over the worst-case adversarial distributions. Through a theoretical analysis, we develop a general algorithm for solving ADT, and present three approaches for parameterizing the adversarial distributions, ranging from the typical Gaussian distributions to the flexible implicit ones. Empirical results on several benchmarks validate the effectiveness of ADT compared with the state-of-the-art AT methods.
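Motivation and Objectives
- Motivate robustness to unseen adversarial attacks, going beyond adversarial training with a single attack.
- Propose a distributional minimax formulation that treats perturbations as distributions rather than single points.
- Regularize the adversarial distribution so that it does not collapse to a delta distribution, which encourages diverse adversarial examples.
- Present three practical parameterizations of the adversarial distribution and analyze their effect on robustness.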
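Proposed Method
- Formulate ADT as a minimax problem: $\min_\theta \frac{1}{n} \sum_{i=1}^{n} \max_{p(\delta_i)} \mathbb{E}_{p(\delta_i)}[L(f_\theta(x_i+\delta_i), y_i)]$.
- Add an entropy regularizer to the inner objective to prevent degeneration into a delta distribution: $J(p(\delta_i), \theta) = \mathbb{E}_{p(\delta_i)}[L(f_\theta(x_i+\delta_i), y_i)] + \lambda H(p(\delta_i))$.
- Parameterize the adversarial distribution in three ways: (i) ADT EXP uses an explicit Gaussian-based transformation $\delta = \epsilon \tanh(u)$ with $u \sim N(\mu, \mathrm{diag}(\sigma^2))$; (ii) ADT EXP-AM amortizes the distribution parameters with a generator $g_\phi$ conditioned on $x$; (iii) ADT IMP-AM defines an implicit distribution through a generator with a latent variable $z$, whose density is implicit and whose entropy is handled by a variational estimate.
- Give a general algorithm (Alg. 1) based on Danskin-style sequential optimization: first solve the inner maximization to obtain $p^*$, then update $\theta$ with the gradient evaluated at $p^*$.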
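The two phases above can be illustrated with a minimal sketch of the ADT EXP parameterization. This is a toy under stated assumptions, not the paper's implementation: a one-dimensional logistic model stands in for the deep network, gradients are derived by hand instead of using automatic differentiation, and every function name, step count, and learning rate is hypothetical. The entropy uses the change-of-variables identity for delta = eps * tanh(u), namely H(p) = H(N(mu, sigma^2)) + E[log(eps * (1 - tanh(u)^2))].

```python
import math
import random

def log_sech2(u):
    """Numerically stable log(1 - tanh(u)^2)."""
    a = abs(u)
    return 2.0 * (math.log(2.0) - a - math.log1p(math.exp(-2.0 * a)))

def loss_and_grad(x_adv, y, w, b):
    """Logistic loss log(1 + exp(-y (w x + b))) and its derivative w.r.t. x."""
    margin = y * (w * x_adv + b)
    return math.log1p(math.exp(-margin)), -y * w / (1.0 + math.exp(margin))

def inner_maximize(x, y, w, b, eps, lam, steps=300, n_samples=8, lr=0.1, seed=0):
    """Gradient ascent on J = E_p[loss] + lam * H(p) over (mu, log_sigma)."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        g_mu = g_ls = 0.0
        for _ in range(n_samples):
            r = rng.gauss(0.0, 1.0)
            u = mu + sigma * r                     # reparameterized Gaussian sample
            t = math.tanh(u)
            _, dl_dx = loss_and_grad(x + eps * t, y, w, b)
            dl_du = dl_dx * eps * (1.0 - t * t)    # chain rule through delta = eps * tanh(u)
            dh_du = -2.0 * t                       # d/du of log(1 - tanh(u)^2)
            g_mu += dl_du + lam * dh_du
            g_ls += (dl_du + lam * dh_du) * sigma * r
        mu += lr * g_mu / n_samples
        # The Gaussian entropy contributes d/d(log_sigma) = 1, scaled by lam.
        log_sigma += lr * (g_ls / n_samples + lam)
    return mu, log_sigma

def estimate_objective(x, y, w, b, eps, mu, log_sigma, n=2000, seed=1):
    """Monte Carlo estimate of (expected loss, entropy of p) plus the sampled deltas."""
    rng = random.Random(seed)
    sigma = math.exp(log_sigma)
    total_loss, total_log_jac, deltas = 0.0, 0.0, []
    for _ in range(n):
        u = mu + sigma * rng.gauss(0.0, 1.0)
        delta = eps * math.tanh(u)
        total_loss += loss_and_grad(x + delta, y, w, b)[0]
        total_log_jac += math.log(eps) + log_sech2(u)
        deltas.append(delta)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e) + log_sigma + total_log_jac / n
    return total_loss / n, entropy, deltas

def outer_step(x, y, w, b, eps, mu, log_sigma, lr=0.1, n_samples=8, seed=2):
    """One gradient-descent step on (w, b), averaged over samples from the learned p."""
    rng = random.Random(seed)
    sigma = math.exp(log_sigma)
    g_w = g_b = 0.0
    for _ in range(n_samples):
        x_adv = x + eps * math.tanh(mu + sigma * rng.gauss(0.0, 1.0))
        coeff = -y / (1.0 + math.exp(y * (w * x_adv + b)))  # dLoss / d(w * x_adv + b)
        g_w += coeff * x_adv
        g_b += coeff
    return w - lr * g_w / n_samples, b - lr * g_b / n_samples

if __name__ == "__main__":
    x, y, w, b, eps = 1.0, 1.0, 2.0, 0.0, 0.5      # toy example and model (assumed values)
    for lam in (0.01, 1.0):
        mu, log_sigma = inner_maximize(x, y, w, b, eps, lam)
        exp_loss, entropy, _ = estimate_objective(x, y, w, b, eps, mu, log_sigma)
        print(f"lam={lam}: mu={mu:.3f} sigma={math.exp(log_sigma):.3f} "
              f"E[loss]={exp_loss:.3f} H={entropy:.3f}")
    print("updated (w, b):", outer_step(x, y, w, b, eps, mu, log_sigma))
```

In this toy setting a small `lam` lets the distribution concentrate near the loss-maximizing edge of the perturbation interval, while a larger `lam` keeps it spread closer to the center, which is the trade-off the entropy term is meant to control.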
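Experimental Results
Research Questions
- RQ1) Compared with point-wise adversarial training, can learning a distribution over adversarial perturbations improve robustness across a broader range of attacks?
- RQ2) How does entropy regularization affect the diversity and effectiveness of the learned adversarial perturbations?
- RQ3) How do the explicit, amortized explicit, and implicit parameterizations of the adversarial distribution compare in robustness and training efficiency?
- RQ4) Do ADT methods remain robust under both white-box and black-box attacks on standard benchmarks?
Key Findings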
| Model | A_nat | FGSM | PGD-20 | PGD-100 | MIM | C&W | FeaAttack | A_rob |
|---|---|---|---|---|---|---|---|---|
| Standard | 94.81% | 12.05% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| AT FGSM | 93.80% | 79.86% | 0.12% | 0.04% | 0.06% | 0.13% | 0.01% | 0.01% |
| AT PGD † | 87.25% | 56.04% | 45.88% | 45.33% | 47.15% | 46.67% | 0? | 46.01% |
| AT PGD | 86.91% | 58.30% | 50.03% | 49.40% | 51.40% | 50.23% | 0? | 50.46% |
| ALP | 86.81% | 56.83% | 48.97% | 48.60% | 50.13% | 49.10% | 0? | 48.51% |
| FeaScatter | 89.98% | 77.40% | 70.85% | 68.81% | 72.74% | 58.46% | 37.45% | 37.40% |
| ADT EXP | 86.89% | 60.41% | 52.18% | 51.69% | 53.27% | 52.49% | 52.38% | 50.56% |
| ADT EXP-AM | 87.82% | 62.42% | 51.95% | 51.26% | 52.99% | 51.75% | 52.04% | 50.04% |
| ADT IMP-AM | 88.00% | 64.89% | 52.28% | 51.23% | 52.64% | 52.65% | 51.89% | 49.81% |
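The A_rob column is not defined in this summary. A minimal sketch, assuming it denotes per-example worst-case robustness (an example counts as robust only if the model classifies it correctly under every listed attack), shows why such a number can fall below every per-attack accuracy in the same row. The function names and the four-example data are hypothetical and purely illustrative.

```python
def per_attack_accuracy(correct_by_attack):
    """Accuracy under each attack, computed independently."""
    return {name: sum(flags) / len(flags) for name, flags in correct_by_attack.items()}

def worst_case_robust_accuracy(correct_by_attack):
    """Fraction of examples classified correctly under every attack (assumed A_rob)."""
    names = list(correct_by_attack)
    n = len(correct_by_attack[names[0]])
    if any(len(correct_by_attack[name]) != n for name in names):
        raise ValueError("every attack must report one flag per test example")
    robust = [all(correct_by_attack[name][i] for name in names) for i in range(n)]
    return sum(robust) / n

if __name__ == "__main__":
    # Illustrative flags only: True means the prediction stayed correct under that attack.
    toy = {
        "FGSM":   [True, True, False, True],
        "PGD-20": [True, False, False, True],
        "C&W":    [True, True, True, False],
    }
    print(per_attack_accuracy(toy))
    print(worst_case_robust_accuracy(toy))
```

Under this assumed definition, different attacks can break different examples, so the worst-case figure is never higher than the lowest per-attack accuracy.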
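- ADT-based methods are generally more robust than standard training and competitive adversarial-training baselines across a range of white-box attacks (FGSM, PGD variants, MIM, C&W, FeaAttack).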
- Under white-box attacks on CIFAR-10/100, the ADT variants are generally more robust than many baselines, and ADT EXP outperforms EXP-AM and IMP-AM in several settings.
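- ADT-based methods also show better robustness under black-box transfer attacks and SPSA query-based attacks, which suggests reduced gradient masking and that the robustness gains are genuine.
- Entropy regularization makes the adversarial distribution cover a wider region of the perturbation space, yielding a smoother loss landscape around natural inputs.
- The amortized variants (EXP-AM, IMP-AM) train faster with comparable robustness, although the explicit EXP variant can give a slightly stronger defense in some cases.
- Empirical results on CIFAR-10, CIFAR-100, and SVHN support the effectiveness of ADT relative to state-of-the-art AT methods.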
This summary was generated by AI and reviewed by human editors.