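[Paper Walkthrough] Provable defenses against adversarial examples via the convex outer adversarial polytope
This paper proposes a method for training deep ReLU classifiers that are provably robust to norm-bounded adversarial perturbations by optimizing over a convex outer bound on the adversarial polytope; a dual network makes this training efficient. The method achieves certified robustness on MNIST and other datasets, and on several tasks it improves on previously reported bounds.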
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.
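Motivation and Objectives
- Motivate and quantify the need for classifiers that are provably robust to adversarial perturbations.
- Introduce a convex outer bound (a convex relaxation) on the adversarial polytope of a deep ReLU network.
- Develop a dual-network method that computes bounds on the robust loss efficiently during training.
- Provide a training objective that yields provably robust classifiers and enables attack detection on unseen data.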
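Proposed Method
- Define the adversarial polytope $Z_\epsilon(x)$ of a $k$-layer ReLU network: the set of activations reachable through a norm-bounded perturbation of the input.
- Replace each ReLU constraint with its convex relaxation, giving a tractable convex outer bound $\tilde{Z}_\epsilon(x)$.
- Derive the dual of the resulting linear program. The dual takes the form of a backward pass through a network resembling standard backpropagation, and it yields the bound $J_\epsilon(x, g_\theta)$.
- Compute the activation lower bounds $\ell$ and upper bounds $u$ with a backpropagation-based procedure (Algorithm 1) that exploits the same dual structure.
- Train on the robust loss defined in Theorem 2, $L(-J_\epsilon(\dots), y)$, which is an upper bound on the worst-case loss within the $\epsilon$-ball.
- Give provable robustness guarantees (Corollaries 1 and 2) and compute the $\epsilon$-distance to the decision boundary (Eq. 17).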
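To make the bounding steps above concrete, here is a minimal NumPy sketch for the simplest case, a network with a single hidden ReLU layer. It is an illustration under stated assumptions, not the authors' implementation (their code is in the repository linked in the abstract). The function names `hidden_bounds` and `certified_lower_bound` are hypothetical, and the slope $u/(u-\ell)$ is used on both sides of the relaxation for units whose bounds straddle zero. The result is a guaranteed lower bound on $c^\top f(x+\delta)$ for every $\|\delta\|_\infty \le \epsilon$.

```python
import numpy as np


def hidden_bounds(W1, b1, x, eps):
    """Elementwise bounds on W1 @ (x + delta) + b1 for ||delta||_inf <= eps."""
    center = W1 @ x + b1
    radius = eps * np.abs(W1).sum(axis=1)
    return center - radius, center + radius


def certified_lower_bound(W1, b1, W2, b2, x, eps, c):
    """Lower bound on c @ f(x + delta), where f(x) = W2 @ relu(W1 @ x + b1) + b2."""
    l, u = hidden_bounds(W1, b1, x, eps)
    w = W2.T @ c                    # objective weights on the hidden activations
    active = l >= 0                 # ReLU acts as the identity on these units
    crossing = (l < 0) & (u > 0)    # ReLU is replaced by its convex relaxation
    d = np.zeros(l.shape)           # inactive units (u <= 0) keep slope 0
    d[active] = 1.0
    d[crossing] = u[crossing] / (u[crossing] - l[crossing])
    g = d * w                       # w @ relu(z) >= g @ z + offset for l <= z <= u
    offset = np.sum(d[crossing] * l[crossing] * np.maximum(-w[crossing], 0.0))
    # Minimising g @ W1 @ delta over the l_inf ball gives -eps * ||W1.T @ g||_1.
    input_term = g @ (W1 @ x + b1) - eps * np.abs(W1.T @ g).sum()
    return c @ b2 + input_term + offset


# Illustrative usage on random weights (all values here are made up).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
x = rng.normal(size=4)
c = np.array([1.0, -1.0, 0.0])      # margin of class 0 over class 1
print(certified_lower_bound(W1, b1, W2, b2, x, 0.1, c))
```

If the returned value is positive, no perturbation inside the ball can make class 1 outscore class 0 for this input. Deeper networks need the layer-by-layer bound computation the paper describes, which this sketch does not attempt.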
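Experimental Results
Research Questions
- RQ1: Can we train deep ReLU networks that are provably robust to norm-bounded adversarial perturbations?
- RQ2: Can tight bounds on the robust loss be computed efficiently through a dual formulation resembling standard backpropagation?
- RQ3: What robustness guarantees are achieved empirically on MNIST, Fashion-MNIST, HAR, and SVHN, compared with non-robust baselines and other robust methods?
Main Findings
| Problem | Robust | ε | Test error | FGSM error | PGD error | Robust error bound |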
|---|---|---|---|---|---|---|
| MNIST | × | 0.1 | 1.07% | 50.01% | 81.68% | 100% |
| MNIST | √ | 0.1 | 1.80% | 3.93% | 4.11% | 5.82% |
| Fashion-MNIST | × | 0.1 | 9.36% | 77.98% | 81.85% | 100% |
| Fashion-MNIST | √ | 0.1 | 21.73% | 31.25% | 31.63% | 34.53% |
| HAR | × | 0.05 | 4.95% | 60.57% | 63.82% | 81.56% |
| HAR | √ | 0.05 | 7.80% | 21.49% | 21.52% | 21.90% |
| SVHN | × | 0.01 | 16.01% | 62.21% | 83.43% | 100% |
| SVHN | √ | 0.01 | 20.38% | 33.28% | 33.74% | 40.67% |
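- On MNIST, the robust model has a certified robust test error of 5.82% against $\ell_\infty$ attacks with $\epsilon = 0.1$. The non-robust model's bound is vacuous at 100%, and its empirical error under attack is far higher than the robust model's (50.01% under FGSM, 81.68% under PGD).
- The robust model sharply reduces error under FGSM and PGD (3.93% and 4.11%, respectively) compared with the standard model (50.01% and 81.68%).
- Across datasets, the certified bound for the robust models stays reasonably close to their empirical PGD error. On Fashion-MNIST, for example, the bound is 34.53% against a PGD error of 31.63%.
- The method scales to convolutional networks and moderately sized problems, yielding what the authors describe as the largest verified networks they are aware of (e.g. on MNIST).
- For adversarial detection, the method has zero false negatives: if the bound certifies an example, no adversarial example exists within distance $\epsilon$ of it. It may still flag some non-adversarial examples.
- Once the activation bounds are available, the dual network computes the robust bound with a single backward pass, avoiding a conventional LP solver.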
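The "robust error bound" column and the zero-false-negative detection rule can both be read as simple bookkeeping over per-example margin bounds. The sketch below assumes you already have, for each example, a lower bound on the margin between the chosen label's logit and every other logit over the $\epsilon$-ball. How those bounds are produced is outside this snippet, and the names `certified_mask` and `robust_error_bound` are hypothetical.

```python
import numpy as np


def certified_mask(margin_lower_bounds, labels):
    """True where every lower bound on logit[label] - logit[j], j != label, is positive."""
    lb = np.array(margin_lower_bounds, dtype=float)   # copy, so the input is not modified
    labels = np.asarray(labels)
    lb[np.arange(len(labels)), labels] = np.inf       # ignore the label's own column
    return (lb > 0).all(axis=1)


def robust_error_bound(margin_lower_bounds, labels):
    """Fraction of examples that cannot be certified: an upper bound on robust error."""
    return 1.0 - certified_mask(margin_lower_bounds, labels).mean()


# Illustrative, made-up margin bounds for three examples and three classes.
bounds = [[0.0, 0.7, 1.2],     # label 0: all margins positive, so certified
          [0.4, 0.0, -0.1],    # label 1: margin over class 2 may go negative
          [2.0, 0.3, 0.0]]     # label 2: certified
labels = [0, 1, 2]
print(certified_mask(bounds, labels))
print(robust_error_bound(bounds, labels))
```

With true labels, the uncertified fraction upper-bounds the error under any attack in the ball. With predicted labels on unseen inputs, "not certified" serves as the detection flag: it can raise false alarms, but a certified input cannot be an adversarial example within $\epsilon$.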
This walkthrough was generated by AI and reviewed by a human editor.