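[Paper Summary] Deep Defense: Training DNNs with Improved Adversarial Robustness
Deep Defense adds a perturbation-based regularizer to the classifier's training objective. The goal is to make deep neural networks more robust to adversarial attacks while keeping their accuracy on benign data.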
Despite the efficacy on a variety of computer vision tasks, deep neural networks (DNNs) are vulnerable to adversarial attacks, limiting their applications in security-critical systems. Recent works have shown the possibility of generating imperceptibly perturbed image inputs (a.k.a., adversarial examples) to fool well-trained DNN classifiers into making arbitrary predictions. To address this problem, we propose a training recipe named "deep defense". Our core idea is to integrate an adversarial perturbation-based regularizer into the classification objective, such that the obtained models learn to resist potential attacks, directly and precisely. The whole optimization problem is solved just like training a recursive network. Experimental results demonstrate that our method outperforms training with adversarial/Parseval regularizations by large margins on various datasets (including MNIST, CIFAR-10 and ImageNet) and different DNN architectures. Code and models for reproducing our results are available at https://github.com/ZiangYan/deepdefense.pytorch
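Written out, the objective described in the abstract and in the method bullets of this summary takes roughly the form below. This is our reading, in assumed notation, not a verbatim copy of the paper's equation:

- $\mathcal{T}$ and $\mathcal{F}$ are the correctly classified and misclassified training samples.
- $\Delta x_k$ is the DeepFool-approximated adversarial perturbation of $x_k$.
- $\lambda$, $c$ and $d$ are positive hyperparameters.
- $R$ is the exponential function.

```latex
\min_{W}\; \sum_{k} L\big(y_k, f(x_k; W)\big)
  + \lambda \sum_{k \in \mathcal{T}} R\!\left(-c\,\frac{\lVert \Delta x_k \rVert_p}{\lVert x_k \rVert_p}\right)
  + \lambda \sum_{k \in \mathcal{F}} R\!\left(d\,\frac{\lVert \Delta x_k \rVert_p}{\lVert x_k \rVert_p}\right),
\qquad R(t) = \exp(t)
```

The two regularizer terms pull in opposite directions:

- **Correctly classified samples.** Minimizing $\exp(-c\,\rho)$ pushes the relative perturbation $\rho = \lVert \Delta x \rVert / \lVert x \rVert$ upward. The penalty is largest where $\rho$ is small, that is, on samples that are easy to attack.
- **Misclassified samples.** The $\exp(d\,\rho)$ term discourages large perturbations.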
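Motivation and Goals
- Make DNNs usable in security-critical tasks by addressing their vulnerability to adversarial examples.
- Propose a perturbation-based regularizer that learns directly from adversarial examples.
- Increase resistance to attackers while maintaining or improving accuracy on benign inputs.
- Provide a differentiable, network-based formulation that can be optimized efficiently.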
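Proposed Method
- Build a regularized objective that depends on the normalized adversarial perturbation norm. Correctly classified samples that can be fooled by small perturbations are penalized.
- Approximate the adversarial perturbation Δx with a DeepFool-based module.
- Express the perturbation computation as a recursive network, so that it can be optimized jointly with the classifier.
- Use an exponential function R, so that correctly classified samples that are easy to attack (small perturbations) receive the most emphasis.
- Balance robustness and accuracy through sample-specific weighting of correctly classified and misclassified samples.
- Fine-tune existing models instead of training from scratch, and apply the recipe across several architectures to assess how well it transfers.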
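To make the regularizer concrete, here is a minimal pure-Python sketch. It rests on strong simplifying assumptions:

- The classifier is a multi-class affine model. For such a model, a single DeepFool step gives the minimal ℓ2 perturbation to the nearest decision boundary.
- R is exp, as stated above.
- The names `deepfool_affine` and `deep_defense_objective`, the parameters `lam`, `c` and `d`, and their default values are hypothetical. They are not taken from the authors' code.

The sketch only evaluates the objective. It does not differentiate through the DeepFool module the way the recursive-network formulation does when training deep networks.

```python
import math
from typing import List, Sequence

Vector = List[float]


def affine_scores(W, b, x) -> Vector:
    """Class scores f_k(x) = w_k . x + b_k of a multi-class affine classifier."""
    return [sum(wi * xi for wi, xi in zip(w_k, x)) + b_k for w_k, b_k in zip(W, b)]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def l2(v: Sequence[float]) -> float:
    return math.sqrt(sum(t * t for t in v))


def deepfool_affine(W, b, x, overshoot: float = 0.02) -> Vector:
    """One DeepFool step. For an affine model this is the minimal L2 perturbation
    that reaches the nearest decision boundary, scaled by (1 + overshoot)."""
    f = affine_scores(W, b, x)
    y_hat = argmax(f)
    best_dist, best_step = math.inf, [0.0] * len(x)
    for k in range(len(f)):
        if k == y_hat:
            continue
        w_diff = [wk - wy for wk, wy in zip(W[k], W[y_hat])]  # w'_k = w_k - w_yhat
        f_diff = f[k] - f[y_hat]                               # f'_k <= 0
        norm_sq = sum(v * v for v in w_diff)
        if norm_sq == 0.0:
            continue  # class k can never overtake y_hat
        dist = abs(f_diff) / math.sqrt(norm_sq)                # distance to boundary k
        if dist < best_dist:
            scale = (1.0 + overshoot) * abs(f_diff) / norm_sq
            best_dist, best_step = dist, [scale * v for v in w_diff]
    return best_step


def cross_entropy(scores: Sequence[float], y: int) -> float:
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[y]


def deep_defense_objective(W, b, xs, ys, lam=1.0, c=10.0, d=5.0) -> float:
    """Hypothetical sketch: cross-entropy plus an exp-weighted penalty on the
    relative DeepFool perturbation size rho = ||dx|| / ||x||."""
    total = 0.0
    for x, y in zip(xs, ys):
        scores = affine_scores(W, b, x)
        total += cross_entropy(scores, y)
        rho = l2(deepfool_affine(W, b, x)) / max(l2(x), 1e-12)
        if argmax(scores) == y:
            # Correct: exp(-c * rho) is largest and steepest when rho is small,
            # so easily attacked samples dominate this part of the penalty.
            total += lam * math.exp(-c * rho)
        else:
            # Misclassified: exp(d * rho) discourages large perturbations.
            total += lam * math.exp(d * rho)
    return total
```

For example, with `W = [[1.0, 0.0], [0.0, 1.0]]`, `b = [0.0, 0.0]` and `x = [2.0, 1.0]`, `deepfool_affine` returns a step of about `[-0.51, 0.51]`. Adding that step to `x` flips the prediction from class 0 to class 1. The closed-form affine step was chosen only to keep the example dependency-free. For a deep network, the same quantities would be computed iteratively and differentiated with an autodiff framework.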
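Experimental Results
Research Questions
- RQ1: Does integrating a perturbation-based regularizer into training improve robustness to adversarial attacks on standard datasets and architectures?
- RQ2: Can Deep Defense improve robustness without lowering accuracy on benign data, across MNIST, CIFAR-10 and ImageNet?
- RQ3: How does the method compare with adversarial training and Parseval training under strong attacks such as DeepFool and FGS?
- RQ4: How do the hyperparameters and layer-wise regularization affect robustness and accuracy?
- RQ5: Does the method scale to large networks and datasets while remaining computationally feasible?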
Key Findings
| Dataset | Network | Method | Clean acc. | ρ₂ | Acc.@0.2ε_ref | Acc.@0.5ε_ref | Acc.@1.0ε_ref |
|---|---|---|---|---|---|---|---|
| MNIST | MLP | Reference | 98.31% | 1.11e-1 | 72.76% | 29.08% | 3.31% |
| MNIST | MLP | Par. Train | 98.32% | 1.11e-1 | 77.44% | 28.95% | 2.96% |
| MNIST | MLP | Adv. Train I | 98.49% | 1.62e-1 | 87.70% | 59.69% | 22.55% |
| MNIST | MLP | Ours | 98.65% | 2.25e-1 | 95.04% | 88.93% | 50.00% |
| MNIST | LeNet | Reference | 99.02% | 2.05e-1 | 90.95% | 53.88% | 19.75% |
| MNIST | LeNet | Par. Train | 99.10% | 2.03e-1 | 91.68% | 66.48% | 19.64% |
| MNIST | LeNet | Adv. Train I | 99.18% | 2.63e-1 | 95.20% | 74.82% | 41.40% |
| MNIST | LeNet | Ours | 99.34% | 2.84e-1 | 96.51% | 88.93% | 50.00% |
| CIFAR-10 | ConvNet | Reference | 79.74% | 2.59e-2 | 61.62% | 37.84% | 23.85% |
| CIFAR-10 | ConvNet | Par. Train | 80.48% | 3.42e-2 | 69.19% | 50.43% | 22.13% |
| CIFAR-10 | ConvNet | Adv. Train I | 80.65% | 3.05e-2 | 65.16% | 45.03% | 35.53% |
| CIFAR-10 | ConvNet | Ours | 81.70% | 5.32e-2 | 72.15% | 59.02% | 50.00% |
| CIFAR-10 | NIN | Reference | 89.64% | 4.20e-2 | 75.61% | 49.22% | 33.56% |
| CIFAR-10 | NIN | Par. Train | 88.20% | 4.33e-2 | 75.39% | 49.75% | 17.74% |
| CIFAR-10 | NIN | Adv. Train I | 89.87% | 5.25e-2 | 78.87% | 58.85% | 45.90% |
| CIFAR-10 | NIN | Ours | 89.96% | 5.58e-2 | 80.70% | 70.73% | 50.00% |
| ImageNet | AlexNet | Reference | 56.91% | 2.98e-3 | 54.62% | 51.39% | 46.05% |
| ImageNet | AlexNet | Ours | 57.11% | 4.54e-3 | 55.79% | 53.50% | 50.00% |
| ImageNet | ResNet | Reference | 69.64% | 1.63e-3 | 63.39% | 54.45% | 41.70% |
| ImageNet | ResNet | Ours | 69.66% | 2.43e-3 | 65.53% | 59.46% | 50.00% |
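The ρ₂ and Acc.@ columns can be read with the sketch below. It rests on assumptions drawn from our reading of the paper:

- ρ₂ is the mean ℓ2 norm of the DeepFool perturbation divided by the input's ℓ2 norm.
- Acc.@(f·ε_ref) is test accuracy under an FGS attack of magnitude f·ε_ref.
- ε_ref is treated as a given per-model constant. How the paper selects it is not reproduced here.

The affine model and every function name (`rho_p`, `fgs_affine`, `accuracy_under_fgs`) are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence


def _lp_norm(v: Sequence[float], p: float) -> float:
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def rho_p(xs, deltas, p: float = 2.0) -> float:
    """Mean relative perturbation size: (1/N) * sum_k ||dx_k||_p / ||x_k||_p."""
    ratios = [_lp_norm(dx, p) / _lp_norm(x, p) for x, dx in zip(xs, deltas)]
    return sum(ratios) / len(ratios)


def _scores(W, b, x) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(w_k, x)) + b_k for w_k, b_k in zip(W, b)]


def _sign(g: float) -> float:
    return 1.0 if g > 0 else (-1.0 if g < 0 else 0.0)


def fgs_affine(W, b, x, y: int, eps: float) -> List[float]:
    """Fast gradient sign attack x + eps * sign(grad_x CE(x, y)) on an affine model."""
    s = _scores(W, b, x)
    m = max(s)
    exps = [math.exp(v - m) for v in s]
    z = sum(exps)
    probs = [e / z for e in exps]
    # For softmax cross-entropy, d loss / d x = sum_k (p_k - 1[k == y]) * w_k.
    grad = [
        sum((probs[k] - (1.0 if k == y else 0.0)) * W[k][i] for k in range(len(W)))
        for i in range(len(x))
    ]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]


def accuracy_under_fgs(W, b, xs, ys, eps_ref: float, fraction: float) -> float:
    """Acc.@(fraction * eps_ref): share of inputs still classified correctly after FGS."""
    eps = fraction * eps_ref
    hits = 0
    for x, y in zip(xs, ys):
        s = _scores(W, b, fgs_affine(W, b, x, y, eps))
        hits += int(max(range(len(s)), key=lambda k: s[k]) == y)
    return hits / len(xs)
```

Under these assumptions, a higher ρ₂ means DeepFool needs a larger perturbation, relative to the input, to change the prediction. The three Acc.@ columns sample the FGS accuracy curve at 0.2, 0.5 and 1.0 times the same reference magnitude.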
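- Deep Defense outperforms the competing defenses reported (Parseval training and adversarial training) on every robustness metric. It also maintains or improves clean accuracy on MNIST, CIFAR-10 and ImageNet.
- On MNIST with the MLP, Deep Defense reaches 98.65% clean accuracy, compared with 98.31% for the reference model. It also substantially improves robustness to DeepFool and FGS attacks: for example, ρ₂ rises from 1.11e-1 to 2.25e-1.
- On LeNet, Deep Defense raises clean accuracy to 99.34% (vs. 99.02%) and surpasses the adversarial-training and Parseval baselines in robustness.
- On CIFAR-10 (ConvNet, NIN), Deep Defense achieves higher clean accuracy and markedly better robustness. For example, Acc.@0.5ε_ref improves from 37.84% to 59.02% on ConvNet and from 49.22% to 70.73% on NIN.
- On ImageNet (AlexNet, ResNet), Deep Defense gives small clean-accuracy gains (+0.20 and +0.02 percentage points). ρ₂ grows roughly 1.5×, meaning the models are more robust to DeepFool.
- The mixed regularizer with exponential weighting R concentrates robustness on correctly classified samples that are easy to attack, without sacrificing overall performance.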
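The "roughly 1.5×" figure for ImageNet can be checked directly against the ρ₂ column. This small script divides the Ours value by the Reference value for each network. It uses only numbers copied from the table; `RHO2` and `rho2_gain` are illustrative names.

```python
# rho_2 values copied from the table: (Reference, Ours) per dataset/network.
RHO2 = {
    ("MNIST", "MLP"): (1.11e-1, 2.25e-1),
    ("MNIST", "LeNet"): (2.05e-1, 2.84e-1),
    ("CIFAR-10", "ConvNet"): (2.59e-2, 5.32e-2),
    ("CIFAR-10", "NIN"): (4.20e-2, 5.58e-2),
    ("ImageNet", "AlexNet"): (2.98e-3, 4.54e-3),
    ("ImageNet", "ResNet"): (1.63e-3, 2.43e-3),
}


def rho2_gain(reference: float, ours: float) -> float:
    """How many times larger the average relative DeepFool perturbation becomes."""
    return ours / reference


if __name__ == "__main__":
    for (dataset, net), (ref, ours) in RHO2.items():
        print(f"{dataset:9s} {net:8s} {rho2_gain(ref, ours):.2f}x")
```

The ImageNet ratios come out at about 1.52 (AlexNet) and 1.49 (ResNet), which matches the stated ~1.5×. The MNIST MLP and CIFAR-10 ConvNet ratios are close to 2×.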
This summary was generated by AI and reviewed by human editors.