QUICK REVIEW

[Paper Review] Certified Adversarial Robustness with Additive Noise

Bai Li, Changyou Chen|arXiv (Cornell University)|Sep 10, 2018

Adversarial Robustness in Machine Learning | References: 48 | Cited by: 45

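One-Sentence Summary

The paper links adversarial robustness to robustness against additive Gaussian noise and proposes a scalable certified defense that adds noise at test time and uses stability training to tighten the robustness bound.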

ABSTRACT

The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although a significant body of work on developing defensive models has been considered, most such models are heuristic and are often vulnerable to adaptive attacks. Defensive methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness when a large-scale model and data are present. To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation for constructing adversarial examples. We establish a connection between robustness against adversarial perturbation and additive random noise, and propose a training strategy that can significantly improve the certified bounds. Our evaluation on MNIST, CIFAR-10 and ImageNet suggests that the proposed method is scalable to complicated models and large data sets, while providing competitive robustness to state-of-the-art provable defense methods.

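Motivation and Objectives

Motivate and formalize a scalable approach to certified robustness for deep networks under norm-bounded adversarial perturbations
Connect adversarial robustness to robustness against additive random noise through the Rényi divergence
Develop a training strategy that improves the certified robustness bound without sacrificing natural accuracy
Provide an empirical evaluation on MNIST, CIFAR-10, and ImageNet showing competitive provable and empirical robustness.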
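
For reference, the Rényi divergence mentioned above has a simple closed form for two Gaussians that share a covariance, which is what ties Gaussian input noise to an l2 distance between inputs. The definitions below are the standard ones; the symbols x, x', sigma, and alpha are notation chosen for this note and may not match the paper's.

```latex
\[
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}\,\log \mathbb{E}_{z \sim Q}\!\left[\left(\frac{P(z)}{Q(z)}\right)^{\alpha}\right], \qquad \alpha > 1
\]
\[
D_\alpha\!\left(\mathcal{N}(x, \sigma^2 I) \,\big\|\, \mathcal{N}(x', \sigma^2 I)\right) = \frac{\alpha\,\lVert x - x' \rVert_2^2}{2\sigma^2}
\]
```

Because the divergence between the two noisy inputs grows only with the squared l2 distance, a perturbation of bounded l2 norm can shift the noisy output distribution by only a bounded amount.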

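Proposed Method

Construct a randomized classifier by adding Gaussian noise to the input at test time, and use the Rényi divergence to derive a certified robustness bound against l2 perturbations
Provide Algorithm 1 (Certified Robust Classifier) to compute the bound L on the perturbation size within which the predicted class keeps the highest probability
Use Lemma 1 and Theorem 2 (and Theorem 3 for l1 perturbations with Laplacian noise) to connect the noisy output distribution to adversarial robustness and prove the bound
Strengthen the bound with Stability Training with Noise (STN), which regularizes the model to be stable under Gaussian perturbations
Optionally combine Adversarial Logit Pairing with the stability objective to improve accuracy under noise without relying on gradient masking
Run experiments on MNIST, CIFAR-10, and ImageNet over stability training and noise parameters, and compare against PixelDP and TRADES.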
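
The certification step listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the paper's Algorithm 1: it assumes the Theorem 2 condition has the form L^2 <= sup over alpha > 1 of -(2 sigma^2 / alpha) * log(1 - p(1) - p(2) + 2 M), where M is the power mean of p(1) and p(2) of order 1 - alpha, and it approximates the supremum with a grid search. The function names, the toy classifier, and the use of raw empirical frequencies (a real certificate would need confidence bounds on p(1) and p(2)) are all assumptions made for this sketch.

```python
import math
import random


def flip_divergence_lower_bound(p1, p2, alpha):
    """Assumed Lemma-1-style term: -log(1 - p1 - p2 + 2 * M_{1-alpha}(p1, p2))."""
    t = 1.0 - alpha  # negative exponent, since alpha > 1
    a, b = t * math.log(p1), t * math.log(p2)
    hi = max(a, b)
    # log(p1**t + p2**t), computed with log-sum-exp to avoid overflow
    log_sum = hi + math.log(math.exp(a - hi) + math.exp(b - hi))
    power_mean = math.exp((math.log(0.5) + log_sum) / t)
    inner = min(max(1.0 - p1 - p2 + 2.0 * power_mean, 1e-300), 1.0)
    return -math.log(inner)


def certified_l2_radius(p1, p2, sigma, alpha_max=50.0, steps=2000):
    """Grid-search approximation of the certified l2 radius L (assumed form)."""
    eps = 1e-12
    p1 = min(max(p1, eps), 1.0 - eps)
    p2 = min(max(p2, eps), 1.0 - eps)
    best = 0.0
    for i in range(steps):
        alpha = 1.0 + (alpha_max - 1.0) * (i + 1) / steps
        term = flip_divergence_lower_bound(p1, p2, alpha)
        best = max(best, (2.0 * sigma ** 2 / alpha) * term)
    return math.sqrt(best)


def noisy_class_frequencies(classify, x, sigma, n_samples, num_classes, seed=0):
    """Empirical class frequencies of classify(x + Gaussian noise)."""
    rng = random.Random(seed)
    counts = [0] * num_classes
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[classify(noisy)] += 1
    return [c / n_samples for c in counts]


def toy_classify(x):
    """Hypothetical 3-class linear classifier, used only for illustration."""
    scores = [x[0], x[1], -x[0] - x[1]]
    return max(range(3), key=lambda k: scores[k])


freqs = noisy_class_frequencies(toy_classify, [2.0, 0.0], sigma=1.0,
                                n_samples=2000, num_classes=3)
p1, p2 = sorted(freqs, reverse=True)[:2]
radius = certified_l2_radius(p1, p2, sigma=1.0)
```

For fixed p(1) and p(2), the radius in this sketch scales linearly with sigma. In practice a larger sigma also changes the noisy class probabilities, so the noise level still has to be tuned.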

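Experimental Results

Research Questions

RQ1: Under randomized smoothing with Gaussian noise, how large can an adversarial perturbation be while the correct class is preserved?
RQ2: Does adding noise at test time, combined with stability training, yield certifiable robustness bounds that scale to large networks and datasets?
RQ3: How do the derived bounds compare, in theory and in practice, with existing provable defenses (such as LP-based or differential-privacy-based methods)?
RQ4: How does the noise level affect natural accuracy and robustness under different attack strengths?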

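Key Findings

Derives a certified bound against l2 perturbations when Gaussian noise is added at test time, for any classifier with a general activation structure
The bound improves when the gap between the top two class probabilities under noise, p(1) and p(2), is larger, and when the noise level sigma is tuned
Stability Training with Noise (STN) significantly improves both the certified bound and empirical robustness without a large computational overhead
Experiments on MNIST, CIFAR-10, and ImageNet show performance competitive with state-of-the-art provable defenses, as well as robustness to strong attacks
STN maintains higher natural accuracy than some provable defenses while achieving competitive robustness against stronger attacks
The framework is a scalable approach that can be integrated with existing models and evaluated under adaptive attacks.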
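
To make the STN findings concrete, here is a minimal sketch of a stability-training objective for a single example. It assumes the loss is the usual classification cross-entropy plus a weighted stability term, taken here to be the cross-entropy between the clean and noisy predictive distributions; the paper's exact objective may differ. `toy_logits`, `gamma`, and all function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy H(target, pred) between two discrete distributions."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target_probs, pred_probs))


def stn_loss(logit_fn, x, label, sigma, gamma, rng):
    """Classification loss plus gamma times an assumed stability term."""
    clean = softmax(logit_fn(x))
    noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]  # Gaussian input noise
    noisy = softmax(logit_fn(noisy_x))
    one_hot = [1.0 if k == label else 0.0 for k in range(len(clean))]
    classification = cross_entropy(one_hot, clean)
    # Penalizes a noisy prediction that drifts away from the clean one.
    stability = cross_entropy(clean, noisy)
    return classification + gamma * stability, classification, stability


def toy_logits(x):
    """Hypothetical 3-class linear model, used only for illustration."""
    return [x[0], x[1], -x[0] - x[1]]


rng = random.Random(0)
total, classification, stability = stn_loss(
    toy_logits, [2.0, 0.0], label=0, sigma=1.0, gamma=0.5, rng=rng
)
```

Setting `gamma` to zero recovers plain cross-entropy training; larger values trade some fit on clean inputs for predictions that change less under noise, which is the property the certified bound rewards.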

This review was generated by AI and checked by a human editor.