QUICK REVIEW

[Paper Review] Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

Hadi Salman, Greg Yang | arXiv (Cornell University) | Jun 9, 2019
Adversarial Robustness in Machine Learning | References: 39 | Cited by: 131
One-Sentence Summary

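This paper uses adversarial training with a novel SmoothAdv attack to improve the provable $\ell_2$ robustness of smoothed classifiers, achieving state-of-the-art results on ImageNet and CIFAR-10.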

ABSTRACT

Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural network-based classifiers that are provably robust to $\ell_2$-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably $\ell_2$-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable $\ell_2$-defenses. Moreover, we find that pre-training and semi-supervised learning boost adversarially trained smoothed classifiers even further. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial .

Motivation and Objectives

  • Improve the provable $\ell_2$ robustness of smoothed classifiers using adversarial training.
  • Develop an effective attack tailored to smoothed classifiers (SmoothAdv).
  • Demonstrate empirical and certifiable robustness gains on ImageNet and CIFAR-10.
  • Show the benefits of pre-training and semi-supervised learning in this framework.

Proposed Method

  • Introduce the SmoothAdv attack for smoothed classifiers and optimize it with projected gradient descent (PGD) or the decoupled direction and norm (DDN) attack.
  • Formulate adversarial training as finding adversarial examples that maximize a loss of the smoothed soft classifier, then training on Gaussian-perturbed copies of those examples.
  • Estimate gradients of the SmoothAdv objective with Monte Carlo sampling of Gaussian noise.
  • Leverage randomized smoothing to obtain certifiable $\ell_2$-robustness guarantees for the resulting model.
  • Incorporate pre-training and semi-supervised learning to boost robustness and certifiable accuracy.
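
The attack described in the list above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: a toy linear-softmax model stands in for the neural network so that the gradient can be written analytically with NumPy instead of autograd. The function names (`smoothadv_loss_and_grad`, `smoothadv_pgd`), the step-size rule, the choice to reuse the same noise draws at every PGD step, and all numeric values are assumptions made for this example.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def smoothadv_loss_and_grad(W, b, x_adv, y, noise):
    """Monte Carlo estimate of J(x') = -log(mean_i F(x' + delta_i)_y) and dJ/dx'.

    F is a toy linear-softmax classifier (an assumption of this sketch), so the
    gradient is written analytically instead of using autograd.
    """
    probs = softmax((x_adv + noise) @ W.T + b)   # shape (m, n_classes)
    p_y = probs[:, y]                            # true-class probability per noise draw
    # For softmax(Wx + b): d p_y / dx = p_y * (W[y] - sum_k p_k W[k])
    grad_p = p_y[:, None] * (W[y] - probs @ W)   # shape (m, d)
    mean_p = p_y.mean()
    return -np.log(mean_p), -grad_p.mean(axis=0) / mean_p


def smoothadv_pgd(W, b, x, y, sigma=0.25, eps=0.5, steps=10, m=16, seed=0):
    """L2 PGD ascent on the smoothed loss; returns x_adv and the noise draws."""
    rng = np.random.default_rng(seed)
    # The same m Gaussian draws are reused at every step (a choice made here).
    noise = rng.normal(0.0, sigma, size=(m, x.size))
    step_size = 2.0 * eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        _, grad = smoothadv_loss_and_grad(W, b, x_adv, y, noise)
        grad_norm = np.linalg.norm(grad)
        if grad_norm == 0.0:
            break
        x_adv = x_adv + step_size * grad / grad_norm   # normalized ascent step
        delta = x_adv - x
        delta_norm = np.linalg.norm(delta)
        if delta_norm > eps:                           # project back onto the eps-ball
            x_adv = x + delta * (eps / delta_norm)
    return x_adv, noise


# Toy two-class example (all values are made up for illustration).
W = np.array([[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]])
b = np.zeros(2)
x = np.array([0.5, -0.5, 0.2])
y = 0

x_adv, noise = smoothadv_pgd(W, b, x, y)
clean_loss, _ = smoothadv_loss_and_grad(W, b, x, y, noise)
adv_loss, _ = smoothadv_loss_and_grad(W, b, x_adv, y, noise)
train_inputs = x_adv + noise   # Gaussian-perturbed adversarial examples for training
```

In this sketch the loss is taken on the averaged class probabilities, that is, on the smoothed soft classifier, rather than averaging per-sample losses. With a real network, the analytic gradient would be replaced by backpropagation through the same Monte Carlo average.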

Experimental Results

Research Questions

  • RQ1: Can adversarial training tailored to smoothed classifiers improve certifiable $\ell_2$ robustness beyond existing methods?
  • RQ2: How effective is the SmoothAdv attack at finding adversarial examples for smoothed classifiers?
  • RQ3: What is the impact of the training hyperparameters (m_train, sigma, epsilon, T) on certified robustness?
  • RQ4: Do pre-training and semi-supervised learning further enhance certifiable robustness in this framework?

Key Findings

  • Smoothed classifiers trained with SmoothAdv outperform all prior provably $\ell_2$-robust classifiers on ImageNet and CIFAR-10 in certified accuracy across multiple radii.
  • On ImageNet, a smoothed ResNet-50 classifier achieves 56% provable top-1 accuracy against $\ell_2$ perturbations of norm less than 127/255, up from the previous best of 49%.
  • CIFAR-10 smoothed classifiers improve on prior art by up to 16%, and by up to 22% when pre-training and semi-supervised learning are combined.
  • Pre-training and semi-supervised learning consistently boost certified robustness in this framework.
  • The attack-guided adversarial training aligns model optimization with the certification objective, yielding higher certified robustness than vanilla adversarial training.
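
To make the radii in these findings concrete, the sketch below computes a certified $\ell_2$ radius using the standard randomized smoothing bound from the literature this paper builds on, in its simplified form $R = \sigma \, \Phi^{-1}(\underline{p_A})$. The review itself does not spell out this formula, so treat it as background rather than a restatement of the paper. The function name `certified_l2_radius` and the example values are hypothetical.

```python
from statistics import NormalDist


def certified_l2_radius(sigma, p_a_lower):
    """Certified L2 radius sigma * Phi^{-1}(p_a_lower), or None to abstain.

    p_a_lower is a lower confidence bound in (0, 1) on the probability that the
    base classifier returns the predicted class under N(0, sigma^2 I) noise.
    """
    if p_a_lower <= 0.5:
        return None  # the bound is too weak to certify any class
    return sigma * NormalDist().inv_cdf(p_a_lower)


# With sigma = 0.5, certifying a radius of 0.5 needs p_a_lower = Phi(1), about 0.841.
p_needed = NormalDist().cdf(1.0)
radius = certified_l2_radius(0.5, p_needed)
```

Under this bound, a radius of 127/255 (about 0.498) with noise level 0.5 requires the base classifier to return the correct class on roughly 84% of noisy copies of the input, which is why training the base model to stay correct under both Gaussian noise and adversarial shifts translates into higher certified accuracy.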

This review was generated by AI and reviewed by a human editor.