QUICK REVIEW

[Paper Review] On Certifying Robustness against Backdoor Attacks via Randomized Smoothing

Binghui Wang, Xiaoyu Cao|arXiv (Cornell University)|Feb 26, 2020
Adversarial Robustness in Machine Learning | References: 28 | Cited by: 59
One-Sentence Summary

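This paper extends randomized smoothing to certify robustness against backdoor attacks: it treats training and prediction as a single base function and adds noise to the training data, the test data, and the labels in order to bound the attacker's perturbation. It shows that the approach is theoretically feasible, but finds that existing smoothing methods have limited effectiveness against backdoors, underscoring the need for new theory and methods.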

ABSTRACT

Backdoor attack is a severe security threat to deep neural networks (DNNs). We envision that, like adversarial examples, there will be a cat-and-mouse game for backdoor attacks, i.e., new empirical defenses are developed to defend against backdoor attacks but they are soon broken by strong adaptive backdoor attacks. To prevent such cat-and-mouse game, we take the first step towards certified defenses against backdoor attacks. Specifically, in this work, we study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing. Randomized smoothing was originally developed to certify robustness against adversarial examples. We generalize randomized smoothing to defend against backdoor attacks. Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks. However, we also find that existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks, which highlight the needs of new theory and methods to certify robustness against backdoor attacks.

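Motivation and Objectives

  • Motivate certified defenses against backdoor attacks mounted by adaptive adversaries.
  • Generalize randomized smoothing from robustness against adversarial examples to robustness against backdoors.
  • Formalize a base function that captures the whole training-and-prediction process, and apply smoothing to it to certify robustness.
  • Evaluate feasibility and limitations empirically on a subset of MNIST.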

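Proposed Method

  • Treat the training-and-prediction process as a base function f, and apply randomized smoothing to obtain a smoothed function g with a certified radius.
  • Extend the smoothing framework to discrete data with a per-dimension discrete noise model, adding noise to the training data X, the labels y, and the test example x.
  • Estimate a lower bound p_l on Pr(f(v⊕ε)=l) by Monte Carlo sampling with the Clopper–Pearson method, and derive the certified radius R(p_l) from it.
  • Train N classifiers on perturbed datasets (X⊕τ, y⊕ε), and evaluate their predictions on perturbed test inputs to obtain the label frequencies used for certification.
  • Apply a Bonferroni correction to obtain a simultaneous confidence guarantee across the test examples.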
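The sampling and bounding steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes binary pixels and binary labels, a noise model that keeps each entry with probability `beta` and flips it otherwise, and a toy 1-nearest-neighbour rule standing in for "train a classifier, then predict". All function names and the toy data are hypothetical. The sketch stops at the lower bound p_l, because the paper's formula for the certified radius R(p_l) is not reproduced in this summary.

```python
import math
import random

def add_discrete_noise(bits, beta, rng):
    """Keep each binary entry with probability beta, otherwise flip it (assumed noise model)."""
    return [b if rng.random() < beta else 1 - b for b in bits]

def base_function(train_x, train_y, test_x):
    """Toy stand-in for 'train, then predict': 1-nearest neighbour by Hamming distance."""
    def hamming(a, b):
        return sum(u != v for u, v in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], test_x))
    return train_y[best]

def label_counts(train_x, train_y, test_x, beta, n_draws, rng):
    """Monte Carlo: re-noise training data, labels, and test input, then count predicted labels."""
    counts = {}
    for _ in range(n_draws):
        noisy_x = [add_discrete_noise(row, beta, rng) for row in train_x]
        noisy_y = add_discrete_noise(train_y, beta, rng)
        noisy_test = add_discrete_noise(test_x, beta, rng)
        label = base_function(noisy_x, noisy_y, noisy_test)
        counts[label] = counts.get(label, 0) + 1
    return counts

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson_lower(k, n, alpha):
    """One-sided Clopper-Pearson lower bound on a success probability (coverage 1 - alpha)."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the p that makes the upper tail equal alpha
        mid = (lo + hi) / 2
        if binomial_upper_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # lower end of the bracket, so the bound stays conservative

# Toy usage with made-up data: 4 training points, 1 test point.
rng = random.Random(0)
train_x = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 1, 0]]
train_y = [0, 0, 1, 1]
test_x = [1, 1, 1, 1]
n_draws = 200
counts = label_counts(train_x, train_y, test_x, beta=0.9, n_draws=n_draws, rng=rng)
top_label = max(counts, key=counts.get)

# Bonferroni: split the overall error budget alpha evenly over the test examples.
n_test_examples = 1
alpha = 0.001
p_lower = clopper_pearson_lower(counts[top_label], n_draws, alpha / n_test_examples)
print(top_label, p_lower)
```

With more than one test example, dividing `alpha` by the number of test examples makes all the per-example bounds hold simultaneously with probability at least 1 - alpha, which is the role the Bonferroni correction plays in the last bullet.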

Experimental Results

Research Questions

  • RQ1: Can randomized smoothing certify robustness against backdoor attacks by treating the entire training-and-prediction pipeline as a base function?
  • RQ2: What certified radius is achievable against backdoor-style perturbations of discrete data?
  • RQ3: How effective are existing randomized smoothing techniques with additive noise at mitigating backdoor threats in practice?
  • RQ4: What are the theoretical and empirical limitations of smoothing-based certified defenses against backdoors?

Key Findings

  • Certifying robustness against backdoor attacks with randomized smoothing is theoretically feasible.
  • On an MNIST subset, the method certifies that 36% of testing images are correctly classified when an attacker perturbs at most 2 pixels/labels of the training data and pixels of the test image.
  • Existing randomized smoothing methods with additive noise show limited effectiveness against backdoor attacks in the evaluated setting.
  • The study underscores the need for new theory and methods to improve certification against backdoors.
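A figure such as the 36% above is a certified accuracy: the fraction of test images that are both classified correctly and certified at the given perturbation size. A minimal sketch of that computation follows, assuming a certified radius of R covers every perturbation of size up to R. The function name and the five example values are made up for illustration and are not results from the paper.

```python
def certified_accuracy(predictions, labels, radii, r):
    """Fraction of examples predicted correctly AND certified at radius >= r."""
    hits = sum(
        1
        for pred, label, radius in zip(predictions, labels, radii)
        if pred == label and radius >= r
    )
    return hits / len(labels)

# Hypothetical smoothed predictions, true labels, and certified radii for 5 test images.
preds = [3, 7, 1, 0, 4]
labels = [3, 7, 2, 0, 4]
radii = [2, 1, 3, 2, 0]

acc_at_2 = certified_accuracy(preds, labels, radii, r=2)
print(acc_at_2)  # 0.4: only the first and fourth images are correct and certified at radius 2
```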


This review was generated by AI and checked by a human editor.