QUICK REVIEW

[Paper Review] Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation

Zeyu Qin, Yanbo Fan|arXiv (Cornell University)|Oct 12, 2022
Adversarial Robustness in Machine Learning | Cited by 22
One-sentence summary

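This paper proposes Reverse Adversarial Perturbation (RAP), a min-max bi-level optimization attack that searches for adversarial examples in flat, low-loss regions of the loss landscape to improve cross-model transferability, and RAP-LS (late-start), which further improves efficiency and effectiveness.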

ABSTRACT

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which can produce erroneous predictions by injecting imperceptible perturbations. In this work, we study the transferability of adversarial examples, which is significant due to its threat to real-world applications where model architecture or parameters are usually unknown. Many existing works reveal that the adversarial examples are likely to overfit the surrogate model that they are generated from, limiting its transfer attack performance against different target models. To mitigate the overfitting of the surrogate model, we propose a novel attack method, dubbed reverse adversarial perturbation (RAP). Specifically, instead of minimizing the loss of a single adversarial point, we advocate seeking adversarial example located at a region with unified low loss value, by injecting the worst-case perturbation (the reverse adversarial perturbation) for each step of the optimization procedure. The adversarial attack with RAP is formulated as a min-max bi-level optimization problem. By integrating RAP into the iterative process for attacks, our method can find more stable adversarial examples which are less sensitive to the changes of decision boundary, mitigating the overfitting of the surrogate model. Comprehensive experimental comparisons demonstrate that RAP can significantly boost adversarial transferability. Furthermore, RAP can be naturally combined with many existing black-box attack techniques, to further boost the transferability. When attacking a real-world image recognition system, Google Cloud Vision API, we obtain 22% performance improvement of targeted attacks over the compared method. Our codes are available at https://github.com/SCLBD/Transfer_attack_RAP.

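Motivation and objectives

  • Address the poor transferability of white-box adversarial examples to unseen target models.
  • Propose RAP to locate adversarial points in flat regions of the loss landscape, reducing overfitting to the surrogate model.
  • Demonstrate that RAP is compatible with existing black-box attack techniques and that it improves performance both when combined with them and when attacking defended models.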

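Proposed method

  • Formulate the transfer attack as a min-max bi-level optimization problem that first finds the worst-case perturbation (the RAP) within a neighborhood of the current adversarial point.
  • Inner problem (RAP): n^{rap} = arg max_{||n^{rap}||_∞ ≤ ε_n} L(M^s(G(x^{adv} + n^{rap}); θ), y_t), solved by projected gradient ascent.
  • Outer problem: update x^{adv} to minimize the loss after the RAP has been added to the input, i.e., x^{adv} ← Clip_{B_ε(x)}[ x^{adv} - α sign(∇_{x^{adv}} L(M^s(G(x^{adv} + n^{rap}); θ), y_t)) ].
  • Introduce RAP-LS (late-start RAP), which leaves out the RAP perturbation during the early iterations to improve efficiency.
  • Demonstrate RAP's compatibility with various input transformations and existing transfer attack techniques.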
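
The inner and outer updates listed above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes a linear softmax surrogate (logits = W @ x) so the gradient can be written analytically, takes the input transformation G to be the identity, and omits clipping to a valid pixel range. The names `loss_and_grad` and `rap_attack`, and all default step sizes and iteration counts, are hypothetical choices made for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(W, x, y_t):
    """Cross-entropy toward target class y_t for the toy surrogate
    logits = W @ x, and its gradient with respect to the input x."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[y_t] = 1.0
    loss = -np.log(p[y_t] + 1e-12)
    grad = W.T @ (p - onehot)            # analytic d(loss)/dx for a linear model
    return loss, grad

def rap_attack(W, x, y_t, eps=0.1, eps_n=0.05, alpha=0.01, alpha_n=0.01,
               steps=100, inner_steps=5, late_start=20):
    """Targeted attack with a reverse adversarial perturbation (sketch).
    G is assumed to be the identity; all defaults are illustrative."""
    x_adv = x.copy()
    for k in range(steps):
        n_rap = np.zeros_like(x)
        if k >= late_start:              # late start: skip RAP in early iterations
            # Inner maximization: projected gradient ascent on n_rap,
            # kept inside the L_inf ball of radius eps_n.
            for _ in range(inner_steps):
                _, g = loss_and_grad(W, x_adv + n_rap, y_t)
                n_rap = np.clip(n_rap + alpha_n * np.sign(g), -eps_n, eps_n)
        # Outer minimization: signed gradient step evaluated at x_adv + n_rap,
        # then projection back onto the L_inf ball B_eps(x).
        _, g = loss_and_grad(W, x_adv + n_rap, y_t)
        x_adv = np.clip(x_adv - alpha * np.sign(g), x - eps, x + eps)
    return x_adv

# Usage on random toy data (3 classes, 8 input dimensions).
rng = np.random.default_rng(0)
W_demo, x_demo = rng.normal(size=(3, 8)), rng.normal(size=8)
x_demo_adv = rap_attack(W_demo, x_demo, y_t=2)
print("max |x_adv - x|:", np.abs(x_demo_adv - x_demo).max())
```

Setting `late_start=0` in this sketch applies the RAP from the first iteration, while a positive value mimics the late-start idea. With a real network the analytic gradient would be replaced by automatic differentiation.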

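Experimental results

Research questions

  • RQ1: How does enforcing flatness around adversarial examples affect transferability to unseen target models?
  • RQ2: Can RAP improve targeted and untargeted transfer attacks across different architectures and defenses?
  • RQ3: To what extent does combining RAP with existing transfer methods improve attack performance?
  • RQ4: Does a late-start variant (RAP-LS) offer practical gains in efficiency and effectiveness?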

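Key findings

  • RAP significantly improves transferability over the baseline white-box attacks; the average untargeted attack success rate (ASR) rises by 9.6% and 16.3% over I and MI, respectively.
  • DI, TI, SI, and Admix also benefit from RAP, with notable gains (e.g., 10.9%, 10.2%, 9.3%, and 6.3%, respectively).
  • RAP-LS further improves transferability, reaching average untargeted ASRs of 95.4%, 97.6%, and 98.3% under three representative combinations.
  • For targeted attacks, RAP's average improvements are 5.0%, 8.1%, 4.6%, 10.4%, 18.5%, and 15.1% for I, MI, TI, DI, SI, and Admix, respectively.
  • RAP-LS shows consistent gains across diverse architectures (Inception-ResNet-v2, NASNet-Large, ViT-B/16) and against defended models.
  • Against strong baselines such as TTP, MTDSI+RAP-LS substantially outperforms advanced generative targeted methods in the reported comparisons (e.g., a gap of up to about 25.7%).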
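
The findings above are stated in terms of ASR. As a reading aid, the sketch below shows how an attack success rate is commonly computed, assuming the usual convention: an untargeted attack succeeds when the target model's prediction differs from the true label, and a targeted attack succeeds when the prediction equals the attacker's chosen label. The function name `attack_success_rate` is hypothetical, and the paper's exact evaluation protocol may differ in details.

```python
def attack_success_rate(preds, true_labels, target_labels=None):
    """Return the attack success rate as a percentage.

    Untargeted (target_labels is None): success if pred != true label.
    Targeted: success if pred == the attacker's target label.
    """
    if len(preds) == 0:
        raise ValueError("preds must not be empty")
    if target_labels is None:
        hits = sum(p != y for p, y in zip(preds, true_labels))
    else:
        hits = sum(p == t for p, t in zip(preds, target_labels))
    return 100.0 * hits / len(preds)

# Usage with made-up predictions from a hypothetical target model.
preds = [1, 2, 0, 2]
true_labels = [0, 2, 0, 1]
print(attack_success_rate(preds, true_labels))                 # untargeted
print(attack_success_rate(preds, true_labels, [1, 1, 1, 1]))   # targeted
```

An "average ASR" as reported above would then be the mean of this quantity over the evaluated target models.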

This review was generated by AI and reviewed by human editors.