QUICK REVIEW

[Paper Review] Enhancing Adversarial Example Transferability with an Intermediate Level Attack

Qian Huang, Isay Katsman|arXiv (Cornell University)|Jul 23, 2019

Adversarial Robustness in Machine Learning | References: 31 | Cited by: 40

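One-Sentence Summary

ILA fine-tunes an existing adversarial example by amplifying its perturbation at a pre-specified intermediate layer of the source model, improving black-box transferability across models.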

ABSTRACT

Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples using intermediate feature maps. Our code is available at https://github.com/CUVL/Intermediate-Level-Attack.

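Motivation and Objectives

Motivate and address the poor transferability of white-box adversarial attacks in the black-box setting.
Introduce the Intermediate Level Attack (ILA), which fine-tunes existing adversarial examples by perturbing an intermediate layer.
Provide a layer-selection strategy that requires no access to the target models.
Offer explanatory and empirical insights into why intermediate representations affect transferability.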

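Proposed Method

Defines two ILA variants: ILAP (a projection-based loss) and ILAF (a flexible loss that additionally controls the balance between magnitude and direction).
Runs as a fine-tuning step: starting from an adversarial example x' produced by a baseline attack A, it optimizes a new example x'' within the epsilon-ball around the original input x to maximize the perturbation at the chosen layer l.
ILAP loss: L = -Δy_l'' · Δy_l', where Δy_l' = F_l(x') - F_l(x) and Δy_l'' = F_l(x'') - F_l(x) are the changes in the layer-l output F_l, relative to the original input x, caused by x' and x''.
ILAF loss: L = -α * ||Δy_l''||_2 / ||Δy_l'||_2 - (Δy_l'' / ||Δy_l''||_2) · (Δy_l' / ||Δy_l'||_2), where α weights the magnitude term against the direction (cosine similarity) term.
Layer-selection guideline: across the layers of the source model, pick the layer at which the disturbance values show their latest peak; this peak correlates with higher transferability.
Evaluated on CIFAR-10 and ImageNet across multiple models (e.g., ResNet18, SENet18, DenseNet121, GoogLeNet), with comparisons against baselines such as I-FGSM, MI-FGSM, and Carlini-Wagner variants.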
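The two losses and the fine-tuning step listed above can be illustrated with a minimal sketch. It is not the authors' implementation: feature maps are assumed to be flattened into plain lists of floats, the layer F_l is replaced by a hypothetical linear map W so that the gradient can be written by hand, the update is a simple sign-gradient step, and x'' is initialized at x' as this review describes. The names dot, norm, sign, layer, and ila_finetune are invented for illustration, and clipping to a valid pixel range is omitted.

```python
import math


def dot(u, v):
    """Inner product of two equal-length flat vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean (L2) norm of a flat vector."""
    return math.sqrt(dot(u, u))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def ilap_loss(dy_ref, dy_new):
    """ILAP: L = -(dy_new . dy_ref), with dy_ref = F_l(x') - F_l(x)."""
    return -dot(dy_new, dy_ref)


def ilaf_loss(dy_ref, dy_new, alpha):
    """ILAF: L = -alpha * ||dy_new|| / ||dy_ref|| - cos(dy_new, dy_ref)."""
    n_ref, n_new = norm(dy_ref), norm(dy_new)
    magnitude = n_new / n_ref
    direction = dot(dy_new, dy_ref) / (n_new * n_ref)
    return -alpha * magnitude - direction


def layer(W, x):
    """Hypothetical stand-in for F_l: a linear map y = W x (assumption)."""
    return [dot(row, x) for row in W]


def ila_finetune(W, x, x_adv, eps, step, iters):
    """Sign-gradient descent on the ILAP loss inside the L-inf eps-ball around x."""
    y = layer(W, x)
    dy_ref = [a - b for a, b in zip(layer(W, x_adv), y)]
    # For a linear layer the ILAP gradient w.r.t. x'' is the constant -W^T dy_ref.
    grad = [-sum(W[i][j] * dy_ref[i] for i in range(len(W)))
            for j in range(len(x))]
    x_new = list(x_adv)  # start from the baseline adversarial example x'
    for _ in range(iters):
        # Step against the gradient to lower the loss (raise the projection).
        x_new = [xn - step * sign(g) for xn, g in zip(x_new, grad)]
        # Project back so that every coordinate stays within eps of x.
        x_new = [min(max(xn, xo - eps), xo + eps) for xn, xo in zip(x_new, x)]
    return x_new
```

With a real network, the hand-written gradient would be replaced by automatic differentiation through the source model up to layer l; the loss functions themselves would stay the same.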

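Experimental Results

Research Questions

RQ1: Does perturbing an intermediate layer of the source model improve the black-box transferability of adversarial examples?
RQ2: Is there a layer-wise perturbation pattern that can be identified in advance, without access to the transfer models, to maximize transferability?
RQ3: How do ILAP and ILAF compare with existing transfer-focused attacks (e.g., TAP, DI2-FGSM) on CIFAR-10 and ImageNet?
RQ4: Does a near-optimal choice of intermediate layer generalize across different target models and architectures?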

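Key Findings

ILA improves transferability over the baseline attacks across multiple models and datasets.
Targeting specific intermediate layers, particularly certain later layers, yields stronger transferability, and the layer can be selected without access to the target models.
ILAP generally outperforms the baseline attacks and, in the ImageNet setting, can even surpass some state-of-the-art transfer attacks such as TAP and DI2-FGSM.
ILAF can improve transferability further, but surpassing ILAP requires model-specific hyperparameter tuning.
The proposed layer-selection heuristic correlates with higher transferability, so a near-optimal layer can be chosen without evaluating on the transfer models.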
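The layer-selection heuristic referred to above (pick the layer where the disturbance values peak last) can be sketched as a small helper. This is an illustrative reading rather than the authors' procedure: the per-layer disturbance values are taken as given, a "peak" is assumed to be an interior local maximum, and the fallback to the overall maximum when no interior peak exists is an added assumption. The function name latest_peak_layer is hypothetical.

```python
def latest_peak_layer(values):
    """Return the index of the last interior local maximum in a per-layer list.

    `values` holds one disturbance value per candidate layer, ordered from
    the earliest layer to the latest. If no interior local maximum exists,
    fall back to the index of the overall maximum (an assumption of this sketch).
    """
    peaks = [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] >= values[i + 1]
    ]
    if peaks:
        return peaks[-1]
    return max(range(len(values)), key=lambda i: values[i])
```

Because the helper only looks at values computed on the source model, it reflects the point made in the findings: no query to a transfer model is needed to make the choice.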

This review was generated by AI and reviewed by human editors.