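[Paper Summary] Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
The Boundary Attack is a simple, effective decision-based adversarial attack. It starts from a large adversarial perturbation and gradually shrinks it while moving along the decision boundary. On standard vision tasks it is competitive with gradient-based attacks, and it applies to black-box models.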
Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities (score-based attacks), neither of which are available in most real-world scenarios. In many such cases one currently needs to retreat to transfer-based attacks which rely on cumbersome substitute models, need access to the training data and can be defended against. Here we emphasise the importance of attacks which solely rely on the final model decision. Such decision-based attacks are (1) applicable to real-world black-box models such as autonomous cars, (2) need less knowledge and are easier to apply than transfer-based attacks and (3) are more robust to simple defences than gradient- or score-based attacks. Previous attacks in this category were limited to simple models or simple datasets. Here we introduce the Boundary Attack, a decision-based attack that starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial. The attack is conceptually simple, requires close to no hyperparameter tuning, does not rely on substitute models and is competitive with the best gradient-based attacks in standard computer vision tasks like ImageNet. We apply the attack on two black-box algorithms from Clarifai.com. The Boundary Attack in particular and the class of decision-based attacks in general open new avenues to study the robustness of machine learning models and raise new questions regarding the safety of deployed machine learning systems. An implementation of the attack is available as part of Foolbox at https://github.com/bethgelab/foolbox .
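Research Motivation and Goals
- Emphasize the relevance of decision-based attacks to real-world black-box models.
- Introduce the Boundary Attack as the first decision-based method that is effective on complex datasets.
- Show that decision-based attacks can break certain defense strategies.
- Demonstrate applicability to a real-world black-box API (Clarifai) as well as to standard vision benchmarks.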
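Proposed Method
- Proposes an attack that walks along the decision boundary: it starts from an adversarial example and uses rejection sampling to move along the boundary toward the smallest perturbation.
- Uses a simple proposal distribution: sample a Gaussian direction, project onto the sphere around the original input, then take a step toward the original input. There are two adjustable step sizes, one for the orthogonal step and one for the step toward the original input.
- Allows arbitrary adversarial criteria and needs only the model's final decision, with no confidence scores or gradient information.
- Dynamically adjusts the perturbation length and the step sizes according to the local geometry of the boundary, using a strategy similar to trust-region methods.
- Evaluates on MNIST, CIFAR-10, and ImageNet in both untargeted and targeted settings, using standard architectures (VGG-19, ResNet-50, Inception-v3).
- Compares against gradient-based attacks (FGSM, DeepFool, Carlini & Wagner) in terms of perturbation size and robustness to defenses.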
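The proposal-and-rejection loop described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (the paper points to Foolbox for that). The function name `boundary_attack`, the parameters `delta` and `epsilon`, and the toy decision rule are all hypothetical, and the step sizes are kept fixed, so the trust-region-style adaptation and any clipping to a valid pixel range are left out.

```python
import numpy as np


def boundary_attack(is_adversarial, original, start, steps=500,
                    delta=0.1, epsilon=0.1, seed=0):
    """Simplified Boundary Attack sketch with fixed step sizes (an assumption).

    is_adversarial: callable that returns True/False from the final decision only.
    original:       the clean input we want to stay close to.
    start:          any input that is already adversarial.
    delta:          relative size of the random orthogonal step.
    epsilon:        relative size of the step toward the original input.
    """
    rng = np.random.default_rng(seed)
    original = np.asarray(original, dtype=float)
    x = np.asarray(start, dtype=float).copy()
    if not is_adversarial(x):
        raise ValueError("the starting point must already be adversarial")

    for _ in range(steps):
        dist = np.linalg.norm(x - original)
        if dist == 0.0:
            break

        # 1. Sample a Gaussian direction, scaled relative to the current distance.
        eta = rng.normal(size=x.shape)
        eta *= delta * dist / np.linalg.norm(eta)
        candidate = x + eta

        # 2. Project back onto the sphere of radius `dist` around the original.
        v = candidate - original
        candidate = original + v * (dist / np.linalg.norm(v))

        # 3. Take a small step toward the original input.
        candidate = candidate + epsilon * (original - candidate)

        # 4. Rejection sampling: keep the candidate only if it is still adversarial.
        if is_adversarial(candidate):
            x = candidate
    return x


# Toy usage: a hypothetical "model" whose decision flips when feature 0 exceeds 1.
def is_adv(x):
    return x[0] > 1.0


original = np.zeros(10)
start = np.full(10, 5.0)
adv = boundary_attack(is_adv, original, start)
```

Every accepted candidate is adversarial and strictly closer to the original than the previous point, so the distance can only shrink. With fixed step sizes the walk eventually stalls near the boundary, which is the situation the dynamic step-size adjustment is meant to handle.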
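Experimental Results
Research Questions
- RQ1: Can a decision-based attack reliably produce adversarial examples against complex, real-world models without gradients or confidence scores?
- RQ2: How does the Boundary Attack perform relative to gradient-based methods on MNIST, CIFAR-10, and ImageNet in untargeted and targeted scenarios?
- RQ3: Is the Boundary Attack robust to defenses such as gradient masking and defensive distillation?
- RQ4: Can the Boundary Attack operate effectively against a black-box, real-world API (such as Clarifai) while observing only the final decision?
Main Findings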
| Attack | Type | MNIST | CIFAR | VGG-19 | ResNet-50 | Inception-v3 |
|---|---|---|---|---|---|---|
| FGSM | Gradient-based | 4.2e-02 | 2.5e-05 | 1.0e-06 | 1.0e-06 | 9.7e-07 |
| DeepFool | Gradient-based | 4.3e-03 | 5.8e-06 | 1.9e-07 | 7.5e-08 | 5.2e-08 |
| Carlini & Wagner | Gradient-based | 2.2e-03 | 7.5e-06 | 5.7e-07 | 2.2e-07 | 7.6e-08 |
| Boundary (the paper's method) | Decision-based | 3.6e-03 | 5.6e-06 | 2.9e-07 | 1.0e-07 | 6.5e-08 |
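This summary does not define how perturbation size is measured. As a rough sketch, assume each entry is a median over test samples of the squared L2 distance between the original and the adversarial input, divided by the number of input elements. Under that assumption the quantity could be computed as follows; the helper names `perturbation_size` and `median_perturbation` are hypothetical.

```python
import numpy as np


def perturbation_size(original, adversarial):
    """Squared L2 distance per input element (assumed metric, not confirmed here)."""
    diff = np.asarray(adversarial, dtype=float) - np.asarray(original, dtype=float)
    return float(np.mean(diff ** 2))


def median_perturbation(originals, adversarials):
    """Median of the per-sample perturbation sizes over a set of samples."""
    sizes = [perturbation_size(o, a) for o, a in zip(originals, adversarials)]
    return float(np.median(sizes))
```

A median is less sensitive than a mean to the few samples that need an unusually large perturbation, which matters when comparing attacks whose failure cases differ.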
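- In the untargeted setting, the Boundary Attack achieves minimal perturbations that are competitive with gradient-based attacks on MNIST, CIFAR, and ImageNet.
- In the untargeted experiments, the Boundary Attack's median perturbation is 3.6e-03 on MNIST, 5.6e-06 on CIFAR, and 2.9e-07, 1.0e-07, and 6.5e-08 on the ImageNet models VGG-19, ResNet-50, and Inception-v3, respectively.
- In the targeted setting, the Boundary Attack's perturbation is 6.5e-03 (MNIST), 3.3e-05 (CIFAR), and 9.9e-06 (ImageNet with VGG-19).
- When defenses such as defensive distillation are applied, the Boundary Attack remains effective while gradient-based attacks fail or degrade, which shows that it is robust to gradient masking.
- On two Clarifai black-box models (brand and celebrity recognition), the Boundary Attack produces adversarial examples with perturbations typically around 1e-2 to 1e-3, although some samples need larger perturbations before they are misclassified.
- The attack needs no backpropagation, but it requires significantly more forward passes than gradient-based attacks, which reflects its reliance on model decisions rather than gradients.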
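Because the attack sees only final decisions, its cost is naturally counted in model queries (forward passes). The wrapper below is a minimal sketch of such a decision-only interface for the untargeted case. The class name `DecisionOracle` and the `predict_label` callable are hypothetical stand-ins for whatever the black-box model exposes.

```python
class DecisionOracle:
    """Wraps a label-only model and counts how many times it is queried."""

    def __init__(self, predict_label, original_label):
        self._predict_label = predict_label    # returns the final class label only
        self._original_label = original_label  # label of the clean input
        self.queries = 0

    def __call__(self, x):
        # One call equals one forward pass (or one API request) to the model.
        self.queries += 1
        # Untargeted criterion: any label other than the original counts.
        return self._predict_label(x) != self._original_label


# Toy usage with a hypothetical one-dimensional "model".
oracle = DecisionOracle(lambda x: int(x > 0), original_label=0)
flipped = oracle(1.5)
unchanged = oracle(-2.0)
```

A targeted criterion would compare against a chosen target label instead, and the attack loop would not need to change because it only consumes the boolean result.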
This summary was generated by AI and reviewed by a human editor.