QUICK REVIEW

[Paper Review] Simple Black-Box Adversarial Perturbations for Deep Networks

Nina Narodytska, Shiva Prasad Kasiviswanathan | arXiv (Cornell University) | Dec 19, 2016
Adversarial Robustness in Machine Learning | References: 14 | Cited by: 165
One-Sentence Summary

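This paper shows that deep convolutional neural networks are vulnerable to black-box adversarial perturbations. An attacker can construct misclassified images by perturbing very few pixels, without any access to the model's parameters. The paper introduces a random-pixel attack and a greedy local-search method for generating adversarial examples under the black-box threat model.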

ABSTRACT

Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior leading to undesired consequences and could pose a security risk when these systems are deployed in the real world. In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black-box) and only assume that the output of the network can be observed on the probed inputs. Our first attack is based on a simple idea of adding perturbation to a randomly selected single pixel or a small set of them. We then improve the effectiveness of this attack by carefully constructing a small set of pixels to perturb by using the idea of greedy local-search. Our proposed attacks also naturally extend to a stronger notion of misclassification. Our extensive experimental results illustrate that even these elementary attacks can reveal a deep neural network's vulnerabilities. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks.
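The first attack described in the abstract (perturb a randomly selected single pixel, then check the oracle's output) can be sketched as below. This is a minimal illustration, not the authors' code. The nested-list image layout, the `oracle` callable that returns class probabilities, the query budget, and setting each channel of the chosen pixel to `p * sign(value)` (our reading of the sign-preserving perturbation) are all assumptions, and `perturb_pixel`, `top1` and `rand_adv` are hypothetical names.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed layout: image[x][y] is a list of channel values (mean-centred, so signs matter).
Image = List[List[List[float]]]
# Assumed black box: takes an image, returns one probability per class. No gradients.
Oracle = Callable[[Image], List[float]]


def perturb_pixel(image: Image, x: int, y: int, p: float) -> Image:
    """Return a copy of `image` with every channel of pixel (x, y) set to p * sign(value).

    Assumption: this is our reading of the sign-preserving perturbation.
    """
    out = [[list(px) for px in row] for row in image]  # copy; the input is left untouched
    out[x][y] = [p * ((v > 0) - (v < 0)) for v in image[x][y]]
    return out


def top1(probs: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda c: probs[c])


def rand_adv(oracle: Oracle, image: Image, true_label: int, p: float,
             max_queries: int = 100, seed: int = 0
             ) -> Optional[Tuple[Image, Tuple[int, int]]]:
    """Perturb randomly chosen single pixels until the oracle's top-1 label changes.

    Hypothetical sketch: returns (adversarial image, pixel location),
    or None if the query budget runs out.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    for _ in range(max_queries):
        x, y = rng.randrange(h), rng.randrange(w)
        candidate = perturb_pixel(image, x, y, p)
        if top1(oracle(candidate)) != true_label:  # exactly one oracle query per attempt
            return candidate, (x, y)
    return None
```

Each attempt costs exactly one oracle query and uses no gradients or weights, which is what keeps the attack black-box.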

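Motivation and Goals

  • Assess how vulnerable state-of-the-art CNNs are to black-box adversarial attacks in which the attacker has only oracle access.
  • Show that perturbing a single pixel or a small number of pixels can cause misclassification.
  • Develop and evaluate an attack based on greedy local search that reduces the perturbation required.
  • Extend the attacks to k-misclassification, where the true label is not among the top-k predictions.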
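The k-misclassification criterion can be written as a small check on the oracle's output scores. This sketch assumes the oracle returns one score per class; `top_k` and `is_k_misclassified` are hypothetical helper names. With k = 1 the check reduces to ordinary misclassification.

```python
from typing import List, Sequence


def top_k(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring classes, best first (ties keep class order)."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]


def is_k_misclassified(probs: Sequence[float], true_label: int, k: int) -> bool:
    """k-misclassification: the true label is NOT among the top-k predictions."""
    return true_label not in top_k(probs, k)


# Hypothetical oracle output for 4 classes; class 2 is the true label.
scores = [0.05, 0.60, 0.30, 0.05]
print(is_k_misclassified(scores, true_label=2, k=1))  # True: top-1 is [1]
print(is_k_misclassified(scores, true_label=2, k=2))  # False: top-2 is [1, 2]
```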

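Proposed Method

  • Treat the network as an oracle and observe only its outputs on probed inputs.
  • Study the effect of a sign-preserving perturbation applied to a single pixel (or a small set of pixels).
  • Define critical pixels and critical sets: pixels, or sets of pixels, whose perturbation can cause misclassification.
  • Propose RandAdv, a random-pixel perturbation method, which is also used to estimate the fraction of critical pixels.
  • Develop a greedy local-search attack that perturbs a small number of pixels to lower the probability the network assigns to the true label, with the goal of pushing the true label out of the top-k predictions.
  • Extend the method to high-resolution images by perturbing sets of pixels (e.g., 50 pixels) with larger perturbations.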
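The greedy local-search attack can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact algorithm: the hypothetical parameters `rounds`, `t` (pixels committed per round), `d` (neighbourhood radius) and `init_frac` (size of the initial random candidate set), the nested-list image layout, and the reuse of a `p * sign(value)` perturbation are our own choices. Each round scores every candidate pixel by the true-label probability the oracle returns when that pixel alone is perturbed, commits the `t` pixels that lower it most, and stops once the true label leaves the top-k.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple

Image = List[List[List[float]]]          # assumed layout: image[x][y] = channel values
Oracle = Callable[[Image], List[float]]  # assumed black box: image -> class probabilities
Loc = Tuple[int, int]


def perturb(image: Image, locs: Iterable[Loc], p: float) -> Image:
    """Copy `image`, setting every channel of each pixel in `locs` to p * sign(value)."""
    out = [[list(px) for px in row] for row in image]
    for x, y in locs:
        out[x][y] = [p * ((v > 0) - (v < 0)) for v in image[x][y]]
    return out


def in_top_k(probs: Sequence[float], label: int, k: int) -> bool:
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return label in ranked[:k]


def greedy_local_search(oracle: Oracle, image: Image, true_label: int, p: float,
                        k: int = 1, rounds: int = 10, t: int = 2, d: int = 1,
                        init_frac: float = 0.1, seed: int = 0
                        ) -> Optional[Tuple[Image, Set[Loc]]]:
    """Simplified greedy local search (hypothetical parameters, not the paper's exact method).

    Returns (adversarial image, perturbed locations) once the true label leaves the
    top-k predictions, or None if `rounds` runs out first.
    """
    h, w = len(image), len(image[0])
    rng = random.Random(seed)
    all_locs = [(x, y) for x in range(h) for y in range(w)]
    # Start from a small random subset of pixel locations (at least one).
    candidates = set(rng.sample(all_locs, max(1, int(init_frac * len(all_locs)))))
    current: Image = image
    chosen: Set[Loc] = set()
    for _ in range(rounds):
        pool = candidates - chosen
        if not pool:
            break
        # One oracle query per candidate: true-label probability with that pixel
        # perturbed on top of the current image. Lower is better for the attacker.
        ranked = sorted(pool, key=lambda loc: oracle(perturb(current, [loc], p))[true_label])
        best = ranked[:t]
        current = perturb(current, best, p)  # commit the t most damaging pixels
        chosen.update(best)
        if not in_top_k(oracle(current), true_label, k):
            return current, chosen
        # Next round only searches the d-neighbourhood of pixels perturbed so far.
        candidates = {(x + dx, y + dy) for x, y in chosen
                      for dx in range(-d, d + 1) for dy in range(-d, d + 1)
                      if 0 <= x + dx < h and 0 <= y + dy < w}
    return None
```

Restricting later rounds to the neighbourhood of already-perturbed pixels keeps the number of oracle queries per round small; the trade-off is that the search is local and can miss useful pixels elsewhere in the image.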

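Experimental Results

Research Questions

  • RQ1: Can a black-box adversary cause misclassification by perturbing only a single pixel or very few pixels?
  • RQ2: How does the perturbation magnitude affect whether critical pixels exist and how easily they can be found?
  • RQ3: Under black-box access, can a greedy local-search strategy generate effective adversarial examples with limited perturbation?
  • RQ4: Is k-misclassification achievable under the black-box threat model?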

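Key Findings

  • Perturbing a single randomly chosen pixel frequently causes misclassification across many datasets.
  • Increasing the perturbation magnitude raises both the fraction of critical pixels and the success rate of RandAdv.
  • For high-resolution images, perturbing about 50 pixels is enough to generate adversarial examples effectively.
  • The greedy local-search method produces adversarial images with small perturbations, without needing access to the network's gradients.
  • The attacks can achieve k-misclassification, pushing the true label outside the top-k predictions.
  • On ImageNet1000, the method perturbs only about 0.5% of the pixels on average.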


This review was generated by AI and checked by human editors.