QUICK REVIEW

[Paper Review] PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples

Yang Song, Taesup Kim|arXiv (Cornell University)|Oct 30, 2017

Adversarial Robustness in Machine Learning | Cited by 338

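One-Sentence Summary

PixelDefend uses a PixelCNN generative model to detect and purify adversarial examples, moving inputs back toward the training distribution to restore classifier accuracy without depending on the specific classifier or attack.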

ABSTRACT

Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of image classifiers? In this paper, we show empirically that adversarial examples mainly lie in the low probability regions of the training distribution, regardless of attack types and targeted models. Using statistical hypothesis testing, we find that modern neural density models are surprisingly good at detecting imperceptible image perturbations. Based on this discovery, we devised PixelDefend, a new approach that purifies a maliciously perturbed image by moving it back towards the distribution seen in the training data. The purified image is then run through an unmodified classifier, making our method agnostic to both the classifier and the attacking method. As a result, PixelDefend can be used to protect already deployed models and be combined with other model-specific defenses. Experiments show that our method greatly improves resilience across a wide variety of state-of-the-art attacking methods, increasing accuracy on the strongest attack from 63% to 84% for Fashion MNIST and from 32% to 70% for CIFAR-10.

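Motivation and Goals

Study adversarial examples as outliers of the data distribution rather than only as classifier failures.
Hypothesize that imperceptible adversarial perturbations mostly lie in low-probability regions of the training distribution.
Develop detection and purification techniques that are agnostic to both the classifier and the attack.
Show that purification can be combined with existing defenses to improve robustness.
Demonstrate state-of-the-art robustness against a wide range of attacks on benchmark datasets.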

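Proposed Method

Train a PixelCNN generative model on clean training data to estimate p(X), the data distribution of images.
Use the model's likelihoods as the statistic in a permutation test, yielding a p-value for detecting adversarial inputs.
Propose PixelDefend: purify an input through a greedy decoding procedure that moves it toward higher probability under the PixelCNN model while staying within an epsilon_defend neighborhood of the input.
Provide an adaptive variant that adjusts epsilon_defend according to the input's probability under the generative model, to minimize the effect on clean images.
Leave the classifier unmodified; PixelDefend is agnostic to the model and the attack, and can be combined with adversarial training or other defenses.
Evaluate on Fashion-MNIST and CIFAR-10 against multiple attacks (RAND, FGSM, BIM, DeepFool, CW) using ResNet and VGG classifiers.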
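The greedy purification step can be illustrated with a minimal sketch. It assumes 8-bit images of shape (height, width, channels) and visits pixels in raster order. The function `toy_conditional_log_probs` is a made-up stand-in for the PixelCNN's 256-way conditional distribution (it simply prefers values close to already-decoded neighbours), and the names `purify` and `eps_defend` are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def toy_conditional_log_probs(image, i, j, k):
    """Hypothetical stand-in for a PixelCNN conditional at position (i, j, k).

    Returns 256 log-probabilities. This toy model favours values near the
    mean of the pixels above and to the left; a real PixelCNN would be a
    trained network conditioned on all previously decoded pixels.
    """
    neighbours = []
    if i > 0:
        neighbours.append(image[i - 1, j, k])
    if j > 0:
        neighbours.append(image[i, j - 1, k])
    centre = float(np.mean(neighbours)) if neighbours else 128.0
    scores = -np.abs(np.arange(256) - centre) / 8.0
    return scores - np.log(np.sum(np.exp(scores)))  # normalise to log-probs

def purify(image, eps_defend, log_prob_fn=toy_conditional_log_probs):
    """Greedily move each pixel to its most likely value within eps_defend."""
    original = image.astype(np.int64)
    purified = original.copy()
    height, width, channels = purified.shape
    for i in range(height):
        for j in range(width):
            for k in range(channels):
                x = original[i, j, k]
                # Feasible range around the ORIGINAL pixel value.
                lo = max(x - eps_defend, 0)
                hi = min(x + eps_defend, 255)
                # Condition on the partially purified image decoded so far.
                log_probs = log_prob_fn(purified, i, j, k)
                purified[i, j, k] = lo + int(np.argmax(log_probs[lo:hi + 1]))
    return purified.astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
    cleaned = purify(noisy, eps_defend=16)
    print("largest pixel change:", np.max(np.abs(cleaned.astype(int) - noisy.astype(int))))
```

Because every pixel is clamped to a window around its original value, the purified image never differs from the input by more than `eps_defend` per pixel, and `eps_defend = 0` returns the input unchanged. With a real PixelCNN in place of the toy model, each pixel update needs the model's conditional given the pixels decoded so far, which makes the loop considerably more expensive than it looks here.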

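Experimental Results

Research Questions

RQ1: Do adversarial examples mainly lie in low-probability regions of the training distribution, as estimated by neural density models?
RQ2: Can a generative-model-based detector (via p-values) reliably identify adversarial inputs across multiple attack methods?
RQ3: Can purifying images toward high-density regions of the training distribution restore classifier accuracy under strong attacks, without knowledge of the attacker or the classifier?
RQ4: Does combining PixelDefend with other defenses significantly improve robustness across datasets and attack types?
RQ5: Can an end-to-end differentiable attack be mounted against the full PixelDefend pipeline, and if so, how effective is it?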

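Key Findings

Adversarial examples typically have PixelCNN likelihoods several orders of magnitude lower than those of clean images.
A p-value detector built on PixelCNN likelihoods distinguishes adversarial inputs with high probability across a wide range of attacks.
PixelDefend's purification moves perturbed images toward high-density regions and, paired with an existing classifier, substantially improves accuracy under strong attacks (for example, from 63% to 84% on the strongest attack for Fashion-MNIST and from 32% to 70% for CIFAR-10).
PixelDefend is model-agnostic and attack-agnostic, is compatible with adversarial training, and improves robustness without modifying the classifier.
End-to-end adversarial attacks against the full PixelDefend pipeline are difficult to construct; in practice, iterative gradient-based attacks failed to find effective perturbations.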
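The likelihood-based p-value detector can be sketched with a common rank-based form of the permutation-test p-value: rank the input's log-likelihood among the training log-likelihoods, so that inputs less likely than almost all training images receive a small p-value. The function names, the 0.05 threshold, and the thresholding rule in `adaptive_eps` are assumptions made for illustration, and the log-likelihood values in the usage part are synthetic rather than outputs of a trained PixelCNN.

```python
import numpy as np

def p_value(train_log_liks, test_log_lik):
    """Rank-based p-value of a test log-likelihood among training ones.

    p = (#{training images with log-likelihood <= test} + 1) / (N + 1)
    A small p means the input is less likely than nearly all training data.
    """
    train = np.asarray(train_log_liks, dtype=float)
    rank = int(np.sum(train <= test_log_lik))
    return (rank + 1) / (train.size + 1)

def adaptive_eps(p, eps_defend, threshold=0.05):
    """One simple adaptive rule (an assumption, not the paper's exact rule):
    leave inputs that look clean untouched, purify the rest with eps_defend."""
    return 0 if p > threshold else eps_defend

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for PixelCNN log-likelihoods of training images.
    train_ll = rng.normal(loc=-3000.0, scale=50.0, size=1000)
    for name, ll in [("typical input", -3000.0), ("suspicious input", -3400.0)]:
        p = p_value(train_ll, ll)
        print(name, "p =", round(p, 4), "eps =", adaptive_eps(p, eps_defend=16))
```

The `+ 1` terms keep the p-value strictly positive even when the input is less likely than every training image, so the smallest achievable value is 1 / (N + 1).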

This review was generated by AI and reviewed by a human editor.