QUICK REVIEW

[Paper Review] Constructing Unrestricted Adversarial Examples with Generative Models

Yang Song, Rui Shu|arXiv (Cornell University)|May 21, 2018

Adversarial Robustness in Machine Learning|References: 49|Cited by: 125

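One-Sentence Summary

This paper proposes unrestricted adversarial examples that are synthesized from scratch with a conditional generative model (AC-GAN), and shows that they can bypass certified defenses and adversarial training while still looking legitimate to humans.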

ABSTRACT

Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, we propose unrestricted adversarial examples, a new threat model where the attackers are not restricted to small norm-bounded perturbations. Different from perturbation-based attacks, we propose to synthesize unrestricted adversarial examples entirely from scratch using conditional generative models. Specifically, we first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over data samples. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model and are misclassified by a target classifier. We demonstrate through human evaluation that unrestricted adversarial examples generated this way are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.

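Motivation and Objectives

Propose a threat model in which adversarial inputs are no longer limited to small perturbations of existing data.
Propose a practical method for generating unrestricted adversarial examples from scratch using class-conditional generative models.
Evaluate the attack's effectiveness against certified defenses and adversarially trained models on multiple datasets.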

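Proposed Method

Train an Auxiliary Classifier GAN (AC-GAN) to model the class-conditional distribution of images.
Search the AC-GAN latent space for images that are likely under the generative model and are misclassified by the target classifier.
Optimize the loss L = L0 + λ1 L1 + λ2 L2 to generate high-fidelity unrestricted adversarial examples, where L0 targets the classifier under attack, L1 regularizes the latent variable, and L2 uses the auxiliary classifier to keep the image aligned with the source class.
Optionally augment the generated images with small trainable noise to increase diversity (noise-augmented attack).
Use Amazon Mechanical Turk to verify that the synthesized images belong to the intended class and look legitimate to humans.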
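
The three-term loss above can be illustrated with a minimal sketch. This summary only describes the role of each term, so the concrete forms used here are assumptions: L0 and L2 are written as negative log-likelihoods, and L1 as a hinge penalty that is zero while the latent code stays within eps of a starting point z0. The function names and the default values of eps, lam1, and lam2 are hypothetical. The classifier outputs are passed in as plain probability lists, so no generator or network is modeled.

```python
import math


def hinge_latent_penalty(z, z0, eps):
    """Mean of max(|z_i - z0_i| - eps, 0) over the latent dimensions.

    Assumed form of L1: no penalty while z stays within eps of z0.
    """
    return sum(max(abs(a - b) - eps, 0.0) for a, b in zip(z, z0)) / len(z)


def unrestricted_loss(target_probs, aux_probs, z, z0,
                      y_target, y_source, eps=0.1, lam1=1.0, lam2=1.0):
    """Return (L, (L0, L1, L2)) for one generated image.

    target_probs: class probabilities from the classifier under attack.
    aux_probs:    class probabilities from the AC-GAN auxiliary classifier.
    eps, lam1, lam2 are illustrative defaults, not values from the paper.
    """
    # L0 (assumed form): push the attacked classifier toward the label y_target.
    l0 = -math.log(target_probs[y_target])
    # L1 (assumed form): keep the latent code close to its starting point z0.
    l1 = hinge_latent_penalty(z, z0, eps)
    # L2 (assumed form): keep the auxiliary classifier confident in y_source.
    l2 = -math.log(aux_probs[y_source])
    return l0 + lam1 * l1 + lam2 * l2, (l0, l1, l2)


# Toy call: two classes, a two-dimensional latent code.
total, terms = unrestricted_loss(
    target_probs=[0.5, 0.5], aux_probs=[1.0, 0.0],
    z=[0.3, -0.05], z0=[0.0, 0.0], y_target=1, y_source=0)
print(total, terms)
```

In a real attack the probabilities would come from differentiable models, and z would be updated by gradient descent on this loss; that optimization loop is left out of the sketch.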

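Experimental Results

Research Questions

RQ1: Can unrestricted adversarial examples synthesized from scratch mislead classifiers protected by strong defenses?
RQ2: How effective are AC-GAN-based unrestricted adversarial examples against certified defenses and adversarial training on MNIST, SVHN, and CelebA?
RQ3: Do unrestricted adversarial examples transfer to other architectures in the black-box setting?
RQ4: Does adding trainable noise to the generated images (the noise-augmented attack) improve attack effectiveness or realism?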

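Key Findings

Unrestricted adversarial examples achieve high success rates (over 84%) against the target classifiers on MNIST, SVHN, and CelebA.
These attacks bypass certified defenses designed for perturbation-based attacks and also threaten adversarially trained models.
Human evaluation on MTurk confirms that many unrestricted adversarial examples are legitimate images of the intended (source) class.
The noise-augmented variant improves transferability on some models, but its effect varies on others.
Unrestricted adversarial examples show moderate transferability to other architectures, which indicates a potential black-box risk.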
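
As a rough illustration of how a success rate or a transfer rate of this kind could be computed, the sketch below counts the fraction of generated examples that a classifier labels differently from the human-verified source class. This untargeted definition is an assumption made for illustration, and the function name is hypothetical; the paper's exact evaluation protocol is not given in this summary.

```python
def attack_rate(predictions, source_labels):
    """Fraction of examples whose predicted label differs from the source class.

    predictions:   labels predicted by the classifier being evaluated.
    source_labels: human-verified classes of the generated images.
    Assumed (untargeted) definition; hypothetical helper for illustration.
    """
    if not predictions or len(predictions) != len(source_labels):
        raise ValueError("need two non-empty label lists of equal length")
    fooled = sum(p != y for p, y in zip(predictions, source_labels))
    return fooled / len(predictions)


# Toy labels only, not results from the paper.
print(attack_rate([1, 0, 2, 2], [0, 0, 2, 1]))  # 2 of 4 are fooled: 0.5
```

Applying the same function to the predictions of a second, unseen classifier on the same generated images would give a black-box transfer rate under this definition.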

This review was generated by AI and reviewed by human editors.