QUICK REVIEW

[Paper Review] Generating Adversarial Examples with Adversarial Networks

Chaowei Xiao, Bo Li|arXiv (Cornell University)|Jan 8, 2018

Adversarial Robustness in Machine Learning | References: 34 | Cited by: 238

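One-Sentence Summary

AdvGAN trains a GAN-based generator to produce perceptually realistic adversarial perturbations, enabling fast attacks with high success rates in both semi-whitebox and black-box settings, even against defended models.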

ABSTRACT

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.

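Motivation and Goals

Explain the need for adversarial examples that are both high in perceptual quality and efficient to generate.
Propose AdvGAN to learn perturbations that look realistic while fooling the target model.
Demonstrate the effectiveness of AdvGAN in semi-whitebox and black-box settings.
Demonstrate the robustness of AdvGAN against state-of-the-art defenses and its performance in a public attack challenge.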

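Proposed Method

Introduce a generator G and a discriminator D, forming a GAN conditioned on the input x.
Use an adversarial loss L_adv^f to steer the perturbed input toward a target class or away from the true class.
Add a GAN loss L_GAN so that the perturbed instances remain visually similar to the original data.
Add a hinge loss L_hinge to bound the perturbation magnitude and stabilize GAN training.
Combine the losses as L = L_adv^f + α L_GAN + β L_hinge and solve the minimax game min_G max_D L.
For black-box attacks, use static and dynamic distillation to approximate the target model and update G accordingly.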
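
The combined objective above can be sketched for a single example as follows. This is a minimal illustration in plain Python, not the authors' implementation. The concrete forms of the three terms (cross-entropy toward a target class for L_adv^f, log D(x) + log(1 - D(x + G(x))) for L_GAN, and max(0, ||G(x)||_2 - c) for L_hinge) are assumptions, since the summary only names the terms. All function and parameter names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def adv_loss_targeted(probs, target):
    """Assumed L_adv^f: cross-entropy pushing the target model f toward class `target`."""
    return -math.log(probs[target] + EPS)


def gan_loss(d_real, d_fake):
    """Assumed L_GAN for one sample: log D(x) + log(1 - D(x + G(x)))."""
    return math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS)


def hinge_loss(perturbation, c):
    """Assumed L_hinge: max(0, ||G(x)||_2 - c); zero while the L2 norm stays within c."""
    norm = math.sqrt(sum(v * v for v in perturbation))
    return max(0.0, norm - c)


def advgan_loss(probs, target, d_real, d_fake, perturbation, alpha, beta, c):
    """L = L_adv^f + alpha * L_GAN + beta * L_hinge for a single example."""
    return (adv_loss_targeted(probs, target)
            + alpha * gan_loss(d_real, d_fake)
            + beta * hinge_loss(perturbation, c))


# Toy values: f assigns probability 0.5 to the target class, D is undecided,
# and the perturbation has L2 norm 5 against a bound c = 2.
loss = advgan_loss(probs=[0.5, 0.5], target=1, d_real=0.5, d_fake=0.5,
                   perturbation=[3.0, 4.0], alpha=1.0, beta=1.0, c=2.0)
print(round(loss, 4))
```

In training, G would take gradient steps to decrease this quantity while D takes steps to increase the L_GAN term, typically over mini-batches in an automatic-differentiation framework rather than on scalar values as here.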

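Experimental Results

Research Questions

RQ1: Can AdvGAN generate perceptually realistic adversarial examples that effectively fool models in both white-box and black-box settings?
RQ2: How does AdvGAN perform against state-of-the-art defenses compared with other attacks?
RQ3: Can black-box attacks be effective without relying on transferability?
RQ4: How do dynamic and static distillation affect black-box attack performance?
RQ5: Do high-resolution adversarial examples remain perceptually realistic while achieving a high attack success rate?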

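Key Findings

In the semi-whitebox setting, AdvGAN achieves high attack success rates on MNIST and CIFAR-10 (MNIST: model A 97.9%, B 97.1%, C 98.3%; CIFAR-10: ResNet 94.7%, Wide ResNet 99.3%).
Black-box attacks with dynamic distillation reach high success rates (MNIST b-D 93.4%; CIFAR-10 b-D 78.5% for ResNet and 81.8% for Wide ResNet).
AdvGAN remains strong under defenses; against defended models in the semi-whitebox setting, its attack success rate exceeds that of FGSM and some Opt. methods (e.g., MNIST model A 8.0%; model A under one defense: AdvGAN 11.5%; CIFAR-10 ResNet: 16.03% for AdvGAN versus 11.9% for FGSM).
On the MNIST challenge against the MadryLab model, AdvGAN reaches 88.93% accuracy in the white-box setting and 92.76% in the black-box setting (the top result in the challenge).
High-resolution adversarial examples against Inception_v3 at 299×299 achieve a 100% attack success rate with an L_infinity bound of 0.01; a human perceptual study shows AdvGAN examples are nearly as realistic as benign images (AMT: 49.4% of responses chose the AdvGAN image as more realistic).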
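
Two quantities recur in the findings above: an L_infinity bound on the perturbation and an attack success rate. The sketch below shows one common way to compute each, as an illustration only. It assumes the untargeted definition of success (the prediction on the adversarial example differs from the true label); the paper may count successes differently, for example for targeted attacks. The helper names are hypothetical.

```python
def clip_linf(perturbation, bound):
    """Project a perturbation onto the L_infinity ball of radius `bound`
    by clamping every component to [-bound, bound]."""
    return [max(-bound, min(bound, d)) for d in perturbation]


def attack_success_rate(adv_predictions, true_labels):
    """Fraction of adversarial examples whose predicted label differs from
    the true label (assumed untargeted definition)."""
    if len(adv_predictions) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    fooled = sum(1 for p, y in zip(adv_predictions, true_labels) if p != y)
    return fooled / len(true_labels)


# Toy data: three perturbation components and four classified examples.
clipped = clip_linf([0.05, -0.2, 0.003], bound=0.01)
rate = attack_success_rate([1, 2, 0, 3], [1, 0, 0, 0])
print(clipped, rate)
```

With a bound of 0.01 on inputs scaled to [0, 1], no pixel changes by more than 1% of the full intensity range, which gives some intuition for why such perturbations can be hard to notice.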

This review was generated by AI and reviewed by human editors.