QUICK REVIEW

[Paper Explainer] Poisoning Attacks with Generative Adversarial Nets

Luis Muñoz-González, Bjarne Pfitzner|arXiv (Cornell University)|Jun 18, 2019

Adversarial Robustness in Machine Learning | References: 28 | Cited by: 39

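One-Sentence Summary

This paper proposes pGAN, a framework based on generative adversarial networks with three components (a generator, a discriminator, and the target classifier) that crafts poisoning samples which degrade classifier performance while remaining hard to detect.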

ABSTRACT

Machine learning algorithms are vulnerable to poisoning attacks: An adversary can inject malicious points in the training dataset to influence the learning process and degrade the algorithm's performance. Optimal poisoning attacks have already been proposed to evaluate worst-case scenarios, modelling attacks as a bi-level optimization problem. Solving these problems is computationally demanding and has limited applicability for some models such as deep networks. In this paper we introduce a novel generative model to craft systematic poisoning attacks against machine learning classifiers generating adversarial training examples, i.e. samples that look like genuine data points but that degrade the classifier's accuracy when used for training. We propose a Generative Adversarial Net with three components: generator, discriminator, and the target classifier. This approach allows us to model naturally the detectability constraints that can be expected in realistic attacks and to identify the regions of the underlying data distribution that can be more vulnerable to data poisoning. Our experimental evaluation shows the effectiveness of our attack to compromise machine learning classifiers, including deep networks.

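Motivation and Objectives

Motivate data poisoning as a security threat to machine learning, and assess the constraints that real-world attackers face.
Propose a scalable poisoning strategy, based on generative adversarial networks, that works against deep networks.
Model realistic attacker constraints through detectability control, and study the trade-off between attack effectiveness and stealth.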

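Proposed Method

Introduce pGAN, which has three components: a generator, a discriminator, and the target classifier.
Formulate a minimax game in which the generator maximizes a convex combination of two goals: damaging the classifier and evading the discriminator.
Use the parameter alpha to trade off detectability against effectiveness, and the poisoning fraction lambda to control the share of injected points.
Train with coordinated gradient updates in a conditional-GAN-like setup, conditioning on the labels of the poisoned classes.
Allow surrogate models in black-box scenarios, combined with standard GAN stabilization techniques (dropout, batch normalization, label smoothing).
Provide practical guidance on training dynamics, including the role of lambda and the saddle-point solution of the minimax objective.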
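
The minimax game described above can be sketched numerically. The snippet below is a minimal illustration rather than the authors' implementation: it assumes the usual log-loss GAN value V(D, G) for the discriminator term and a per-sample classifier loss (for example cross-entropy) for the classifier term W(C, G), and it assumes a sign convention in which the generator minimizes alpha * V + (1 - alpha) * W while the discriminator and classifier maximize it. Minimizing that value is the same as maximizing the convex combination of classifier damage and discriminator evasion. All function names and arguments are hypothetical.

```python
import numpy as np

EPS = 1e-12  # numerical guard so that log(0) never occurs


def gan_value(d_real, d_fake):
    """Assumed log-loss GAN value V(D, G).

    d_real / d_fake are discriminator outputs in (0, 1) for genuine and
    generated points. V is high when D separates the two well.
    """
    d_real = np.clip(d_real, EPS, 1.0 - EPS)
    d_fake = np.clip(d_fake, EPS, 1.0 - EPS)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))


def classifier_value(loss_poison, loss_clean, lam):
    """Assumed classifier term W(C, G): the negative training loss on a
    mixture with a share lam of poisoning points and 1 - lam of clean points."""
    return -(lam * np.mean(loss_poison) + (1.0 - lam) * np.mean(loss_clean))


def pgan_value(d_real, d_fake, loss_poison, loss_clean, alpha, lam):
    """Convex combination alpha * V + (1 - alpha) * W.

    Under the sign convention assumed here, the generator minimizes this
    value; the discriminator and the classifier maximize it.
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= lam <= 1.0):
        raise ValueError("alpha and lam must lie in [0, 1]")
    v = gan_value(d_real, d_fake)
    w = classifier_value(loss_poison, loss_clean, lam)
    return alpha * v + (1.0 - alpha) * w


# Made-up batch statistics, for illustration only.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.3])
loss_poison = np.array([2.0, 3.0])
loss_clean = np.array([0.1, 0.3])
value = pgan_value(d_real, d_fake, loss_poison, loss_clean, alpha=0.5, lam=0.2)
```

With alpha = 1 only the discriminator term remains, so the generator behaves like an ordinary GAN generator. With alpha = 0 only the classifier term remains, so the generator ignores detectability altogether.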

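Experimental Results

Research Questions

RQ1: Can a GAN-based framework generate poisoning samples that harm classifier performance yet stay close to genuine data?
RQ2: How does the detectability constraint (set through alpha) affect the effectiveness and stealth of the poisoning?
RQ3: How does the poisoning fraction lambda affect attack success across datasets and models?
RQ4: Can pGAN produce targeted, error-specific attacks without unduly increasing detectability?
RQ5: Under detectability constraints, how does pGAN compare with traditional poisoning methods?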

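Key Findings

When poisoning points are injected into MNIST and Fashion-MNIST, pGAN reduces classifier accuracy, and the effect is stronger at lower alpha values.
At higher alpha values the attack is harder to detect, which reflects the trade-off between stealth and impact.
Increasing the fraction of poisoning points generally strengthens the attack, but larger datasets reduce the relative impact of the poisoning.
pGAN can carry out targeted, error-specific attacks (for example, misclassifying the digit 3 as 5), even with a small poisoning fraction.
Compared with a label-flipping strategy under detectability constraints, pGAN achieves higher attack effectiveness and a different error distribution (more targeted, with fewer false positives).
Attack effectiveness decreases as the training set grows, but targeted attacks remain feasible against larger models.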
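
Two quantities in the findings above lend themselves to a small worked example: the poisoning fraction and the label-flipping baseline. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's experiments. It assumes lambda denotes the share of poisoning points in the final training set, and it implements plain label flipping without the detectability constraint mentioned above. The helper names are hypothetical.

```python
import numpy as np


def n_poison_for_fraction(n_clean, lam):
    """Number of injected points needed so that they make up a share lam
    of the final (clean + poison) training set. Assumed reading of lambda."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    return int(round(lam * n_clean / (1.0 - lam)))


def poison_share(n_clean, n_poison):
    """Share of the final training set taken up by n_poison injected points."""
    return n_poison / (n_clean + n_poison)


def label_flip_poison(x, y, source, target, lam, rng):
    """Plain label-flipping baseline (no detectability constraint):
    copy random points of class `source`, relabel them as `target`,
    and append them to the training set."""
    candidates = np.flatnonzero(y == source)
    n_p = min(n_poison_for_fraction(len(y), lam), len(candidates))
    picked = rng.choice(candidates, size=n_p, replace=False)
    x_p = x[picked]
    y_p = np.full(n_p, target, dtype=y.dtype)
    return np.concatenate([x, x_p]), np.concatenate([y, y_p])


# Toy data: 8 points labelled 3 and 8 points labelled 5.
rng = np.random.default_rng(0)
x = np.arange(32).reshape(16, 2)
y = np.array([3] * 8 + [5] * 8)
x_poisoned, y_poisoned = label_flip_poison(x, y, source=3, target=5, lam=0.2, rng=rng)
```

The arithmetic also shows how a fixed attack budget is diluted: 200 injected points are a 20% share of a set with 800 clean points, but only about 2.4% of a set with 8000 clean points.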

This explainer was generated by AI and reviewed by human editors.