QUICK REVIEW

[Paper Review] MetaPoison: Practical General-purpose Clean-label Data Poisoning

Wei Huang, Jonas Geiping | arXiv (Cornell University) | Apr 1, 2020

Adversarial Robustness in Machine Learning | References: 37 | Cited by: 81

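One-sentence summary

MetaPoison is a first-order, meta-learning-based poisoning method that crafts clean-label poisons to mislead deep networks, whether trained from scratch or fine-tuned, and whose poisons transfer across models and even to black-box APIs. It reaches high attack success rates at very small poison budgets and makes new poisoning schemes possible.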

ABSTRACT

Data poisoning -- the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data -- is an emerging threat in the context of neural networks. Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose, it works not only in fine-tuning scenarios, but also for end-to-end training from scratch, which till now hasn't been feasible for clean-label attacks with deep nets. MetaPoison can achieve arbitrary adversary goals -- like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real-world. We demonstrate for the first time successful data poisoning of models trained on the black-box Google Cloud AutoML API. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison

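Motivation and objectives

Motivate and enable practical clean-label data poisoning of deep neural networks.
Develop a scalable approximation to bilevel poisoning using meta-learning and an ensemble of surrogate models.
Demonstrate effectiveness in both from-scratch training and fine-tuning scenarios.
Show transferability across architectures and robustness to training settings.
Explore real-world applicability, including black-box ML APIs and new poisoning schemes.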

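Proposed method

Frame poisoning as a constrained bilevel optimization problem: the poisons are optimized so that, after training on the poisoned data, the model assigns the target the adversary's chosen label.
Use the ReColorAdv perceptual perturbation so that the poisons stay visually inconspicuous within L∞ bounds.
Approximate the inner training objective by unrolling a few SGD steps (K=2) to estimate the outer gradient.
Craft the poisons on an ensemble of partially trained surrogate models staggered across training epochs, to improve generalization across initializations.
Update the poisons with gradient information from the ensemble of models at different epochs, and re-initialize models to avoid overfitting to a single model state.
Keep the compute budget practical (e.g., 5760 forward/backward passes per poison in the reported setting) and project onto the ε and εc bounds during optimization.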
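
The unroll-then-project loop described above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: a toy 2-D logistic-regression surrogate stands in for a deep network, a single surrogate replaces the staggered ensemble, central finite differences replace backpropagation through the unrolled steps, a plain signed step replaces whatever optimizer the authors use, and ReColorAdv is omitted. All function and variable names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_loss_grad(w, x, y):
    """Logistic loss and its gradient w.r.t. w for one example, y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss, [(p - y) * xi for xi in x]

def adv_loss_after_unroll(delta, w0, clean, poison, target_x, y_adv, K=2, lr=0.5):
    """Inner problem: K full-batch gradient steps on clean + poisoned data
    (a stand-in for minibatch SGD). Outer objective: loss of the target
    under the adversarial label y_adv."""
    px, py = poison
    # Only the poison's features are perturbed; its label stays clean.
    data = clean + [([a + d for a, d in zip(px, delta)], py)]
    w = list(w0)
    for _ in range(K):
        grad = [0.0] * len(w)
        for x, y in data:
            _, g = example_loss_grad(w, x, y)
            grad = [a + b / len(data) for a, b in zip(grad, g)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return example_loss_grad(w, target_x, y_adv)[0]

def craft_poison(w0, clean, poison, target_x, y_adv,
                 eps=0.1, steps=20, step=0.02, h=1e-5):
    """Signed descent on the unrolled adversarial loss, projected onto the
    L-infinity ball of radius eps after every update."""
    delta = [0.0] * len(poison[0])
    for _ in range(steps):
        grad = []
        for i in range(len(delta)):
            up, down = list(delta), list(delta)
            up[i] += h
            down[i] -= h
            f_up = adv_loss_after_unroll(up, w0, clean, poison, target_x, y_adv)
            f_down = adv_loss_after_unroll(down, w0, clean, poison, target_x, y_adv)
            grad.append((f_up - f_down) / (2 * h))  # finite-difference outer gradient
        delta = [max(-eps, min(eps, d - step * math.copysign(1.0, g)))
                 for d, g in zip(delta, grad)]
    return delta

# Toy data (made up for illustration): four clean points, one poison, one target.
clean = [([1.0, 0.5], 1), ([-1.0, -0.5], 0), ([0.8, 1.0], 1), ([-0.9, -1.0], 0)]
poison = ([1.0, 1.0], 1)
target_x, y_adv = [-0.3, 0.2], 1
w0 = [0.0, 0.0]

delta = craft_poison(w0, clean, poison, target_x, y_adv)
before = adv_loss_after_unroll([0.0, 0.0], w0, clean, poison, target_x, y_adv)
after = adv_loss_after_unroll(delta, w0, clean, poison, target_x, y_adv)
```

In this toy setting the adversarial loss drops only modestly, since a single poison with a small ε shifts the surrogate very little. The point of the sketch is the structure: perturb the poison, unroll K=2 training steps, differentiate the target's adversarial loss with respect to the perturbation, then project back into the allowed bound.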

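Experimental results

Research questions

RQ1 Can MetaPoison craft effective clean-label poisons for models trained from scratch, not only for fine-tuned networks?
RQ2 Do poisons crafted with MetaPoison transfer across different victim architectures, initializations, and training settings?
RQ3 Are the poisons effective against real-world black-box systems (such as Google Cloud AutoML) and under alternative poisoning schemes (self-concealment, multiclass poisoning)?
RQ4 What is the trade-off between poison budget and attack success across architectures and datasets?
RQ5 Do the crafted poisons remain effective under data augmentation and varied hyperparameters?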

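Key findings

MetaPoison achieves high attack success at very small poison budgets, e.g., 40–90% success at a 1% poison budget against networks trained end to end.
For the dog-bird target on ResNet20, attack success reaches 72% at a 1% poison budget.
Even at a poison budget as low as 0.01%, attacks on end-to-end training achieve nonzero success across multiple architectures.
In further evaluation, the previously untested self-concealment and multiclass poisoning schemes allow more flexible poisoning goals.
Poisoning of CIFAR-10 models trained on Google Cloud AutoML Vision succeeds at a 0.5% poison budget, with an appreciable success rate (>15%).
Poisons crafted on one architecture transfer to others (e.g., ConvNetBN, VGG13, ResNet20), with substantial but asymmetric effectiveness.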
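
To make the budgets above concrete, the short calculation below converts each percentage into a number of poisoned images. It assumes that the budget is a fraction of the training set and that the dataset is the standard CIFAR-10 training split of 50,000 images; the summary does not state this for every result, and `poison_count` is a hypothetical helper.

```python
CIFAR10_TRAIN_SIZE = 50_000  # standard CIFAR-10 training split (assumed dataset)

def poison_count(budget_percent, train_size=CIFAR10_TRAIN_SIZE):
    """Number of poisoned images implied by a budget given in percent."""
    return round(train_size * budget_percent / 100)

for budget in (1, 0.5, 0.01):
    print(f"{budget}% budget -> {poison_count(budget)} poisoned images")
```

Under these assumptions, a 1% budget corresponds to 500 images, 0.5% to 250, and 0.01% to just 5.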


This review was generated by AI and reviewed by a human editor.