QUICK REVIEW

[Paper Review] Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

Jonas Geiping, Liam Fowl|arXiv (Cornell University)|Sep 4, 2020

Adversarial Robustness in Machine Learning|References 53|Cited by 36

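One-sentence summary

This paper introduces a scalable clean-label, targeted data poisoning attack against deep networks trained from scratch. It uses gradient alignment (gradient matching) to craft poisoned data that steers training so that a chosen target image is misclassified.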

ABSTRACT

Data Poisoning attacks modify training data to maliciously control a model trained on such data. In this work, we focus on targeted poisoning attacks which cause a reclassification of an unmodified test image and as such breach model integrity. We consider a particularly malicious poisoning attack that is both "from scratch" and "clean label", meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. Previous poisoning attacks against deep neural networks in this setting have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets. The central mechanism of the new attack is matching the gradient direction of malicious examples. We analyze why this works, supplement with practical considerations, and show its threat to real-world practitioners, finding that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset. Finally, we demonstrate the limitations of existing defensive strategies against such an attack, concluding that data poisoning is a credible threat, even for large-scale deep learning systems.

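Research motivation and objectives

Motivate and formalize targeted data poisoning, in which a small number of training images are perturbed within a bound so that a specific target image is misclassified.
Develop a scalable attack that applies to deep networks trained from scratch on large datasets such as ImageNet.
Propose an efficient optimization objective that aligns the gradient of the poisoned data with the adversary's target gradient.
Evaluate the attack's practicality and transferability across architectures and training setups.
Evaluate defenses and discuss the limitations of current mitigation strategies.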
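
The formalization mentioned in the first objective can be written as a bilevel problem. The version below is a sketch in assumed notation, not a quotation from the paper: F is the network, (x^t, y^adv) is the target image with the adversary's intended label, Δ_i is the perturbation on training image i, N is the training set size, P is the number of poisoned images, and ε is the perturbation bound.

```latex
\min_{\Delta \in \mathcal{C}} \;
  \mathcal{L}\bigl(F(x^{t}, \theta(\Delta)),\, y^{\mathrm{adv}}\bigr)
\quad \text{s.t.} \quad
\theta(\Delta) \in \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N}
  \mathcal{L}\bigl(F(x_i + \Delta_i, \theta),\, y_i\bigr),
\qquad
\mathcal{C} = \bigl\{ \Delta : \|\Delta_i\|_\infty \le \varepsilon,\;
  \Delta_i = 0 \ \text{for all } i > P \bigr\}
```

The inner problem is ordinary training on the perturbed data with unchanged labels y_i, which is what makes the attack "clean label". The outer problem asks that the resulting model assign the target image the adversarial label.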

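Proposed method

Formulate poison crafting as gradient alignment: minimize the negative cosine similarity between the gradient of the adversarial loss and the sum of the gradients on the poisoned data.
Optimize the perturbations within an ℓ∞ bound to preserve clean-label semantics and keep the changes imperceptible.
Use differentiable data augmentation and random restarts to improve transferability across initializations and architectures.
Demonstrate efficiency by requiring only one pretrained model and an optimization equivalent to one epoch, avoiding full bilevel backpropagation.
Craft the poisons with respect to a single parameter vector θ, which is not updated during crafting.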
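
The alignment objective and the ℓ∞ constraint listed above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names `alignment_loss` and `project_linf` are hypothetical, and gradients are passed in as plain lists of floats. In the actual attack the gradients would come from the network at the fixed parameters θ, and the loss would be differentiated with respect to the perturbations, which this sketch does not do.

```python
import math


def alignment_loss(target_grad, poison_grads):
    """Negative cosine similarity between the target (adversarial) gradient
    and the sum of the per-poison gradients. Lower means better aligned."""
    # Sum the poison gradients component-wise.
    summed = [sum(components) for components in zip(*poison_grads)]
    dot = sum(t * p for t, p in zip(target_grad, summed))
    norm_t = math.sqrt(sum(t * t for t in target_grad))
    norm_p = math.sqrt(sum(p * p for p in summed))
    if norm_t == 0.0 or norm_p == 0.0:
        # Direction is undefined for a zero vector; treat as orthogonal.
        return 0.0
    return -dot / (norm_t * norm_p)


def project_linf(delta, eps):
    """Clip every component of a perturbation to the interval [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


# Toy usage with made-up numbers (assumption: 2-parameter model, 2 poisons).
target_grad = [1.0, 0.0]
poison_grads = [[0.5, 0.5], [0.5, -0.5]]  # these sum to [1.0, 0.0]
print(alignment_loss(target_grad, poison_grads))  # -1.0 (perfectly aligned)
print(project_linf([10.0, -3.0, -12.5], eps=8.0))  # [8.0, -3.0, -8.0]
```

A crafting loop would alternate between a step that lowers `alignment_loss` and a call to `project_linf`, so the perturbations never leave the allowed bound.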

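Experimental results

Research questions

RQ1: Can gradient alignment achieve effective clean-label targeted data poisoning against modern deep networks trained from scratch?
RQ2: How well does the proposed gradient-matching poisoning scale to large datasets such as ImageNet and to different architectures?
RQ3: What role do data augmentation, restarts, and model ensembling play in the transferability and robustness of the attack?
RQ4: Are existing defenses (sanitization, differential privacy) effective against gradient-matching poisoning, and what are their trade-offs?

Key findings

With a bounded perturbation (ε=8), the attack achieves targeted misclassification on ImageNet using only 0.1% poisoned data.
Gradient-alignment poisoning significantly outperforms prior methods (e.g., MetaPoison) in both efficiency and success rate, with strong results in CIFAR-10 and large-scale ImageNet experiments.
Differentiable data augmentation can replace large model ensembles, achieving comparable poisoning effectiveness at lower computational cost.
The poisons transfer to other architectures (e.g., MobileNet-V2, ResNet-50) and may also be effective in a black-box setting under a realistic threat model (Cloud AutoML).
Defenses such as sanitization are largely ineffective against the attack, while differential privacy reduces poisoning success at the cost of validation accuracy.
A theoretical analysis within an adversarial descent framework explains why gradient alignment steers training toward minimizing the adversarial loss.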
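
The intuition behind the last finding can be seen from a first-order Taylor expansion. The lines below are a generic sketch, not the paper's formal statement; the step size τ and the names L_adv (adversarial loss on the target) and L_poison (training loss on the poisoned data) are assumed notation.

```latex
\theta' = \theta - \tau \, \nabla_\theta \mathcal{L}_{\mathrm{poison}}(\theta)

\mathcal{L}_{\mathrm{adv}}(\theta')
  \approx \mathcal{L}_{\mathrm{adv}}(\theta)
  - \tau \, \bigl\langle \nabla_\theta \mathcal{L}_{\mathrm{adv}}(\theta),\,
      \nabla_\theta \mathcal{L}_{\mathrm{poison}}(\theta) \bigr\rangle
```

When the two gradients have positive cosine similarity, the inner product is positive, so for a sufficiently small τ a training step on the poisoned data also lowers the adversarial loss to first order.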


This review was generated by AI and reviewed by a human editor.