QUICK REVIEW

[Paper Review] On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

Sanghyun Hong, Varun Chandrasekaran | arXiv (Cornell University) | Feb 26, 2020

Adversarial Robustness in Machine Learning | References: 45 | Cited by: 70

One-Sentence Summary

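This paper proposes gradient shaping, which bounds gradient magnitudes and aligns gradient orientations, as an attack-agnostic defense against data poisoning, and evaluates DP-SGD as a practical gradient-shaping tool across several models and tasks.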

ABSTRACT

Machine learning algorithms are vulnerable to data poisoning attacks. Prior taxonomies that focus on specific scenarios, e.g., indiscriminate or targeted, have enabled defenses for the corresponding subset of known attacks. Yet, this introduces an inevitable arms race between adversaries and defenders. In this work, we study the feasibility of an attack-agnostic defense relying on artifacts that are common to all poisoning attacks. Specifically, we focus on a common element between all attacks: they modify gradients computed to train the model. We identify two main artifacts of gradients computed in the presence of poison: (1) their $\ell_2$ norms have significantly higher magnitudes than those of clean gradients, and (2) their orientation differs from clean gradients. Based on these observations, we propose the prerequisite for a generic poisoning defense: it must bound gradient magnitudes and minimize differences in orientation. We call this gradient shaping. As an exemplar tool to evaluate the feasibility of gradient shaping, we use differentially private stochastic gradient descent (DP-SGD), which clips and perturbs individual gradients during training to obtain privacy guarantees. We find that DP-SGD, even in configurations that do not result in meaningful privacy guarantees, increases the model's robustness to indiscriminate attacks. It also mitigates worst-case targeted attacks and increases the adversary's cost in multi-poison scenarios. The only attack we find DP-SGD to be ineffective against is a strong, yet unrealistic, indiscriminate attack. Our results suggest that, while we currently lack a generic poisoning defense, gradient shaping is a promising direction for future research.

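Research Motivation and Goals

Challenge the reliance on defenses tailored to specific attacks by seeking an attack-agnostic defense against data poisoning.
Identify gradient-level artifacts that poisoned data share across indiscriminate and targeted attacks.
Propose gradient shaping as a defense principle: bound gradient magnitudes and align gradient orientations to mitigate the effect of poisoning.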
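The magnitude-bounding half of this principle can be made concrete with the per-example clipping rule of standard DP-SGD, the tool the paper uses. The notation below is an assumption for illustration (this summary does not give the paper's hyperparameters): $g_i$ is the gradient of example $i$, $C$ the clipping norm, $B$ the batch size, $\sigma$ the noise multiplier, and $g_p$ the gradient of a poison example.

```latex
\begin{aligned}
\bar{g}_i &= \frac{g_i}{\max\left(1,\ \lVert g_i \rVert_2 / C\right)}
  &&\Rightarrow\quad \lVert \bar{g}_i \rVert_2 \le C, \\
\tilde{g} &= \frac{1}{B}\left(\sum_{i=1}^{B} \bar{g}_i + \mathcal{N}\left(0,\ \sigma^2 C^2 \mathbf{I}\right)\right)
  &&\Rightarrow\quad \Big\lVert \tfrac{1}{B}\,\bar{g}_p \Big\rVert_2 \le \frac{C}{B}.
\end{aligned}
```

However large $\lVert g_p \rVert_2$ is before clipping, a single poison's contribution to the averaged update $\tilde{g}$ is at most $C/B$ in norm. Clipping addresses only the magnitude artifact; these formulas say nothing about orientation.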

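Proposed Method

Analyze gradients during training, comparing the magnitudes and orientations of poisoned and clean samples under different poisoning scenarios.
Craft poisons via feature collision and feature insertion to study their effect on gradients.
Measure gradient-level differences with the magnitude ratio and the cosine similarity between poison and clean gradients.
Use differentially private stochastic gradient descent (DP-SGD) as a practical tool for implementing gradient shaping.
Evaluate the effectiveness of DP-SGD against indiscriminate and targeted poisoning attacks across multiple models and tasks.
Discuss the limitations and potential of gradient shaping as a generic defense.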
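The two gradient-level statistics in the steps above (magnitude ratio and cosine similarity) can be sketched as follows. This is a minimal sketch under assumptions not taken from the paper: a NumPy logistic-regression model stands in for the paper's models, "poisons" are simulated by label flipping rather than by feature collision or insertion, the ratio compares mean per-example l2 norms, and orientation is compared between the mean gradients. The paper's exact aggregation may differ, and the function names are hypothetical.

```python
import numpy as np


def per_example_grads(w, X, y):
    # Logistic regression with binary cross-entropy: for example i the
    # gradient w.r.t. w is (sigmoid(w . x_i) - y_i) * x_i.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p - y)[:, None] * X  # shape (n_examples, n_features)


def magnitude_ratio(g_poison, g_clean):
    # Mean l2 norm of poison gradients over mean l2 norm of clean gradients;
    # values > 1 mean poisons produce larger gradients on average.
    return float(np.linalg.norm(g_poison, axis=1).mean()
                 / np.linalg.norm(g_clean, axis=1).mean())


def cosine_similarity(g_poison, g_clean):
    # Cosine of the angle between the average poison and average clean
    # gradient: 1 = same direction, -1 = opposite direction.
    a, b = g_poison.mean(axis=0), g_clean.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage: the "poisons" are the same points with flipped labels
# (a stand-in, not the paper's feature-collision/insertion poisons).
rng = np.random.default_rng(0)
w = np.array([1.0, -1.0])
X = rng.normal(size=(200, 2))
y_clean = (X @ w > 0).astype(float)   # labels consistent with w
y_poison = 1.0 - y_clean              # flipped labels
g_c = per_example_grads(w, X, y_clean)
g_p = per_example_grads(w, X, y_poison)
print("magnitude ratio:", magnitude_ratio(g_p, g_c))
print("cosine similarity:", cosine_similarity(g_p, g_c))
```

In this toy the ratio is always at least 1: with $z = w \cdot x$, a correct label gives a residual of $1 - \mathrm{sigmoid}(|z|) \le 0.5$, while a flipped label gives $\mathrm{sigmoid}(|z|) \ge 0.5$ on the same point. It illustrates the metrics only and says nothing about the values measured in the paper.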

Experimental Results

Research Questions

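RQ1: Across poisoning scenarios, do poisoned gradients consistently exhibit higher magnitudes and different orientations than clean gradients?
RQ2: Can gradient shaping via DP-SGD reduce these gradient-level differences and improve robustness to poisoning without relying on data sanitization?
RQ3: How effective is gradient shaping against indiscriminate and targeted poisoning attacks across different model types and datasets?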

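Key Findings

Poisoned gradients typically have higher magnitudes and different orientations than clean gradients, and the gap widens as poisoning strength increases.
Gradient shaping aims to reduce both the magnitude and the orientation differences, limiting the influence of poisons on model updates.
DP-SGD can increase robustness to indiscriminate attacks and mitigate targeted attacks, even in configurations with weak privacy guarantees.
Gradient shaping via a DP optimizer can be ineffective against a strong but unrealistic indiscriminate attack, which highlights a limitation of the approach.
Across three models and datasets, DP-SGD improves robustness and raises the adversary's cost in multi-poison settings.
The study identifies gradient shaping as a promising direction that needs further research before it yields a generic defense.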
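The DP-SGD mechanism these findings refer to (per-example clipping followed by Gaussian noise) can be sketched in NumPy as below. This is a minimal illustration, not the paper's training code: `dp_sgd_aggregate` is a hypothetical name, the toy batch is made up, and no privacy accounting is performed. Libraries such as Opacus (PyTorch) and TensorFlow Privacy provide full DP-SGD implementations with accounting.

```python
import numpy as np


def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    # Rescale each example's gradient so its l2 norm is at most clip_norm;
    # rows already inside the bound are left unchanged (divisor = 1).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Add Gaussian noise with std noise_multiplier * clip_norm to the sum,
    # then average over the batch. No privacy accounting is done here.
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    update = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return update, clipped


# Toy batch: two ordinary gradients (norm 0.5) and one hypothetical
# poison gradient with a much larger norm (50).
grads = np.array([[0.3, 0.4], [0.0, 0.5], [30.0, 40.0]])
rng = np.random.default_rng(0)
update, clipped = dp_sgd_aggregate(grads, clip_norm=1.0,
                                   noise_multiplier=0.0, rng=rng)
print("plain mean:   ", grads.mean(axis=0))
print("clipped norms:", np.linalg.norm(clipped, axis=1))
print("shaped update:", update)
```

With `noise_multiplier=0.0` the step reduces to clipping only, which matches the observation that DP-SGD can help even in configurations without meaningful privacy guarantees. The poison row is scaled down to norm 1, so it no longer dominates the averaged update the way it dominates the plain mean.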

This review was generated by AI and reviewed by human editors.