QUICK REVIEW

[Paper Review] Stronger Data Poisoning Attacks Break Data Sanitization Defenses

Pang Wei Koh, Jacob Steinhardt, Percy Liang | arXiv (Cornell University) | Nov 2, 2018

Adversarial Robustness in Machine Learning | Cited by 99

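One-sentence summary

This paper proposes three coordinated data poisoning attacks that evade common data sanitization defenses and substantially reduce test accuracy using only 3% poisoned data. The attacks remain effective against anomaly detectors based on nearest neighbors, training loss, SVD, and distance to the class centroid.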

ABSTRACT

Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding just 3% poisoned data, our attacks successfully increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks which do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. As this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating this problem, based on influence functions; minimax duality; and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.
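
The constrained bilevel problem described in the abstract can be written out roughly as follows. This is a sketch in assumed notation, not necessarily the paper's own: $\mathcal{D}_c$ is the clean training set, $\mathcal{D}_p$ the poisoned set, $\epsilon$ the poisoning fraction (0.03 in the reported experiments), $\mathcal{F}_\beta$ the set of points the defender's anomaly detector does not flag, and $L$ the (possibly regularized) loss.

```latex
\[
\begin{aligned}
\max_{\mathcal{D}_p} \quad & L\bigl(\hat\theta;\, \mathcal{D}_{\text{test}}\bigr) \\
\text{s.t.} \quad & |\mathcal{D}_p| = \epsilon\, |\mathcal{D}_c|, \\
& \mathcal{D}_p \subseteq \mathcal{F}_\beta, \\
& \hat\theta = \arg\min_{\theta}\; L\bigl(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p\bigr).
\end{aligned}
\]
```

The outer problem is the attacker's and the inner argmin is the defender's training step. Because each candidate $\mathcal{D}_p$ requires re-solving the inner problem, the influence-function, minimax, and KKT attacks can be read as three ways of approximating it.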

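Motivation and objectives

Motivate and formalize the risk of data poisoning when the defender uses data sanitization to filter out anomalous training points.
Show that coordinated poisoning can evade a range of anomaly detectors and degrade model performance.
Propose three attack frameworks that use concentration, constrained optimization, and decoy parameters to bypass defenses.
Demonstrate substantial increases in test error on real datasets under realistic defense assumptions.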

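Proposed method

Formulate each attack as a constrained optimization problem in which the poisoned points must evade the defender's anomaly detection.
Concentrate the poisoned points in a small number of locations to defeat sensitive anomaly detectors while keeping the attack effective.
Develop three attack variants (Influence, KKT, and Minimax) that approximate the bilevel optimization required for poisoning.
Use decoy parameters to decouple the attacker's optimization from the model the defender learns, making the attack efficient to compute.
Provide a randomized rounding procedure that handles integer-valued input domains and keeps the attack concentrated.
Provide an iterative optimization procedure that jointly refines the poisoned set and the anomaly detector's parameters.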
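
The randomized rounding step above can be illustrated with a minimal sketch. It assumes the simplest unbiased scheme (round each coordinate up with probability equal to its fractional part); the helper name `randomized_round` and the example vector are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np


def randomized_round(x, rng):
    """Round each coordinate of x to a neighbouring integer without bias.

    A coordinate is rounded up with probability equal to its fractional
    part, so the expected value of the rounded vector equals x.
    """
    floor = np.floor(x)
    frac = x - floor
    round_up = rng.random(x.shape) < frac  # never true when frac == 0
    return (floor + round_up).astype(np.int64)


rng = np.random.default_rng(0)

# Hypothetical continuous attack point, e.g. fractional word counts.
x = np.array([0.25, 1.5, 3.0, 2.9])

# Many rounded copies of the same continuous point.
copies = np.stack([randomized_round(x, rng) for _ in range(20000)])
mean_copy = copies.mean(axis=0)
print(mean_copy)
```

Each copy is a valid integer-valued input, and the copies average out to the continuous point, which is one way repeated poisoned points can stay clustered around a single location after rounding.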

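Experimental results

Research questions

RQ1: When the attacker coordinates multiple points, can data sanitization defenses reliably detect and discard poisoned data?
RQ2: Which attack strategies evade diverse anomaly detectors such as the k-NN, L2, slab, loss-based, and SVD defenses?
RQ3: Under the constraints imposed by the defense, how effective are coordinated poisoning attacks at raising test error on standard datasets?
RQ4: Do attack techniques such as concentration and decoy-parameter optimization generalize to binary and multiclass classifiers with convex losses?
RQ5: Which computational strategies make bilevel poisoning optimization tractable on real datasets?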

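Key findings

With only 3% poisoned data, the attacks raise test error on the Enron spam dataset from 3% to 24%, even with sanitization in place.
On the IMDB sentiment dataset, again with only 3% poisoned data, the attacks raise test error from 12% to 29%, even with sanitization in place.
Concentrated poisoned points evade highly sensitive anomaly detectors by clustering in a small number of locations.
For some binary classifiers with convex losses, such as SVMs or logistic regression, an effective attack needs only two distinct poisoned points.
The three attack formulations (Influence, KKT, Minimax) trade off computational efficiency against defense evasion.
Counterintuitively, regularization can increase the defender's vulnerability by reducing the model's ability to fit a small poisoned subset.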
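
The finding about concentration can be illustrated on synthetic data. The sketch below is not the paper's experiment: it assumes a toy k-NN distance detector (helper names `knn_scores` and `count_flagged` are hypothetical) that flags points whose distance to their k-th nearest neighbor exceeds the 95th percentile of the clean points' scores, and compares six poisoned points spread out far from the data with six copies of a single far-away point.

```python
import numpy as np


def knn_scores(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)  # column 0 holds the zero self-distance
    return dists[:, k]


def count_flagged(clean, poison, k=3, pct=95):
    """Number of poisoned points a toy k-NN distance detector would flag."""
    data = np.vstack([clean, poison])
    scores = knn_scores(data, k)
    # Assumed thresholding rule: percentile of the clean points' scores.
    threshold = np.percentile(scores[: len(clean)], pct)
    return int((scores[len(clean):] > threshold).sum())


rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 2))
n_poison = 6  # 3% of the clean set size

# Spread-out attack: poisoned points on a circle, far from each other.
angles = 2 * np.pi * np.arange(n_poison) / n_poison
spread = 6.0 * np.column_stack([np.cos(angles), np.sin(angles)])

# Concentrated attack: every poisoned point at the same location.
concentrated = np.tile([6.0, 0.0], (n_poison, 1))

flagged_spread = count_flagged(clean, spread)
flagged_concentrated = count_flagged(clean, concentrated)
print(flagged_spread, flagged_concentrated)
```

The isolated points have no close neighbors and are flagged, while the co-located copies act as each other's nearest neighbors and receive a score of zero, even though they sit just as far from the clean data.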


This review was generated by AI and reviewed by human editors.