QUICK REVIEW

[Paper Review] A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning

Xuanqing Liu, Si Si|arXiv (Cornell University)|Oct 30, 2019

Adversarial Robustness in Machine Learning | References: 31 | Cited by: 33

One-Sentence Summary

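The paper proposes a unified framework for data poisoning attacks on graph-based semi-supervised learning (G-SSL), with specialized algorithms for regression under an L2 constraint and classification under an L0 constraint, plus extensive experiments demonstrating the attacks' effectiveness.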

ABSTRACT

In this paper, we proposed a general framework for data poisoning attacks to graph-based semi-supervised learning (G-SSL). In this framework, we first unify different tasks, goals, and constraints into a single formula for data poisoning attack in G-SSL, then we propose two specialized algorithms to efficiently solve two important cases --- poisoning regression tasks under $\ell_2$-norm constraint and classification tasks under $\ell_0$-norm constraint. In the former case, we transform it into a non-convex trust region problem and show that our gradient-based algorithm with delicate initialization and update scheme finds the (globally) optimal perturbation. For the latter case, although it is an NP-hard integer programming problem, we propose a probabilistic solver that works much better than the classical greedy method. Lastly, we test our framework on real datasets and evaluate the robustness of G-SSL algorithms. For instance, on the MNIST binary classification problem (50000 training data with 50 labeled), flipping two labeled data is enough to make the model perform like random guess (around 50% error).

Motivation and Objectives

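Introduce a general framework for data poisoning attacks on graph-based semi-supervised learning (G-SSL).
Handle both regression and classification tasks within the same poisoning framework.
Develop efficient algorithms under different constraint settings (L2 for regression, L0 for classification).
Study white-box and incomplete-knowledge scenarios, and evaluate the robustness of G-SSL.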

Proposed Method

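Model poisoning as a perturbation of the training labels or features under a single unified objective (Eq. 2).
Derive the closed-form solution of the label propagation prediction and reduce the poisoning problem to tractable subproblems (regression and classification).
For regression: solve a non-convex trust region problem with a gradient-based solver that converges to the global minimum (Algorithm 1).
For classification: formulate the attack as an NP-hard discrete problem, then apply a probabilistic (Bernoulli) relaxation with reparameterization and stochastic gradients (Eqs. 7–10).
Enforce sparsity and the budget constraint through ||b1||2 regularization and top-c_max selection in the probabilistic solver.
Test on real datasets to measure the attack's impact on RMSE and error rate.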
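The regression case can be illustrated with a minimal sketch. It assumes the standard label propagation closed form, y_u_hat = (D_uu - S_uu)^{-1} S_ul y_l, and a white-box attacker who knows the true unlabeled targets y_u. The optimizer here is plain projected gradient ascent started from zero, not the paper's Algorithm 1, so it makes no claim of global optimality. The function names, the Gaussian kernel, `gamma`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def propagation_matrix(X, n_labeled, gamma=1.0):
    """Return K such that label propagation predicts y_u_hat = K @ y_l.

    Assumes the first n_labeled rows of X are the labeled points and a
    Gaussian kernel similarity (an assumption made for this sketch).
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    S = np.exp(-gamma * sq_dists)
    np.fill_diagonal(S, 0.0)          # no self-similarity
    D = np.diag(S.sum(axis=1))        # degree matrix over all nodes
    l = n_labeled
    # Closed form: (D_uu - S_uu)^{-1} S_ul
    return np.linalg.solve(D[l:, l:] - S[l:, l:], S[l:, :l])

def poison_regression_labels(K, y_l, y_u, d_max, steps=500):
    """Projected gradient ascent on 0.5 * ||K (y_l + delta) - y_u||^2
    subject to ||delta||_2 <= d_max (a simplification, not Algorithm 1)."""
    step = 1.0 / np.linalg.norm(K, 2) ** 2   # 1 / (largest singular value)^2
    delta = np.zeros_like(y_l)
    for _ in range(steps):
        grad = K.T @ (K @ (y_l + delta) - y_u)
        delta = delta + step * grad
        norm = np.linalg.norm(delta)
        if norm > d_max:                     # project back onto the L2 ball
            delta *= d_max / norm
    return delta

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy data (hypothetical): 60 points in 2-D, the first 10 are labeled.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.0, -2.0])
n_labeled = 10
y_l, y_u = y[:n_labeled], y[n_labeled:]

K = propagation_matrix(X, n_labeled)
delta = poison_regression_labels(K, y_l, y_u, d_max=1.0)
clean_rmse = rmse(K @ y_l, y_u)
poisoned_rmse = rmse(K @ (y_l + delta), y_u)
```

Comparing `clean_rmse` with `poisoned_rmse` shows how far a bounded label perturbation moves the predictions on this toy graph. Because the ascent objective is convex in `delta`, each projected step cannot decrease it, but the iterate may still stop at a stationary point that is not the global maximizer.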

Experimental Results

Research Questions

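RQ1: How vulnerable are G-SSL methods to data poisoning during training?
RQ2: Can regression and classification tasks be attacked in a unified, systematic way under graph-based SSL?
RQ3: Under L2- and L0-type constraints, which algorithms efficiently find optimal or near-optimal perturbations?
RQ4: How does the attacker's knowledge (complete or incomplete) affect attack effectiveness?
RQ5: How does the robustness of G-SSL methods to poisoning depend on the amount of labeled data and the kernel parameters?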

Key Findings

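Small perturbations can severely degrade G-SSL performance (for example, flipping two labeled points in the MNIST binary classification task reduces the model to random guessing).
Under the L2 constraint, the regression poisoning method reaches an approximately globally optimal perturbation using a gradient method with asymptotically linear running time.
For classification poisoning, the probabilistic solver outperforms the greedy baseline, especially as c_max grows.
Even when the attacker does not know the true labels of the unlabeled nodes, substituting estimated labels causes only a small loss in attack strength, so poisoning remains effective.
Robustness to poisoning improves as the number of labeled points increases, owing to information propagation from the labeled nodes.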
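For reference, a greedy label-flipping baseline of the kind the probabilistic solver is compared against might look like the sketch below. This is a generic greedy search under an L0 budget `c_max`, not necessarily the exact baseline used in the paper. It assumes binary labels in {-1, +1}, a white-box attacker who knows y_u, and the standard label propagation closed form. All function names and the toy two-cluster data are hypothetical.

```python
import numpy as np

def propagation_matrix(X, n_labeled, gamma=1.0):
    """K such that y_u_hat = K @ y_l (Gaussian kernel assumed for this sketch)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    S = np.exp(-gamma * sq_dists)
    np.fill_diagonal(S, 0.0)
    D = np.diag(S.sum(axis=1))
    l = n_labeled
    return np.linalg.solve(D[l:, l:] - S[l:, l:], S[l:, :l])

def error_rate(K, y_l, y_u):
    """Fraction of unlabeled nodes whose predicted sign disagrees with y_u."""
    return float(np.mean(np.sign(K @ y_l) != y_u))

def greedy_label_flips(K, y_l, y_u, c_max):
    """Flip at most c_max labels, each time taking the single flip that
    raises the error rate the most; stop early if no flip helps."""
    y_poisoned = y_l.copy()
    flipped = []
    for _ in range(c_max):
        best_i, best_err = None, error_rate(K, y_poisoned, y_u)
        for i in range(len(y_poisoned)):
            if i in flipped:
                continue
            y_poisoned[i] = -y_poisoned[i]      # try the flip
            err = error_rate(K, y_poisoned, y_u)
            y_poisoned[i] = -y_poisoned[i]      # undo it
            if err > best_err:
                best_i, best_err = i, err
        if best_i is None:
            break
        y_poisoned[best_i] = -y_poisoned[best_i]
        flipped.append(best_i)
    return y_poisoned, flipped

# Toy data (hypothetical): two Gaussian clusters, classes alternate by index
# so the labeled prefix contains both classes.
rng = np.random.default_rng(0)
centers = np.array([[-1.5, 0.0], [1.5, 0.0]])
cls = np.arange(80) % 2
X = centers[cls] + rng.normal(scale=0.7, size=(80, 2))
y = 2.0 * cls - 1.0
n_labeled = 8
y_l, y_u = y[:n_labeled], y[n_labeled:]

K = propagation_matrix(X, n_labeled)
y_poisoned, flipped = greedy_label_flips(K, y_l, y_u, c_max=2)
clean_err = error_rate(K, y_l, y_u)
poisoned_err = error_rate(K, y_poisoned, y_u)
```

The search commits to one flip at a time and never revisits a choice, so it can miss combinations of flips that are only harmful together.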

This review was generated by AI and reviewed by a human editor.