QUICK REVIEW

[Paper Review] Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Hongyi Wang, Kartik K. Sreenivasan|arXiv (Cornell University)|Jul 9, 2020

Adversarial Robustness in Machine Learning | References: 90 | Cited by: 110

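One-Sentence Summary

This paper shows that edge-case (tail) backdoors can be injected into federated learning models through data-poisoning or PGD-based model-poisoning strategies, that they are hard to detect, and that they persist under a range of defenses.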

ABSTRACT

Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in a FL model is unlikely assuming first order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training, or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and exhibit that with careful tuning at the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis).

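Research Motivation and Goals

Motivate and formalize the backdoor threat in federated learning (FL) and the difficulty of defending against it.
Introduce edge-case backdoor attacks that target tail inputs, which rarely appear in the training data.
Develop attack strategies that survive common defenses (data poisoning, PGD-based attacks, and model replacement).
Characterize theoretically how hard backdoors are to detect and how backdoor robustness relates to adversarial robustness.
Show experimentally that edge-case attacks can be effective across a variety of tasks and datasets.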

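Proposed Method

Define p-edge-case examples as tail inputs drawn from low-probability regions of the input distribution.
Propose three attack strategies: black-box data poisoning, PGD-based training that projects the model to stay within the range a defense tolerates, and PGD with model replacement.
Use Federated Averaging (FedAvg) over a subset of the clients, under different attack modes (fixed-frequency and fixed-pool).
Train the attacker with projected gradient descent so that its update stays inside a norm ball and evades defenses.
Construct edge-case data by fitting a Gaussian mixture model to penultimate-layer activations to identify edge-case samples.
Apply edge-case backdoors to several tasks (image classification, optical character recognition, sentiment analysis, next-word prediction) and against several defenses (norm clipping, Krum, Multi-Krum, RFA, DP).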
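
The projection step behind the PGD-based attack can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat NumPy weight vector, the `grad_fn` callback (standing in for the gradient of the attacker's loss on its mix of clean and edge-case data), and the values of `delta`, `lr`, and `steps` are all assumptions made for illustration.

```python
import numpy as np

def project_onto_ball(local_w, global_w, delta):
    """Project local_w onto the L2 ball of radius delta centred at global_w."""
    diff = local_w - global_w
    norm = np.linalg.norm(diff)
    if norm <= delta:
        return local_w.copy()
    return global_w + diff * (delta / norm)

def pgd_attacker_update(global_w, grad_fn, delta, lr=0.1, steps=20):
    """Hypothetical attacker loop: gradient step, then projection.

    grad_fn(w) is an assumed callback returning the gradient of the
    attacker's loss at w; it is not part of any real library.
    """
    w = global_w.copy()
    for _ in range(steps):
        w = w - lr * grad_fn(w)                    # descend on the attacker's loss
        w = project_onto_ball(w, global_w, delta)  # stay close to the global model
    return w

def fedavg(client_weights, num_samples):
    """Sample-weighted average of client models, as in FedAvg."""
    total = float(sum(num_samples))
    return sum(w * (n / total) for w, n in zip(client_weights, num_samples))
```

Because every iterate is projected back, the attacker's final model never differs from the global model by more than `delta` in L2 norm, which is the property that lets it pass a norm-based check with a matching threshold.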

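Experimental Results

Research Questions

RQ1: Can backdoors targeting edge-case inputs be built into FL models without being detected by common defenses?
RQ2: What theoretical relationships exist between adversarial robustness and backdoor robustness in neural networks?
RQ3: Do edge-case backdoors transfer across tasks and defenses, and how do defense mechanisms affect fairness?
RQ4: What practical data-generation strategies produce edge-case datasets that yield persistent backdoors?
RQ5: How effective are edge-case backdoors under the data-poisoning and model-poisoning attack paradigms?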

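Key Findings

Backdoors in federated learning are achievable and hard to detect; their existence is tied to adversarial robustness, and robustness to them is hard to certify.
Edge-case backdoors persist under a range of defenses, including DP, norm clipping, and robust aggregators such as Krum and Multi-Krum.
Attacks become effective when 0.5-1% of edge users are adversaries; the model keeps its performance on benign data while misclassifying the edge-case inputs.
The theoretical results show that if a model admits adversarial examples, then backdoors exist under mild conditions, and that detecting backdoors is NP-hard.
Edge-case backdoors can cause fairness-related failures by disproportionately affecting disadvantaged inputs or groups.
Datasets built to emphasize edge-case samples allow successful backdoor injection without a noticeable deviation on ordinary data.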
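
Two of the defenses named above can be sketched in a few lines to show what an attacker has to get past. This is a simplified illustration and not the paper's evaluation code: the function names, the flat NumPy update vectors, and the `max_norm` and `num_byzantine` parameters are assumptions. Norm clipping rescales any update whose L2 norm exceeds a threshold, and Krum keeps the single update whose nearest neighbours are closest to it.

```python
import numpy as np

def clip_update(update, max_norm):
    """Norm clipping: shrink the update only if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(update)
    return update / max(1.0, norm / max_norm)

def krum_select(updates, num_byzantine):
    """Return the index of the update chosen by Krum.

    Each update is scored by the sum of squared distances to its
    n - num_byzantine - 2 nearest other updates; the lowest score wins.
    """
    n = len(updates)
    k = n - num_byzantine - 2
    if k < 1:
        raise ValueError("need at least num_byzantine + 3 updates")
    stacked = np.stack(updates)
    # Pairwise squared Euclidean distances, shape (n, n).
    dists = np.sum((stacked[:, None, :] - stacked[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        others = np.delete(dists[i], i)          # drop the distance to itself
        scores.append(np.sort(others)[:k].sum())
    return int(np.argmin(scores))
```

An update whose norm is already below `max_norm` passes through `clip_update` unchanged, and an update that sits close to the honest ones is not penalized by the Krum score, which is consistent with the finding that carefully bounded edge-case attacks can survive these defenses.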


This review was generated by AI and reviewed by a human editor.