QUICK REVIEW

[Paper Review] How To Backdoor Federated Learning

Eugene Bagdasaryan, Andreas Veit|arXiv (Cornell University)|Jul 2, 2018

Privacy-Preserving Technologies in Data|References: 73|Cited by: 699

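One-Sentence Summary

This paper shows that federated learning is vulnerable to model poisoning through model replacement, which lets an attacker implant semantic backdoors that persist and that outperform data poisoning, even when secure aggregation is deployed.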

ABSTRACT

Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, e.g., to ensure that an image classifier assigns an attacker-chosen label to images with certain features, or that a word predictor completes certain sentences with an attacker-chosen word. We design and evaluate a new model-poisoning methodology based on model replacement. An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task. We evaluate the attack under different assumptions for the standard federated-learning tasks and show that it greatly outperforms data poisoning. Our generic constrain-and-scale technique also evades anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during training.

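Motivation and Objectives

Motivate and formalize the model-poisoning threat in federated learning.
Show that a malicious participant can replace the global model with a backdoored model without affecting main-task accuracy.
Demonstrate semantic backdoors on image-classification and word-prediction tasks.
Evaluate the attack's effectiveness under secure aggregation and anomaly-detection defenses.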

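Proposed Method

Define the attacker's capabilities in federated learning, including control over local data, the local training procedure, and the model that is submitted.
Introduce model replacement, an attack that substitutes a backdoored model for the global model.
Develop and evaluate the constrain-and-scale and train-and-scale techniques for evading anomaly detection.
Propose and analyze a two-task learning view of the attack to keep the backdoor alive after the attacker's round.
Run experiments on CIFAR-10 image classification and Reddit word prediction to demonstrate semantic backdoors.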
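
The model-replacement step listed above can be illustrated with a minimal sketch. It assumes a FedAvg-style server update of the form G_next = G + (eta / n) * sum(L_i - G) and an attacker that scales its update by gamma = n / eta, which is a simplified reading of the attack rather than the authors' implementation. Plain Python lists stand in for model weights, the function names `aggregate` and `model_replacement_update` are hypothetical, and the demo assumes benign updates equal the current global model (an idealized "near convergence" case) so that they cancel exactly.

```python
from typing import List

Vector = List[float]


def aggregate(global_model: Vector, local_models: List[Vector],
              n: int, eta: float) -> Vector:
    """Assumed server rule: G_next = G + (eta / n) * sum_i (L_i - G)."""
    new_model = list(global_model)
    for local in local_models:
        for j, (l, g) in enumerate(zip(local, global_model)):
            new_model[j] += (eta / n) * (l - g)
    return new_model


def model_replacement_update(global_model: Vector, backdoored_model: Vector,
                             n: int, eta: float) -> Vector:
    """Scale the attacker's update so it survives averaging.

    The attacker submits G + gamma * (X - G) with gamma = n / eta,
    where X is the backdoored model it wants the server to adopt.
    """
    gamma = n / eta
    return [g + gamma * (x - g) for g, x in zip(global_model, backdoored_model)]


# Toy demo: three benign participants submit the unchanged global model,
# one attacker submits the scaled backdoored model.
G = [0.0, 0.0]
X = [1.0, -2.0]          # hypothetical backdoored weights
n, eta = 4, 1.0          # illustrative values, not from the paper

malicious = model_replacement_update(G, X, n, eta)
new_global = aggregate(G, [G, G, G, malicious], n, eta)
print(new_global)        # [1.0, -2.0]
```

In this idealized case the aggregated model equals X exactly. With real benign updates that differ from G, the result would only approximate X.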

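Experimental Results

Research Questions

RQ1: Can a single malicious participant, or a small number of them, introduce a backdoor into the federated model without reducing main-task accuracy?
RQ2: How much more effective is model replacement than conventional data poisoning in federated learning?
RQ3: Can attackers evade anomaly detectors under secure aggregation, and how can they extend the backdoor's persistence across multiple rounds?
RQ4: Which kinds of backdoors (semantic vs. pixel-pattern) can be embedded in a federated model, and how do they perform in practice?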

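Key Findings

A single-shot attack reaches 100% backdoor accuracy on the attacker-chosen task.
An attacker who controls fewer than 1% of the participants can prevent the backdoor from being unlearned without harming main-task accuracy.
On the word-prediction task, model replacement far outperforms data poisoning (for example, with 80,000 participants, 8 malicious participants are enough to reach 50% backdoor accuracy).
Secure aggregation and anomaly detectors do not stop the attack; the attacker can evade these defenses with constrain-and-scale or train-and-scale.
The attack works for semantic backdoors in image classification and for backdoors in word prediction, and it remains effective on non-i.i.d. data.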
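
The two evasion techniques named in the findings above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: constrain-and-scale is modeled as a weighted sum of the task loss and an anomaly-detection penalty (the weight `alpha` and both loss values are hypothetical inputs), and train-and-scale is modeled as choosing the largest scaling factor that keeps the submitted model within an assumed L2-distance bound `bound` of the global model. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def constrained_loss(class_loss: float, anomaly_loss: float, alpha: float) -> float:
    """Constrain-and-scale style objective (assumed form):
    alpha * task loss + (1 - alpha) * anomaly penalty."""
    return alpha * class_loss + (1.0 - alpha) * anomaly_loss


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_and_scale_gamma(global_model: Vector, backdoored_model: Vector,
                          bound: float) -> float:
    """Largest scaling factor that keeps the scaled model within `bound`
    (in L2 distance) of the global model. Assumes the two models differ."""
    return bound / l2_distance(backdoored_model, global_model)


# Toy demo with illustrative numbers.
G = [0.0, 0.0]
X = [3.0, 4.0]                                     # hypothetical backdoored weights
gamma = train_and_scale_gamma(G, X, bound=10.0)    # distance is 5.0, so gamma is 2.0
submitted = [g + gamma * (x - g) for g, x in zip(G, X)]
print(gamma, l2_distance(submitted, G))            # 2.0 10.0
print(constrained_loss(2.0, 4.0, alpha=0.5))       # 3.0
```

The design difference this sketch tries to surface: the first approach folds the detector's criterion into training itself, while the second trains normally and only limits how far the final update is scaled.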

This review was generated by AI and checked by a human editor.