QUICK REVIEW

[Paper Review] Mitigating Byzantine Attacks in Federated Learning

Saurav Prakash, Amir Salman Avestimehr | arXiv (Cornell University) | Oct 15, 2020

Privacy-Preserving Technologies in Data | References: 24 | Cited by: 28

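ONE-SENTENCE SUMMARY

DiverseFL is a new Byzantine-resilient federated learning framework that jointly addresses non-IID data, a variable Byzantine fault model, and non-convex, non-smooth optimization. Each client sends the server a tiny data sample once, before training. In every iteration, the server computes a guiding gradient for that client on its sample, flags the client as Byzantine if its update deviates from the guiding gradient under a per-client criterion, and updates the global model using only the non-flagged clients. On benchmarks it comes close to Oracle SGD, in which the server knows the identities of the Byzantine clients.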
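A guiding gradient is the loss gradient evaluated at the current global model on the client's small, one-time sample. The sketch below is minimal and assumes a linear-regression model with mean-squared-error loss. That model is our own illustrative choice: the method itself targets general models, including non-convex ones. The data is hypothetical, and `mse_gradient` is our own helper name.

```python
# Minimal sketch of how a server could compute a client's "guiding" gradient.
# ASSUMPTIONS: linear regression with mean-squared-error loss; all data is hypothetical.
from typing import List

Vector = List[float]


def mse_gradient(w: Vector, X: List[Vector], y: Vector) -> Vector:
    """Gradient of (1/n) * sum_i (x_i . w - y_i)^2 with respect to w."""
    n = len(X)
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        residual = sum(a * b for a, b in zip(x_i, w)) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += 2.0 * residual * x_ij / n
    return grad


# Tiny sample the client uploaded once, before training started.
client_sample_X = [[1.0, 2.0], [0.5, -1.0]]
client_sample_y = [3.0, 0.0]

# At each iteration, the server re-evaluates the gradient at the current global model.
for w in ([0.0, 0.0], [1.0, 1.0]):
    print(w, mse_gradient(w, client_sample_X, client_sample_y))
# [0.0, 0.0] [-3.0, -6.0]
# [1.0, 1.0] [-0.25, 0.5]
```

The sample stays fixed, but the guiding gradient changes every iteration because it is re-evaluated at the current model.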

ABSTRACT

Prior solutions for mitigating Byzantine failures in federated learning, such as element-wise median of the stochastic gradient descent (SGD) based updates from the clients, tend to leverage the similarity of updates from the non-Byzantine clients. However, when data is non-IID, as is typical in mobile networks, the updates received from non-Byzantine clients are quite diverse, resulting in poor convergence performance of such approaches. On the other hand, current algorithms that address heterogeneous data distribution across clients are limited in scope and do not perform well when there is variability in the number and identities of the Byzantine clients, or when general non-convex loss functions are considered. We propose "DiverseFL" that jointly addresses three key challenges of Byzantine resilient federated learning: (i) non-IID data distribution across clients, (ii) variable Byzantine fault model, and (iii) generalization to non-convex and non-smooth optimization. DiverseFL leverages computing capability of the federated learning server that for each iteration, computes a "guiding" gradient for each client over a tiny sample of data received only once from the client before start of the training. The server uses "per client" criteria for flagging Byzantine clients, by comparing the corresponding guiding gradient with the client's gradient update. The server then updates the model using the gradients received from the non-flagged clients. As we demonstrate in our experiments with benchmark datasets and popular Byzantine attacks, our proposed approach performs better than the prior algorithms, almost matching the performance of the "Oracle SGD", where the server knows the identities of the Byzantine clients.
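Toy numbers make the abstract's point about element-wise median concrete. The sketch below implements coordinate-wise median aggregation with only the Python standard library, and all gradient values are made up. Three diverse honest gradients (as under non-IID data) are combined with two colluding Byzantine clients. The median then lands on [-1.0, -1.0], pointing away from the honest mean [1.0, 1.0]. When the honest gradients are similar, the same two attackers barely move it.

```python
# Minimal sketch: element-wise (coordinate-wise) median aggregation.
# All gradient values below are hypothetical, chosen to illustrate the failure mode.
from statistics import median
from typing import List

Vector = List[float]


def elementwise_median(grads: List[Vector]) -> Vector:
    """Median of each coordinate across the clients' gradients."""
    return [median(coords) for coords in zip(*grads)]


def elementwise_mean(grads: List[Vector]) -> Vector:
    """Plain per-coordinate average (what we would want from honest clients)."""
    return [sum(coords) / len(grads) for coords in zip(*grads)]


byzantine = [[-10.0, -10.0], [-10.0, -10.0]]  # two colluding attackers (a minority)

# Non-IID: honest gradients point in quite different directions.
diverse_honest = [[4.0, 0.0], [0.0, 4.0], [-1.0, -1.0]]
print(elementwise_mean(diverse_honest))                # [1.0, 1.0]
print(elementwise_median(diverse_honest + byzantine))  # [-1.0, -1.0]

# IID-like: honest gradients are similar, and the median stays close to them.
similar_honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
print(elementwise_median(similar_honest + byzantine))  # [0.9, 0.9]
```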

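MOTIVATION AND GOALS

Address Byzantine-resilient federated learning when client data is non-IID, a setting in which conventional median-based aggregation fails.
Overcome the limitations of existing methods, which break down under a variable Byzantine fault model or with general non-convex loss functions.
Train models robustly in realistic federated learning deployments, where the number and identities of Byzantine clients are unpredictable.
Improve convergence and generalization without prior knowledge of which clients are Byzantine and without assuming that honest clients hold similar data.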
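By contrast, the Oracle SGD reference from the abstract relies on exactly the knowledge that the last goal rules out: the server is told which clients are Byzantine. The sketch below is minimal and assumes plain averaging of the remaining gradients with a fixed learning rate. `oracle_sgd_step` is a hypothetical helper name. Because `byzantine_ids` is passed on every call, the same function also covers a Byzantine set that changes between rounds.

```python
# Minimal sketch of the "Oracle SGD" reference: the server is *given* the set of
# Byzantine client IDs (which a real server does not know) and averages the rest.
# Plain averaging and a fixed learning rate are assumptions for illustration.
from typing import Dict, List, Set

Vector = List[float]


def oracle_sgd_step(w: Vector, grads: Dict[int, Vector],
                    byzantine_ids: Set[int], lr: float) -> Vector:
    honest = [g for cid, g in grads.items() if cid not in byzantine_ids]
    if not honest:
        return list(w)  # nothing trustworthy to apply this round
    avg = [sum(coords) / len(honest) for coords in zip(*honest)]
    return [wi - lr * gi for wi, gi in zip(w, avg)]


# The Byzantine set may differ from round to round (variable fault model).
w = [0.0, 0.0]
round_grads = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [-100.0, -100.0]}
w = oracle_sgd_step(w, round_grads, byzantine_ids={2}, lr=0.5)
print(w)  # [-1.0, -1.5]
```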

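PROPOSED METHOD

Before training, each client sends the server a tiny data sample, only once. In every iteration, the server uses that sample to compute a "guiding" gradient for the client at the current model.
For each client, the server applies a per-client criterion that compares the client's actual gradient update with the guiding gradient computed for that client in the same iteration, in order to detect anomalies.
Clients whose gradients deviate significantly from their guiding gradients are flagged as potentially Byzantine.
The global model is updated using only the gradients of non-flagged (trusted) clients, which makes the aggregation robust.
The method is designed to handle general non-convex and non-smooth loss functions, extending its applicability beyond the convex setting.
Detection relies on server-side computation. Apart from the one-time sample upload before training, clients send nothing beyond their usual gradient updates.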
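The steps above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The review does not spell out the exact per-client criterion, so we assume a two-part test: an update must have a positive inner product with the client's guiding gradient, and its norm must fall within a hypothetical ratio band of the guiding gradient's norm. The helpers `is_flagged` and `diversefl_step` and the thresholds `ratio_low` and `ratio_high` are our own names. Guiding gradients are passed in as vectors. In the scheme described above, the server would recompute them every iteration on each client's one-time sample.

```python
# Minimal sketch of DiverseFL-style per-client flagging and aggregation.
# ASSUMPTION: the per-client test is "positive inner product with the guiding
# gradient AND norm ratio inside [ratio_low, ratio_high]"; the thresholds are
# hypothetical defaults, not values taken from the paper.
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def is_flagged(update: Vector, guiding: Vector,
               ratio_low: float = 0.5, ratio_high: float = 2.0) -> bool:
    """Flag a client whose update disagrees with its own guiding gradient."""
    if dot(update, guiding) <= 0.0:
        return True  # wrong direction (a positive dot product implies both norms > 0)
    ratio = norm(update) / norm(guiding)
    return not (ratio_low <= ratio <= ratio_high)  # implausible magnitude


def diversefl_step(w: Vector, updates: Dict[int, Vector],
                   guiding: Dict[int, Vector], lr: float) -> Tuple[Vector, Set[int]]:
    """Average only non-flagged updates; return the new model and flagged IDs."""
    flagged = {cid for cid in updates if is_flagged(updates[cid], guiding[cid])}
    accepted = [g for cid, g in updates.items() if cid not in flagged]
    if not accepted:
        return list(w), flagged  # skip the update if every client was flagged
    avg = [sum(coords) / len(accepted) for coords in zip(*accepted)]
    return [wi - lr * gi for wi, gi in zip(w, avg)], flagged


# Guiding gradients: recomputed by the server each iteration on each client's sample.
guiding = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [1.0, 0.0]}
updates = {
    0: [1.25, 0.25],  # honest, close to its own guide
    1: [0.25, 0.75],  # honest, very different from client 0 (non-IID)
    2: [-1.0, -1.0],  # sign-flipped update
    3: [10.0, 0.0],   # right direction, inflated magnitude
}
w, flagged = diversefl_step([0.0, 0.0], updates, guiding, lr=1.0)
print(sorted(flagged), w)  # [2, 3] [-0.75, -0.5]
```

The test never compares one client's update with another's. As a result, two honest but dissimilar updates (clients 0 and 1) both pass, and the server needs no bound on how many clients are Byzantine in a given round.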

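EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can a Byzantine-resilient federated learning method maintain high performance under non-IID data distributions, where conventional median-based methods fail?
RQ2: How effective is the per-client guiding-gradient mechanism at detecting Byzantine clients when their number and identities change across rounds?
RQ3: How closely can the server-side detection mechanism approach the performance of Oracle SGD, in which the server knows the identities of the Byzantine clients?
RQ4: Does the proposed method generalize to the non-convex and non-smooth optimization problems common in deep learning?
RQ5: How does the method compare with prior state-of-the-art approaches under popular Byzantine attacks on benchmark datasets?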

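KEY FINDINGS

DiverseFL nearly matches the convergence performance of Oracle SGD (where the Byzantine clients' identities are known) on multiple benchmark datasets.
Under non-IID data, it significantly outperforms prior median-based and other robust-aggregation techniques, whose performance degrades because honest clients' gradients are diverse.
DiverseFL remains robust and performs consistently even when the number and identities of Byzantine clients change across training rounds.
Per-client guiding gradients enable accurate detection of Byzantine clients without assuming that honest clients' gradients are similar.
The method handles non-convex and non-smooth loss functions, which makes it applicable to real-world deep learning.
Empirical evaluation on standard federated learning benchmarks shows that, under a range of Byzantine attacks, DiverseFL loses substantially less model accuracy than existing baselines.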


This review was generated by AI and reviewed by human editors.