QUICK REVIEW

[Paper Review] Communication-Efficient Learning of Deep Networks from Decentralized Data

H. Brendan McMahan, Eider Moore|arXiv (Cornell University)|Feb 17, 2016

Privacy-Preserving Technologies in Data | Cited by 5,595

ONE-SENTENCE SUMMARY

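Introduces Federated Averaging (FedAvg), a practical method for training deep networks on data that stays on decentralized mobile devices: clients compute updates locally and a server averages them into a shared model. The method handles non-IID and unbalanced data while substantially reducing the number of communication rounds.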

ABSTRACT

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.

RESEARCH MOTIVATION AND GOALS

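Motivate training deep networks on data that remains on mobile devices, protecting user privacy and avoiding centralized data collection.
Define the FederatedAveraging (FedAvg) algorithm and justify it as a practical federated optimization method.
Empirically evaluate FedAvg across multiple model architectures and datasets under non-IID and unbalanced data distributions.
Show that, compared with standard synchronous SGD, FedAvg reduces the number of communication rounds by 10-100x while maintaining or improving accuracy.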

PROPOSED METHOD

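Propose FederatedAveraging (FedAvg), which combines local SGD updates on each client with a server-side model-averaging step weighted by each client's number of training examples.
Parameterize FedAvg by three quantities: C (fraction of clients selected in each round), E (number of training passes each client makes over its local data per round), and B (local minibatch size).
Frame FedAvg as a generalization of FedSGD: FedSGD is the special case E = 1, B = ∞ (one full-batch gradient step per client per round), while FedAvg adds local computation per round through more local passes (E > 1) and/or smaller local minibatches (B < ∞).
Consider a synchronous, round-based federated optimization setting with a fixed set of clients, a random subset of which is selected in each round.
Use a simple, scalable update scheme that respects data locality and reduces communication overhead.
Provide pseudocode for the ClientUpdate and FederatedAveraging procedures to support practical implementation.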
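The FedSGD/FedAvg relationship in the list above can be written out as a short derivation. The symbols follow the FedAvg paper's notation and are not defined elsewhere in this review: K clients, client k holding the index set P_k of n_k examples, n = n_1 + ... + n_K, per-example loss f_i, and learning rate η. The first line is the global objective as a data-weighted mixture of client objectives; the second shows that one FedSGD step equals averaging one local gradient step per client, because the weights n_k/n sum to 1; the third gives the number of local updates per round for finite B (treating n_k/B as a whole number of batches), so FedAvg replaces the single local step with u_k minibatch steps.

```latex
\begin{align*}
f(w) &= \sum_{k=1}^{K} \frac{n_k}{n} F_k(w),
  & F_k(w) &= \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} f_i(w), \\
w_{t+1} &= w_t - \eta \sum_{k=1}^{K} \frac{n_k}{n} g_k
         = \sum_{k=1}^{K} \frac{n_k}{n} \left( w_t - \eta\, g_k \right),
  & g_k &= \nabla F_k(w_t), \\
u_k &= \frac{E\, n_k}{B}.
\end{align*}
```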
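The ClientUpdate and FederatedAveraging procedures listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: a linear least-squares model stands in for the deep network, the server samples max(⌊C·K⌋, 1) clients uniformly without replacement, and averaging weights are normalized over the clients selected in that round. The function names `client_update` and `federated_averaging` and the toy non-IID data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy setup: a linear model with squared loss stands in for a deep
# network, and each client's data is a list of (features, target) pairs.
Example = Tuple[List[float], float]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def client_update(w: List[float], data: List[Example], epochs: int,
                  batch_size: float, lr: float, rng: random.Random) -> List[float]:
    """ClientUpdate: E passes of minibatch SGD over this client's local data."""
    w = list(w)  # start from a copy of the current global model
    n_k = len(data)
    b = n_k if math.isinf(batch_size) else int(batch_size)  # B = inf -> full batch
    for _ in range(epochs):
        order = list(range(n_k))
        rng.shuffle(order)
        for start in range(0, n_k, b):
            batch = [data[i] for i in order[start:start + b]]
            grad = [0.0] * len(w)
            for x, y in batch:
                err = predict(w, x) - y  # gradient of 0.5 * (w.x - y)^2 is err * x
                for j, xj in enumerate(x):
                    grad[j] += err * xj / len(batch)
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def federated_averaging(clients: List[List[Example]], dim: int, rounds: int,
                        C: float, E: int, B: float, lr: float,
                        seed: int = 0) -> List[float]:
    """Server loop: select max(C*K, 1) clients, train locally, average by data size."""
    rng = random.Random(seed)
    K = len(clients)
    w = [0.0] * dim
    for _ in range(rounds):
        m = max(int(C * K), 1)
        selected = rng.sample(range(K), m)
        total = sum(len(clients[k]) for k in selected)
        new_w = [0.0] * dim
        for k in selected:  # sequential here; runs in parallel on real devices
            w_k = client_update(w, clients[k], E, B, lr, rng)
            weight = len(clients[k]) / total  # n_k over data held by selected clients
            new_w = [acc + weight * wk for acc, wk in zip(new_w, w_k)]
        w = new_w
    return w


# Usage: 10 clients with disjoint x-ranges (non-IID) and different sizes (unbalanced).
data_rng = random.Random(1)
true_w = [2.0, -1.0]  # model: y = 2x - 1, features are [x, 1]
clients = []
for k in range(10):
    n_k = 5 + 5 * k
    xs = [data_rng.uniform(-1.0 + 0.2 * k, -0.8 + 0.2 * k) for _ in range(n_k)]
    clients.append([([x, 1.0], true_w[0] * x + true_w[1]) for x in xs])

w = federated_averaging(clients, dim=2, rounds=100, C=0.5, E=5, B=10, lr=0.5)
print(w)  # expected to end up close to true_w on this noise-free toy problem
```

With C = 1, E = 1, and B = float("inf"), the same function performs exactly one full-batch gradient step on the global objective per round, which is the FedSGD special case; raising E or lowering B adds local computation without changing how often clients communicate.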

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

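RQ1: How can deep networks be trained effectively when the data remains on the devices of a large number of clients?
RQ2: Can local computation combined with model averaging (FedAvg) reach accuracy comparable to centralized or fully synchronous SGD while using substantially fewer communication rounds?
RQ3: Under FedAvg, how do non-IID and unbalanced data distributions across clients affect convergence and final model performance?
RQ4: What trade-offs exist among the client participation rate (C), local computation (E, B), and overall communication efficiency?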

KEY FINDINGS

Compared with FedSGD, FedAvg substantially reduces the number of communication rounds while maintaining or improving accuracy on the MNIST, CIFAR-10, and Shakespeare LSTM tasks.
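Increasing local computation per client (larger E or smaller B) substantially reduces the number of communication rounds. The gains are not confined to IID data: on the Shakespeare task, the relative speedup was larger for the non-IID partition than for the IID one.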
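FedAvg converges in fewer rounds, and the averaged model can even outperform the locally trained models it is built from, suggesting a regularization effect similar to dropout.
The method remains robust to highly non-IID and unbalanced data partitions, including a large-scale language-modeling task with more than 500,000 clients.
In the CIFAR-10 experiments, FedAvg reaches target accuracies in far fewer rounds than the SGD baseline (e.g., speedups of 64.3x and 49.2x for two of the targets).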
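The speedup figures in the last item are ratios of communication rounds. A minimal sketch, assuming speedup is computed as the number of rounds the baseline needs to first reach a target accuracy divided by the number of rounds FedAvg needs for the same target; the function names and accuracy curves below are hypothetical and are not data from the paper.

```python
from typing import List, Optional


def rounds_to_target(accuracy_per_round: List[float], target: float) -> Optional[int]:
    """Return the first (1-indexed) round whose accuracy reaches target, else None."""
    for r, acc in enumerate(accuracy_per_round, start=1):
        if acc >= target:
            return r
    return None


def speedup(baseline_curve: List[float], method_curve: List[float],
            target: float) -> Optional[float]:
    """Ratio of rounds-to-target: baseline rounds divided by method rounds."""
    base = rounds_to_target(baseline_curve, target)
    ours = rounds_to_target(method_curve, target)
    if base is None or ours is None:
        return None  # one of the runs never reached the target
    return base / ours


# Hypothetical per-round test accuracies (illustrative only).
baseline = [0.50, 0.60, 0.70, 0.75, 0.78, 0.80, 0.81, 0.82]
fedavg = [0.70, 0.80, 0.83, 0.85]
print(speedup(baseline, fedavg, 0.80))  # 6 rounds / 2 rounds = 3.0
print(speedup(baseline, fedavg, 0.85))  # None: the baseline never reaches 0.85
```

Returning None rather than a number when a run never hits the target mirrors how a results table would leave such a cell empty instead of reporting an undefined ratio.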

This review was generated by AI and reviewed by human editors.