QUICK REVIEW

[Paper Review] Decentralized Federated Learning: A Segmented Gossip Approach

Chenghao Hu, Jingyan Jiang|arXiv (Cornell University)|Aug 21, 2019

Privacy-Preserving Technologies in Data | References: 14 | Cited by: 135

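One-Sentence Summary

Introduces Combo, a decentralized federated learning framework based on segmented gossip: the model is split into segments that are aggregated from multiple peers, which makes better use of bandwidth and speeds up training while keeping accuracy competitive.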

ABSTRACT

The emerging concern about data privacy and security has motivated the proposal of federated learning, which allows nodes to synchronize only their locally-trained models instead of their own original data. Conventional federated learning architecture, inherited from the parameter server design, relies on highly centralized topologies and the assumption of large nodes-to-server bandwidths. However, in real-world federated learning scenarios the network capacities between nodes are highly uniformly distributed and smaller than that in a datacenter. It is a great challenge for conventional federated learning approaches to efficiently utilize network capacities between nodes. In this paper, we propose model-segment-level decentralized federated learning to tackle this problem. In particular, we propose a segmented gossip approach, which not only makes full use of node-to-node bandwidth, but also has good training convergence. The experimental results show that the training time can be greatly reduced as compared to centralized federated learning.

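Motivation and Goals

Preserve data privacy in distributed learning without a central server.
Improve network efficiency in federated settings by making full use of node-to-node bandwidth.
Ensure convergence and training speed through segmented gossip and model replicas.
Design and evaluate a prototype system (Combo) for segmented gossip aggregation in realistic WAN-like networks.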

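Proposed Method

Split the global model into non-overlapping segments and aggregate segment by segment.
Use a gossip-based protocol in which, at every iteration, each worker pulls multiple model segments from different peers (segmented pulling).
Introduce model replicas to improve information propagation and convergence (pull and aggregate R mixed models).
Aggregate segments layer by layer using a segment-wise weighted average, with weights based on local dataset size (P_l and |D_j|).
Implement a prototype (Combo) that explicitly handles dynamic participants (joining and leaving) and synchronization.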
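
The segmentation, segmented pulling, and weighted aggregation steps listed above can be illustrated with a minimal sketch. It is not the Combo implementation: the function names `split_segments` and `segmented_pull`, the flat-list model representation, the uniform random choice of peers, and the inclusion of the worker's own segment in the average are all assumptions made for illustration. P_l is read here as the set of nodes that contribute segment l, and |D_j| as the local dataset size of node j.

```python
import random


def split_segments(params, num_segments):
    """Split a flat parameter list into contiguous, non-overlapping segments."""
    n = len(params)
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [params[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def segmented_pull(node_id, models, data_sizes, num_segments, num_replicas, rng):
    """One aggregation step for a single worker (illustrative sketch).

    models:     dict mapping node id -> flat list of parameters
    data_sizes: dict mapping node id -> local dataset size |D_j|
    For every segment l, the worker picks num_replicas peers at random
    (assumption: uniform sampling) and averages their copies of segment l
    together with its own copy (assumption), weighted by dataset size.
    """
    peers = [p for p in models if p != node_id]
    own_segments = split_segments(models[node_id], num_segments)
    new_params = []
    for l in range(num_segments):
        # P_l: the nodes contributing segment l in this step.
        contributors = [node_id] + rng.sample(peers, num_replicas)
        total_size = sum(data_sizes[j] for j in contributors)
        merged = [0.0] * len(own_segments[l])
        for j in contributors:
            segment_j = split_segments(models[j], num_segments)[l]
            weight = data_sizes[j] / total_size
            for k, value in enumerate(segment_j):
                merged[k] += weight * value
        new_params.extend(merged)
    return new_params
```

Because each segment can come from a different set of peers, the transfers for one model are spread over several links instead of a single one, which is the bandwidth argument the review attributes to the approach. In this sketch, setting `num_replicas` to the number of peers makes every segment a weighted average over all nodes.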

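Experimental Results

Research Questions

RQ1: Can model updates be synchronized effectively when the model is segmented and aggregated from multiple peers?
RQ2: How does segmented gossip affect convergence and training time in bandwidth-constrained, geo-distributed federated settings?
RQ3: How do the number of model replicas (R) and the number of segments (S) affect convergence and communication efficiency?
RQ4: How does the system handle dynamic participants without central coordination?
RQ5: What are the theoretical convergence properties of segmented gossip aggregation in decentralized federated learning?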

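Key Findings

Compared with centralized FedAvg, Combo substantially shortens training time while reaching nearly the same final accuracy.
Increasing the number of segments S lowers synchronization time by filling the available bandwidth more completely, but returns diminish once the bandwidth is saturated.
Raising the number of model replicas improves per-iteration accuracy and convergence up to a point, after which the gains level off and the added overhead can lengthen training time.
With S=10 and R=2, Combo clearly outperforms naive gossip and scales better than FedAvg with 20 to 40 workers.
Segmenting the model does not reduce per-iteration accuracy, while it speeds up synchronization.
In the proposed convergence analysis, the final bound depends on the gradient divergence (δ) and the aggregation divergence (ρ); increasing R can reduce ρ toward All-Reduce behavior.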
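
The last finding, that more replicas bring the aggregate closer to All-Reduce behavior, can be made concrete with a toy simulation. This is only an illustration under strong assumptions: each node holds a single scalar instead of a model, peers are sampled uniformly, all weights are equal, and the `divergence` function below (mean squared distance from the global average) is a stand-in chosen for this sketch, not the paper's definition of ρ.

```python
import random


def gossip_round(values, num_replicas, rng):
    """Each node averages its own value with those of num_replicas random peers."""
    n = len(values)
    result = []
    for i in range(n):
        peers = rng.sample([j for j in range(n) if j != i], num_replicas)
        group = [values[i]] + [values[j] for j in peers]
        result.append(sum(group) / len(group))
    return result


def divergence(values, reference):
    """Mean squared distance of the nodes' values from a reference value."""
    return sum((v - reference) ** 2 for v in values) / len(values)


def mean_divergence(num_nodes, num_replicas, trials=200, seed=0):
    """Average divergence from the All-Reduce result over repeated random trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(num_nodes)]
        all_reduce = sum(values) / num_nodes  # what a full average would give
        aggregated = gossip_round(values, num_replicas, rng)
        total += divergence(aggregated, all_reduce)
    return total / trials
```

Calling `mean_divergence(10, r)` for growing `r` gives a smaller value as more peers are averaged, and with `r = 9` (every other node) each node computes the full average, so the divergence vanishes up to floating-point rounding. The toy model ignores the communication overhead that, according to the findings above, limits the benefit of a large R in practice.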

This review was generated by AI and reviewed by a human editor.