QUICK REVIEW

[Paper Review] FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

Zheng Chai, Yujing Chen | arXiv (Cornell University) | Oct 12, 2020

Privacy-Preserving Technologies in Data | References: 48 | Cited by: 28

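One-Sentence Summary

FedAT is a novel federated learning system that combines synchronous intra-tier training with asynchronous cross-tier training to mitigate stragglers and communication bottlenecks. Using a straggler-aware weighted aggregation heuristic and polyline-encoding-based compression, FedAT improves prediction accuracy by up to 21.09% and reduces communication cost by up to 8.5x compared with state-of-the-art methods under non-i.i.d. data and heterogeneous clients.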

ABSTRACT

Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) straggler problem, where the clients lag due to data or (computing and network) resource heterogeneity, and (2) communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only one dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based synchronous mechanisms to tackle the straggler problem. However, the asynchronous methods can easily create a network communication bottleneck, while tiering may introduce biases as tiering favors faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning method with Asynchronous Tiers under Non-i.i.d. data. FedAT synergistically combines synchronous intra-tier training and asynchronous cross-tier training. By bridging the synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training for further accuracy improvement. FedAT compresses the uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, therefore minimizing the communication cost. Results show that FedAT improves the prediction performance by up to 21.09%, and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods.

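Motivation and Objectives

Address the straggler problem in federated learning caused by heterogeneity in client resources and data.
Overcome the communication bottleneck that arises in asynchronous federated learning when a large number of clients overload the server.
Balance convergence speed, model accuracy, and communication efficiency under non-i.i.d. data.
Design a system that maintains high performance even with partial client participation or an uneven distribution of clients across tiers.
Minimize communication cost through efficient compression and straggler-aware aggregation without sacrificing model accuracy.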

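Proposed Method

Propose a tiered architecture that groups clients into tiers according to their computing and network capabilities in order to manage the straggler problem.
Train synchronously within each tier so that clients of similar speed update the tier model in a stable, coordinated way.
Communicate asynchronously across tiers so that faster tiers can contribute updates without waiting for slower clients.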
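Apply a straggler-aware weighted aggregation heuristic that steers and balances training across tiers: because tiering otherwise favors faster tiers with shorter response latencies, slower tiers that update less often are given relatively more weight so that the global model is not biased toward fast clients.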
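Implement a polyline-encoding-based compression algorithm to reduce uplink and downlink communication cost.
Provide a theoretical analysis showing convergence guarantees for both convex and non-convex loss functions under the proposed framework.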
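
The tiering and aggregation steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (assign_tiers, intra_tier_average, cross_tier_weights, aggregate_tiers) are hypothetical, models are flat lists of floats, and the cross-tier rule assumes that each tier is weighted by the normalized update count of its mirror tier, so that tiers that update rarely are not drowned out by tiers that update often. The exact weighting rule should be checked against the paper.

```python
from typing import Dict, List


def assign_tiers(latencies: Dict[str, float], num_tiers: int) -> List[List[str]]:
    """Split clients into num_tiers groups by ascending response latency (tier 0 is fastest)."""
    ordered = sorted(latencies, key=latencies.get)
    size, extra = divmod(len(ordered), num_tiers)
    tiers, start = [], 0
    for i in range(num_tiers):
        end = start + size + (1 if i < extra else 0)
        tiers.append(ordered[start:end])
        start = end
    return tiers


def intra_tier_average(client_models: List[List[float]], sample_counts: List[int]) -> List[float]:
    """Synchronous step inside one tier: sample-weighted mean of the client models."""
    total = sum(sample_counts)
    dim = len(client_models[0])
    return [
        sum(model[j] * n for model, n in zip(client_models, sample_counts)) / total
        for j in range(dim)
    ]


def cross_tier_weights(update_counts: List[int]) -> List[float]:
    """Assumed rule: tier m receives the normalized update count of tier M-1-m.

    update_counts[m] is how many times tier m has updated the global model so far.
    Reversing the counts gives rarely-updating (slow) tiers the larger weights.
    """
    total = sum(update_counts)
    if total == 0:
        # No tier has reported yet: fall back to uniform weights.
        return [1.0 / len(update_counts)] * len(update_counts)
    return [count / total for count in reversed(update_counts)]


def aggregate_tiers(tier_models: List[List[float]], update_counts: List[int]) -> List[float]:
    """Combine the latest model of every tier into one global model."""
    weights = cross_tier_weights(update_counts)
    dim = len(tier_models[0])
    return [sum(w * model[j] for w, model in zip(weights, tier_models)) for j in range(dim)]
```

In this sketch the server would call intra_tier_average whenever all sampled clients of one tier have reported, increment that tier's entry in update_counts, and then call aggregate_tiers without waiting for any other tier.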

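Experimental Results

Research Questions

RQ1: Can a hybrid synchronous-asynchronous training strategy effectively mitigate the straggler effect in federated learning under non-i.i.d. data?
RQ2: How much do tiering and weighted aggregation improve model accuracy and convergence speed compared with purely synchronous or purely asynchronous federated learning?
RQ3: By how much can efficient compression reduce communication cost without degrading model performance?
RQ4: How robust is the system to varying client participation rates and an uneven distribution of clients across tiers?
RQ5: Does the proposed method maintain high performance under extreme client dropout or partial participation?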

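Key Findings

FedAT improves prediction accuracy by up to 21.09% over state-of-the-art federated learning methods on the CIFAR-10 and FEMNIST datasets.
Polyline-encoding-based compression of model updates reduces communication cost by up to 8.5x.
Even when only 2 of 100 clients participate in each round, FedAT achieves 14.47% higher accuracy than FedAvg on CIFAR-10.
FedAT maintains high performance across all tested configurations, including uniform, slow, medium, and fast tier distributions, with minimal impact on final model accuracy.
Thanks to asynchronous cross-tier updates, FedAT converges faster than FedAvg and TiFL, particularly under partial client participation.
The theoretical analysis confirms convergence guarantees for both convex and non-convex loss functions under the FedAT framework.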
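
The polyline-encoding-based compression behind the communication savings can be illustrated with a one-dimensional variant of the encoded polyline format used for map coordinates. The sketch below is an assumption about how such a scheme might be applied to a flat list of model weights, not the authors' code: the names polyline_encode and polyline_decode and the default precision of 4 decimal places are hypothetical. Each weight is rounded to a fixed precision, stored as the difference from the previous weight, and written as printable ASCII in 5-bit chunks, so small deltas cost only one or two characters.

```python
from typing import List


def _encode_int(value: int) -> str:
    """Encode one signed integer as a variable-length ASCII string."""
    # Fold the sign into the lowest bit: left-shift, then invert if negative.
    value = ~(value << 1) if value < 0 else (value << 1)
    chunks = []
    while value >= 0x20:
        # Emit the low 5 bits with a continuation flag (0x20), offset by 63.
        chunks.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chunks.append(chr(value + 63))  # final chunk has no continuation flag
    return "".join(chunks)


def polyline_encode(values: List[float], precision: int = 4) -> str:
    """Quantize to `precision` decimals, delta-encode, and pack as ASCII (lossy)."""
    factor = 10 ** precision
    out, prev = [], 0
    for v in values:
        q = round(v * factor)
        out.append(_encode_int(q - prev))
        prev = q
    return "".join(out)


def polyline_decode(encoded: str, precision: int = 4) -> List[float]:
    """Invert polyline_encode up to the quantization error."""
    factor = 10 ** precision
    values, prev = [], 0
    shift = result = 0
    for ch in encoded:
        b = ord(ch) - 63
        result |= (b & 0x1F) << shift
        shift += 5
        if b < 0x20:  # no continuation flag: this integer is complete
            delta = ~(result >> 1) if result & 1 else (result >> 1)
            prev += delta
            values.append(prev / factor)
            shift = result = 0
    return values
```

For example, polyline_decode(polyline_encode(weights)) returns the weights rounded to four decimal places. The precision parameter trades payload size against quantization error, which is the knob a system like this would tune to keep accuracy unaffected.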

This review was generated by AI and reviewed by a human editor.