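[Paper Summary] SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
SCAFFOLD introduces control variates to reduce the client drift that affects FedAvg, giving faster convergence in federated learning along with robustness to data heterogeneity and client sampling. It matches or beats SGD in the number of communication rounds needed to converge, and it exploits similarity between clients to speed up training further.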
Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence. As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the 'client-drift' in its local updates. We prove that SCAFFOLD requires significantly fewer communication rounds and is not affected by data heterogeneity or client sampling. Further, we show that (for quadratics) SCAFFOLD can take advantage of similarity in the clients' data, yielding even faster convergence. The latter is the first result to quantify the usefulness of local steps in distributed optimization.
Motivation and Objectives
- Explain and analyze why FedAvg is inefficient under heterogeneous (non-iid) client data and client sampling.
- Propose SCAFFOLD to correct client-drift via control variates.
- Establish convergence guarantees showing data heterogeneity robustness and potential gains from client similarity.
- Compare theoretically and empirically with FedAvg, SGD, and FedProx across convex and non-convex settings.
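Proposed Method
- Define a server control variate $c$ and per-client control variates $c_i$ to correct local updates.
- Use the local update rule $y_i \leftarrow y_i - \eta_l \left( g_i(y_i) - c_i + c \right)$, where $y_i$ is client $i$'s local model, $\eta_l$ is the local step size, and $g_i$ is its stochastic gradient.
- Propose two options for updating $c_i$: either compute the gradient at the server model, $g_i(x)$, or use a drift-corrected update that reuses the gradients from the local steps.
- Aggregate the client updates to update the server model $x$ and the server control variate $c$, using the global step size $\eta_g$ and the sampled client set $S$.
- Provide theoretical convergence guarantees showing robustness to heterogeneity, and draw connections to variance reduction methods (SAGA, etc.).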
- Demonstrate that when local steps are used, similarity (delta) among clients can improve convergence for quadratic functions.
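The update rules listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the second control-variate option (reusing local progress), exact rather than stochastic gradients, full participation, and toy quadratic clients $f_i(y) = \tfrac{a_i}{2}\|y - b_i\|^2$. The names `scaffold_round` and `make_quadratic_grad`, the client data, and the step sizes are all hypothetical choices made for this example.

```python
import numpy as np

def make_quadratic_grad(a, b):
    """Gradient of the assumed toy client loss f(y) = 0.5 * a * ||y - b||^2."""
    return lambda y: a * (y - b)

def scaffold_round(x, c, c_clients, grads, sampled, eta_l, eta_g, K):
    """One communication round; returns the new server model and control variate.

    c_clients is updated in place for the sampled clients.
    """
    n_clients = len(grads)
    sum_dy = np.zeros_like(x)
    sum_dc = np.zeros_like(c)
    for i in sampled:
        y = x.copy()
        for _ in range(K):
            # Corrected local step: y_i <- y_i - eta_l * (g_i(y_i) - c_i + c)
            y = y - eta_l * (grads[i](y) - c_clients[i] + c)
        # Second option: c_i_new = c_i - c + (x - y_i) / (K * eta_l)
        c_i_new = c_clients[i] - c + (x - y) / (K * eta_l)
        sum_dy += y - x
        sum_dc += c_i_new - c_clients[i]
        c_clients[i] = c_i_new
    x_new = x + eta_g * sum_dy / len(sampled)  # average over sampled clients
    c_new = c + sum_dc / n_clients             # average over all clients
    return x_new, c_new

# Two heterogeneous clients: different curvature a_i and different minimizers b_i.
a = [1.0, 3.0]
b = [np.array([0.0, 0.0]), np.array([1.0, -2.0])]
grads = [make_quadratic_grad(a_i, b_i) for a_i, b_i in zip(a, b)]
x_star = sum(a_i * b_i for a_i, b_i in zip(a, b)) / sum(a)  # minimizer of the average loss

x = np.zeros(2)
c = np.zeros(2)
c_clients = [np.zeros(2) for _ in grads]
for _ in range(100):
    x, c = scaffold_round(x, c, c_clients, grads, sampled=[0, 1],
                          eta_l=0.05, eta_g=1.0, K=10)
print(x, x_star)
```

In this toy setting the server model approaches the minimizer of the average loss despite ten local steps per round, and each `c_clients[i]` approaches that client's gradient at the optimum. Passing a random subset as `sampled` would exercise the client-sampling case, though the step sizes here were only checked for full participation.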
Experimental Results
Research Questions
- RQ1: Can FedAvg's performance under data heterogeneity be bounded more tightly in theory than in earlier results?
- RQ2: Does introducing client-side and server-side control variates mitigate client-drift and reduce the number of communication rounds?
- RQ3: How does SCAFFOLD compare to SGD, FedAvg, and FedProx across strongly convex, convex, and non-convex objectives under various client similarity and sampling conditions?
- RQ4: To what extent can similarity among clients (measured by the Hessian dissimilarity $\delta$) improve convergence when using local steps?
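RQ4 refers to a Hessian dissimilarity parameter. As a reading aid, that quantity can be written as a uniform bound on how far each client's Hessian is from the Hessian of the average objective. The notation below ($N$ clients, $f$ the average of the client losses $f_i$, operator norm) is assumed here and should be checked against the paper's exact statement.

```latex
f(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x), \qquad
\left\| \nabla^2 f_i(x) - \nabla^2 f(x) \right\| \le \delta
\quad \text{for all } x \text{ and all clients } i .
```

Under this reading, clients with identical quadratic curvature have $\delta = 0$ even when their minimizers differ, so $\delta$ measures a different kind of heterogeneity than the distance between client optima.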
Key Findings
- FedAvg suffers from client-drift under heterogeneous data, slowing convergence even with full gradients and full participation.
- SCAFFOLD converges at least as fast as SGD and is robust to client sampling, reducing the impact of data heterogeneity.
- For strongly convex objectives, SCAFFOLD achieves rates that match SGD with appropriately chosen step sizes and can be faster when clients are similar.
- SCAFFOLD’s benefits extend to non-convex and general convex cases, with convergence bounds that incorporate gradient/Hessian dissimilarities and sampling.
- When clients are highly similar (small $\delta$), SCAFFOLD can outperform large-batch SGD by making effective use of local steps.
- Empirical results on simulated data and EMNIST show SCAFFOLD consistently outperforms SGD, FedAvg, and FedProx in terms of communication rounds and accuracy.
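The first finding above, that FedAvg drifts even with full gradients and full participation, can be reproduced on a tiny example. The sketch below is an illustration under assumed values, not an experiment from the paper: two scalar quadratic clients $f_i(y) = \tfrac{a_i}{2}(y - b_i)^2$ with different curvatures, exact gradients, and every client participating in every round. The function name `fedavg` and all constants are hypothetical.

```python
import numpy as np

def fedavg(a, b, eta_l, K, rounds):
    """Full-participation FedAvg with exact gradients on f_i(y) = 0.5 * a_i * (y - b_i)^2."""
    x = 0.0
    for _ in range(rounds):
        local_models = []
        for a_i, b_i in zip(a, b):
            y = x
            for _ in range(K):
                y -= eta_l * a_i * (y - b_i)  # plain local gradient step, no correction
            local_models.append(y)
        x = float(np.mean(local_models))      # server averages the local models
    return x

a = [1.0, 3.0]  # client curvatures (assumed)
b = [0.0, 1.0]  # client minimizers (assumed)
x_star = sum(a_i * b_i for a_i, b_i in zip(a, b)) / sum(a)  # 0.75

x_one_step = fedavg(a, b, eta_l=0.05, K=1, rounds=500)
x_ten_steps = fedavg(a, b, eta_l=0.05, K=10, rounds=500)
print(x_star, x_one_step, x_ten_steps)
```

With one local step, FedAvg reduces to gradient descent on the average loss and reaches `x_star`. With ten local steps it settles at a different point, even though there is no gradient noise and no client sampling, which is the drift that the control variates are meant to cancel.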
This summary was generated by AI and reviewed by human editors.