QUICK REVIEW

[Paper Review] SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

Sai Praneeth Karimireddy | arXiv (Cornell University) | 2019-10-14
Privacy-Preserving Technologies in Data | References: 74 | Citations: 735
One-Line Summary

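SCAFFOLD introduces control variates to reduce the client drift of FedAvg, which gives faster convergence in federated learning and robustness to data heterogeneity and client sampling. It matches or beats SGD in the number of communication rounds needed to converge, and it exploits similarity between clients to speed up training further.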

ABSTRACT

Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from "client-drift" when the data is heterogeneous (non-iid), resulting in unstable and slow convergence. As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the "client-drift" in its local updates. We prove that SCAFFOLD requires significantly fewer communication rounds and is not affected by data heterogeneity or client sampling. Further, we show that (for quadratics) SCAFFOLD can take advantage of similarity in the client's data yielding even faster convergence. The latter is the first result to quantify the usefulness of local-steps in distributed optimization.
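
The client-drift claim can be seen on a toy problem. The following sketch is not from the paper. It assumes two hypothetical clients with 1-D quadratic losses f_i(x) = ½ a_i (x − b_i)², exact gradients, full participation, and a global step size of 1. With one local step FedAvg reduces to gradient descent and reaches the global optimum. With ten local steps, each client drifts toward its own minimizer, and the averaged model settles at a different point. The names `A`, `B`, `fedavg`, and `global_optimum` and all numeric values are illustrative assumptions.

```python
# Toy illustration of FedAvg client drift (hypothetical setup, not from the paper):
# two clients with 1-D quadratics f_i(x) = 0.5 * a_i * (x - b_i)**2.
A = [1.0, 4.0]  # per-client curvature (heterogeneous)
B = [0.0, 1.0]  # per-client minimizers (heterogeneous)


def global_optimum():
    # The minimizer of f(x) = mean_i f_i(x) is the curvature-weighted mean of the b_i.
    return sum(a * b for a, b in zip(A, B)) / sum(A)


def fedavg(local_steps, lr=0.1, rounds=200, x=0.0):
    """FedAvg with exact gradients, full participation, and global step size 1."""
    for _ in range(rounds):
        client_models = []
        for a, b in zip(A, B):
            y = x
            for _ in range(local_steps):
                y -= lr * a * (y - b)  # exact local gradient step on f_i
            client_models.append(y)
        x = sum(client_models) / len(client_models)  # server averages client models
    return x


print(global_optimum())        # 0.8
print(fedavg(local_steps=1))   # close to 0.8: one local step is plain gradient descent
print(fedavg(local_steps=10))  # about 0.604: biased by client drift
```

The gap does not shrink with more rounds. It is a property of FedAvg's fixed point when local steps and heterogeneous clients are combined, which matches the "even with full gradients and full participation" observation in the review.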

Research Motivation and Goals

  • Motivate and analyze the inefficiency of FedAvg under heterogeneous (non-iid) client data and sampling.
  • Propose SCAFFOLD to correct client-drift via control variates.
  • Establish convergence guarantees showing data heterogeneity robustness and potential gains from client similarity.
  • Compare theoretically and empirically with FedAvg, SGD, and FedProx across convex and non-convex settings.

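Proposed Method

  • Define a server control variate $c$ and client control variates $c_i$ that correct the local updates.
  • Use the drift-corrected local update $y_i \leftarrow y_i - \eta_l \left( g_i(y_i) - c_i + c \right)$.
  • Propose two options for updating $c_i$ after $K$ local steps: (I) recompute the local gradient at the server model, $c_i^+ \leftarrow g_i(x)$, or (II) reuse the local updates, $c_i^+ \leftarrow c_i - c + \frac{1}{K\eta_l}(x - y_i)$.
  • Aggregate the updates of the sampled clients $S$ to update the server model $x$ with global step size $\eta_g$ and the server control variate $c$: $x \leftarrow x + \frac{\eta_g}{|S|}\sum_{i \in S}(y_i - x)$ and $c \leftarrow c + \frac{1}{N}\sum_{i \in S}(c_i^+ - c_i)$.
  • Provide theoretical convergence guarantees that show robustness to heterogeneity, and relate SCAFFOLD to variance-reduction methods (SAGA, etc.).
  • Show that, when local steps are used, similarity among clients (small Hessian dissimilarity $\delta$) can improve convergence for quadratic functions.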
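
The steps above can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a hypothetical two-client 1-D quadratic problem, exact gradients in place of stochastic ones, zero-initialized control variates, and the Option II update for c_i. Option I would instead set the new control variate to `grad(i, x)`. The names `grad`, `scaffold`, and `sample_size` and the numeric defaults are illustrative.

```python
import random

# Hypothetical toy problem (numbers made up for illustration):
# client i minimizes f_i(x) = 0.5 * a_i * (x - b_i)**2.
A = [1.0, 4.0]
B = [0.0, 1.0]


def grad(i, y):
    """Exact local gradient g_i(y); a stochastic gradient would be used in practice."""
    return A[i] * (y - B[i])


def scaffold(rounds=200, local_steps=10, lr_local=0.1, lr_global=1.0,
             sample_size=None, seed=0):
    """Sketch of SCAFFOLD with the Option II control-variate update."""
    n = len(A)
    if sample_size is None:
        sample_size = n  # full participation by default
    rng = random.Random(seed)
    x, c = 0.0, 0.0   # server model and server control variate
    c_i = [0.0] * n   # client control variates, stored on the clients
    for _ in range(rounds):
        S = rng.sample(range(n), sample_size)  # sampled clients for this round
        dy, dc = [], []
        for i in S:
            y = x
            for _ in range(local_steps):
                # Drift-corrected local step: y <- y - lr_l * (g_i(y) - c_i + c)
                y -= lr_local * (grad(i, y) - c_i[i] + c)
            # Option II: c_i+ = c_i - c + (x - y) / (K * lr_l)
            c_new = c_i[i] - c + (x - y) / (local_steps * lr_local)
            dy.append(y - x)
            dc.append(c_new - c_i[i])
            c_i[i] = c_new
        x += lr_global * sum(dy) / len(S)  # x <- x + eta_g * mean of (y_i - x)
        c += sum(dc) / n                   # c <- c + (1/N) * sum of (c_i+ - c_i)
    return x


print(scaffold())  # approaches the global optimum 0.8 despite 10 local steps
```

With these assumed settings the iterates approach the global optimum 0.8 even with ten local steps per round. Averaging uncorrected local gradient steps on this problem would instead be biased by client drift. Partial participation is supported through `sample_size`, but this sketch makes no claim about its convergence for these step sizes.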

Experimental Results

Research Questions

  • RQ1: Can FedAvg's performance under data heterogeneity be bounded more tightly than in earlier results?
  • RQ2: Do client- and server-side control variates mitigate client drift and reduce the number of communication rounds?
  • RQ3: How does SCAFFOLD compare to SGD, FedAvg, and FedProx on strongly convex, convex, and non-convex objectives under various client-similarity and sampling conditions?
  • RQ4: To what extent can similarity among clients (Hessian dissimilarity δ) improve convergence when local steps are used?

Key Results

  • FedAvg suffers from client-drift under heterogeneous data, slowing convergence even with full gradients and full participation.
  • SCAFFOLD converges at least as fast as SGD and is robust to client sampling, reducing the impact of data heterogeneity.
  • For strongly convex objectives, SCAFFOLD achieves rates that match SGD with appropriately chosen step sizes and can be faster when clients are similar.
  • SCAFFOLD's benefits extend to general convex and non-convex objectives. Its bounds account for client sampling and, unlike FedAvg's, do not depend on gradient dissimilarity; Hessian dissimilarity δ enters the quadratic analysis.
  • When clients are highly similar (low δ), SCAFFOLD can outperform large-batch SGD by using local steps effectively.
  • Empirical results on simulated data and EMNIST show SCAFFOLD consistently outperforms SGD, FedAvg, and FedProx in terms of communication rounds and accuracy.
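
The role of Hessian similarity can be made concrete in an idealized special case. This is an illustration, not the paper's general theorem. Assume quadratic losses with a shared Hessian $A$, so that $\delta = 0$, full participation, and control variates that are all client gradients at a common reference point $x'$ (for example, the Option I values from the previous round). The corrected local direction then equals the global gradient:

```latex
% Assumptions: f_i(y) = \tfrac12 y^\top A y - b_i^\top y for every client (same A, so \delta = 0),
% full participation, and c_i = \nabla f_i(x') for a common reference point x'.
\begin{aligned}
c_i &= \nabla f_i(x') = A x' - b_i, \qquad
c = \frac{1}{N}\sum_{j=1}^{N} c_j = A x' - \bar b, \qquad
\bar b = \frac{1}{N}\sum_{j=1}^{N} b_j, \\
g_i(y) - c_i + c &= (A y - b_i) - (A x' - b_i) + (A x' - \bar b) = A y - \bar b = \nabla f(y).
\end{aligned}
```

Every local step is therefore an exact gradient step on $f$. Because all clients follow the same path, one round with $K$ local steps and $\eta_g = 1$ equals $K$ steps of full-batch gradient descent. More local steps then translate directly into fewer communication rounds, which is the intuition behind the low-δ result.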

This review was generated by AI and reviewed by a human editor.