QUICK REVIEW

[Paper Review] FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data

Xinwei Zhang, Mingyi Hong|arXiv (Cornell University)|May 22, 2020

Stochastic Gradient Optimization Techniques|References: 33|Citations: 53

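One-Line Summary

FedPD is a primal-dual federated learning framework that achieves optimal optimization and communication rates under non-IID data, with a communication pattern that adapts to the degree of data heterogeneity. It provides algorithms that also handle non-convex objectives and gives a formal analysis of "computation then aggregation" (CTA) style FL.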

ABSTRACT

Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model, in which multiple local updates are performed using local data, before sending the local models to the cloud for aggregation. However, these schemes typically require strong assumptions, such as the local data are identically independent distributed (i.i.d), or the size of the local gradients are bounded. In this paper, we first explicitly characterize the behavior of the FedAvg algorithm, and show that without strong and unrealistic assumptions on the problem structure, the algorithm can behave erratically for non-convex problems (e.g., diverge to infinity). Aiming at designing FL algorithms that are provably fast and require as few assumptions as possible, we propose a new algorithm design strategy from the primal-dual optimization perspective. Our strategy yields a family of algorithms that take the same CTA model as existing algorithms, but they can deal with the non-convex objective, achieve the best possible optimization and communication complexity while being able to deal with both the full batch and mini-batch local computation models. Most importantly, the proposed algorithms are communication efficient, in the sense that the communication pattern can be adaptive to the level of heterogeneity among the local data. To the best of our knowledge, this is the first algorithmic framework for FL that achieves all the above properties.
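The CTA model described in the abstract can be illustrated with a minimal sketch of one FedAvg-style round: every client runs a few local gradient steps from the current global model, and the server then averages the resulting local models. The function name `fedavg_round`, the toy quadratic client objectives, and the step size are assumptions made for illustration and are not taken from the paper. This toy problem is convex with identical curvature on every client, so it converges and does not reproduce the erratic non-convex behavior the abstract warns about.

```python
import numpy as np

def fedavg_round(x_global, grads, local_steps, lr):
    """One CTA round: local gradient steps on each client, then server averaging."""
    local_models = []
    for grad in grads:
        x = x_global.copy()
        for _ in range(local_steps):      # "computation": several local updates
            x = x - lr * grad(x)
        local_models.append(x)
    return np.mean(local_models, axis=0)  # "aggregation": plain model averaging

# Hypothetical toy clients: f_i(x) = 0.5 * ||x - c_i||^2, so grad f_i(x) = x - c_i.
centers = [np.array([1.0, 0.0]), np.array([0.0, 3.0]), np.array([-4.0, 0.0])]
grads = [lambda x, c=c: x - c for c in centers]

x = np.zeros(2)
for _ in range(50):
    x = fedavg_round(x, grads, local_steps=5, lr=0.1)

# The minimizer of the average objective is the mean of the centers.
x_star = np.mean(centers, axis=0)
```

With these settings `x` ends up numerically equal to `x_star`; the heterogeneity only matters here through the different centers `c_i`.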

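Motivation and Objectives

Understand the limitations of FedAvg under non-IID data and the CTA protocol.
Develop a framework that achieves optimal optimization and communication complexity in the non-IID setting.
Provide a flexible algorithm design that adapts to data heterogeneity.
Establish convergence results under the minimal assumptions A1–A2 and characterize the conditions under which communication can be saved.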

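Proposed Method

Formulate federated learning as a constrained problem with a consensus variable and apply an augmented Lagrangian.
Introduce FedPD as a primal-dual meta-algorithm equipped with an oracle that models the local processing between communication rounds.
Provide two concrete local oracles (GD-style and SGD-style), plus a variance-reduced version that improves sample complexity.
Adapt the aggregation frequency, governed by p, to the non-IID parameter delta, quantifying the trade-off between communication and accuracy.
Prove a convergence result showing optimal communication complexity under delta-non-IID data and non-convex objectives (Theorem 1).
Relate FedPD to FedProx and FedDANE, highlighting improvements under CTA and weaker assumptions.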
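The consensus formulation and augmented Lagrangian mentioned in the first item can be written out as follows. This is a sketch of the standard construction, and the symbols are assumptions introduced here rather than notation quoted from the paper: N clients, local objectives f_i, local models x_i, consensus variable x_0, dual variables \lambda_i, and penalty parameter \eta > 0.

```latex
\begin{aligned}
&\min_{x_0, x_1, \dots, x_N} \ \frac{1}{N} \sum_{i=1}^{N} f_i(x_i)
  \quad \text{s.t.} \quad x_i = x_0, \quad i = 1, \dots, N, \\
&\mathcal{L}_i(x_i, x_0, \lambda_i)
  = f_i(x_i) + \langle \lambda_i,\, x_i - x_0 \rangle
  + \frac{1}{2\eta} \lVert x_i - x_0 \rVert^2 .
\end{aligned}
```

Under this reading, each client approximately minimizes its own \mathcal{L}_i over x_i between communication rounds (the role of the local oracle), and the dual variables \lambda_i account for the mismatch between x_i and x_0.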

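Experimental Results

Research Questions

RQ1: Given the CTA model, what are the optimal local update directions for the agents with respect to overall system performance?
RQ2: Can more sophisticated aggregation, beyond simple averaging, improve sample or communication complexity?
RQ3: Does performing multiple local updates between communications reduce the communication effort?
RQ4: What is the best performance achievable by CTA-type algorithms under minimal problem assumptions (A1–A2)?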

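Key Findings

CTA-based local gradient updates alone cannot improve on O(1/epsilon) communication rounds for non-convex objectives.
FedPD achieves optimal optimization and communication complexity in the non-IID setting and converges under A1–A2.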
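The aggregation-skipping probability p adapts to the delta-non-IID level and, as shown in the theory and figures, yields linear-logarithmic communication savings.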
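FedPD with Oracle I (GD/SGD) converges with adaptive communication, and Oracle II (variance reduction) improves sample complexity.
Communication savings increase as the data become more IID (delta -> 0) and decrease as the data become more non-IID (larger delta).
Within the CTA framework, FedPD offers better theoretical guarantees under weaker assumptions than FedProx and FedDANE.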
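The interplay between a GD-style local oracle and the aggregation-skipping probability p can be sketched on a toy problem. The code below is an illustrative primal-dual loop in the spirit of these findings, not a transcription of the paper's algorithm: the function name `fedpd_sketch`, the update order, the quadratic client objectives f_i(x) = 0.5 * ||x - c_i||^2, and all parameter values are assumptions. Each client takes gradient steps on its augmented Lagrangian f_i(x_i) + <lambda_i, x_i - x0_i> + ||x_i - x0_i||^2 / (2 * eta), updates its dual variable, and the server averages only with probability 1 - p.

```python
import numpy as np

def fedpd_sketch(centers, rounds, p, eta=1.0, local_steps=10, lr=0.25, seed=0):
    """Toy primal-dual loop with random aggregation skipping (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    n, d = centers.shape
    x = np.zeros((n, d))    # local primal variables x_i
    lam = np.zeros((n, d))  # local dual variables lambda_i
    x0 = np.zeros((n, d))   # each client's copy of the global model
    comm_rounds = 0
    for _ in range(rounds):
        # GD-style local oracle: gradient steps on the local augmented Lagrangian.
        for _ in range(local_steps):
            grad = (x - centers) + lam + (x - x0) / eta
            x = x - lr * grad
        lam = lam + (x - x0) / eta  # dual ascent step
        x0 = x + eta * lam          # local estimate of the global model
        if rng.random() >= p:       # aggregate with probability 1 - p
            x0 = np.tile(x0.mean(axis=0), (n, 1))
            comm_rounds += 1
    return x0.mean(axis=0), comm_rounds

# Heterogeneous clients, aggregation every round (p = 0).
hetero = [[1.0, 0.0], [0.0, 3.0], [-4.0, 0.0]]
x_hetero, comm_hetero = fedpd_sketch(hetero, rounds=100, p=0.0)

# Identical clients (the delta = 0 extreme), skipping about half of the aggregations.
iid = [[1.0, 2.0]] * 3
x_iid, comm_iid = fedpd_sketch(iid, rounds=100, p=0.5)
```

In the identical-client case, skipping aggregation costs nothing because every client already holds the same model, which mirrors the qualitative claim that savings grow as delta approaches 0. With heterogeneous clients and p > 0, this toy loop only reaches a neighborhood of the solution, so p has to be chosen with the heterogeneity level in mind.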

This review was generated by AI and checked by a human editor.