QUICK REVIEW

[Paper Review] EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

Peter Richtárik, Igor Sokolov|arXiv (Cornell University)|Jun 9, 2021

Stochastic Gradient Optimization Techniques | References: 37 | Citations: 40

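One-Sentence Summary

EF21 introduces a Markov-compressor-based error feedback mechanism for biased gradient compression in distributed optimization, achieving faster convergence under standard assumptions and better empirical performance than earlier EF methods.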

ABSTRACT

Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-$k$. First proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018, Alistarh et al., 2018]. However, all existing analyses either i) apply to the single node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost. In this work we fix all these deficiencies by proposing and analyzing a new EF mechanism, which we call EF21, which consistently and substantially outperforms EF in practice. Our theoretical analysis relies on standard assumptions only, works in the distributed heterogeneous data setting, and leads to better and more meaningful rates. In particular, we prove that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems, beating the previous bound of $O(1/T^{2/3})$, which was shown under a bounded gradients assumption. We further improve this to a fast linear rate for PL functions, which is the first linear convergence result for an EF-type method not relying on unbiased compressors. Since EF has a large number of applications where it reigns supreme, we believe that our 2021 variant, EF21, can have a large impact on the practice of communication efficient distributed learning.

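Motivation and Objectives

Motivate the need for communication-efficient optimization when training distributed, over-parameterized models.
Develop a new error feedback mechanism that handles heterogeneous data without relying on strong assumptions.
Establish theoretical guarantees: improved convergence rates under standard smoothness and lower-boundedness assumptions, and linear convergence under the PL condition.
Extend EF21 to EF21+, which adaptively chooses the compressor at each iteration for better performance.
Show empirical superiority over the original EF on synthetic data and in deep learning experiments.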
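
For reference, the "standard assumptions" and the PL condition named above are usually written as follows. This is a sketch in commonly used notation; the symbols $\mathcal{C}$, $\alpha$, $L$, $\mu$ and $f^{\inf}$ are assumptions of this note and may differ from the paper's exact statements.

```latex
\begin{align*}
% Contractive (possibly biased) compressor with parameter 0 < \alpha \le 1;
% Top-k on R^d satisfies this with \alpha = k/d.
\mathbb{E}\,\|\mathcal{C}(x) - x\|^2 &\le (1-\alpha)\,\|x\|^2
  && \text{for all } x \in \mathbb{R}^d, \\
% L-smoothness of the objective.
\|\nabla f(x) - \nabla f(y)\| &\le L\,\|x - y\|
  && \text{for all } x, y \in \mathbb{R}^d, \\
% The objective is bounded from below.
f^{\inf} := \inf_{x} f(x) &> -\infty, \\
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0.
f(x) - f^{\inf} &\le \frac{1}{2\mu}\,\|\nabla f(x)\|^2
  && \text{for all } x \in \mathbb{R}^d.
\end{align*}
```

None of these conditions bounds the gradient norm globally or depends on the iterates produced by the algorithm, which is the contrast with earlier EF analyses drawn in the abstract.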

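Proposed Method

Introduce the Markov compressor, which builds a gradient estimator whose accuracy improves over time and thereby makes biased compression stable.
Define EF21 as a distributed optimization method in which each node applies a biased compressor to the difference between its local gradient and its current gradient estimate, and transmits only that compressed residual.
Prove $O(1/T)$ convergence for smooth nonconvex objectives under standard assumptions (L-smoothness and a lower bound on the objective).
Prove linear convergence of EF21 under the PL condition.
Propose EF21+, a hybrid in which each node decides at every iteration whether to use the original compressor or the Markov compressor.
Provide an extension to stochastic gradients and discuss the relationship to the original EF method.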
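
The EF21 steps described above can be sketched in a few lines of NumPy. This is a minimal single-process simulation rather than the authors' code: the function names `top_k` and `ef21`, the choice of Top-$k$ as the contractive compressor, and the initialization of each local estimate with the compressed initial gradient are assumptions made for illustration.

```python
import numpy as np

def top_k(v, k):
    """Top-k compressor: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, step, k, iters):
    """Simulate EF21 with exact local gradients.

    grads : list of callables, grads[i](x) returns the gradient of node i's loss
    x0    : initial point (1-D float array)
    step  : step size (hypothetical parameter name)
    k     : number of coordinates each node sends per round
    """
    n = len(grads)
    x = x0.copy()
    # Assumed initialization: each node starts from its compressed gradient.
    g_local = [top_k(grad(x), k) for grad in grads]
    g = sum(g_local) / n                     # server-side aggregate estimate
    for _ in range(iters):
        x = x - step * g                     # server step, then broadcast x
        msgs = []
        for i, grad in enumerate(grads):
            # Compress the gap between the fresh gradient and the local estimate.
            c = top_k(grad(x) - g_local[i], k)
            g_local[i] = g_local[i] + c      # Markov-compressor update on the node
            msgs.append(c)                   # only c would cross the network
        g = g + sum(msgs) / n                # server mirrors the nodes' updates
    return x
```

In this sketch each node uploads only the sparse vector `c` per round, and the server never sees an uncompressed gradient. The step size has to be small enough for convergence; the paper's admissible range depends on the smoothness constants and on the compressor, and is not reproduced here.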

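Experimental Results

Research Questions

RQ1: Can EF21 achieve a faster convergence rate on nonconvex problems under standard smoothness and lower-boundedness assumptions, without requiring unbiased compressors?
RQ2: Does EF21 converge linearly under the PL condition without relying on additional unbiased compressors?
RQ3: How does EF21 compare with classical error feedback (EF) in practice, particularly on distributed heterogeneous data?
RQ4: Does adaptively choosing the compressor on each node give EF21+ practical gains?
RQ5: Can EF21 be extended to the stochastic gradient setting while retaining its theoretical guarantees?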

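Key Findings

EF21 achieves an $O(1/T)$ convergence rate for smooth nonconvex problems under standard assumptions.
EF21 achieves fast linear convergence for functions satisfying the PL condition.
EF21+ often improves practical performance by choosing the better compressor on each node, while retaining the theoretical guarantees.
In experiments on synthetic data and deep learning benchmarks, EF21 consistently and substantially outperforms the original EF method and tolerates larger learning rates.
The analysis requires only standard assumptions (L-smoothness and a lower bound); it does not rely on bounded gradients or iterate-dependent bounds.
With the adaptations discussed in the paper, EF21 and EF21+ also extend to the stochastic gradient setting.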
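
The per-node compressor choice in EF21+ can be illustrated with a small sketch. It assumes the selection rule is "keep whichever candidate has the smaller squared distance to the fresh local gradient"; that rule, the Top-$k$ compressor, and the names `top_k` and `ef21_plus_local_step` are assumptions of this example and should be checked against the paper before being relied on.

```python
import numpy as np

def top_k(v, k):
    """Top-k compressor: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_plus_local_step(grad_new, g_prev, k):
    """One node-side EF21+ update (hypothetical helper).

    Returns the new local gradient estimate and a label saying which
    candidate was kept.
    """
    # Candidate 1: original compressor applied directly to the gradient.
    plain = top_k(grad_new, k)
    # Candidate 2: Markov compressor, i.e. the EF21 update of the old estimate.
    markov = g_prev + top_k(grad_new - g_prev, k)
    # Assumed rule: keep the candidate with the smaller squared distortion.
    err_plain = np.sum((plain - grad_new) ** 2)
    err_markov = np.sum((markov - grad_new) ** 2)
    if err_markov <= err_plain:
        return markov, "markov"
    return plain, "plain"
```

When the previous estimate is already close to the new gradient, the Markov candidate wins because only a small residual has to be compressed; when the estimate is badly out of date, compressing the gradient directly can be more accurate.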

This review was generated by AI and checked by a human editor.