QUICK REVIEW

[Paper Review] On Biased Compression for Distributed Learning

Aleksandr Beznosikov, Samuel Horváth|arXiv (Cornell University)|Feb 27, 2020

Stochastic Gradient Optimization Techniques|References: 32|Citations: 48

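One-Sentence Summary

This paper analyzes biased gradient compression operators for distributed learning, proves linear convergence with error feedback, compares biased and unbiased compressors in the single-node and multi-node settings, and introduces three classes of biased compressors along with new operators.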

ABSTRACT

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu} \right)$, where $\delta \ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.

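Motivation and Objectives

Motivate and formalize biased compression as a way to reduce communication in distributed learning.
Introduce three parametric classes of biased compressors and relate them to unbiased compressors.
Establish convergence guarantees for biased gradient methods in the single-node setting and, with error feedback, in the distributed setting.
Explore the conditions under which biased compressors outperform their unbiased counterparts across various data distributions.
Propose new biased compressors with theoretical guarantees and practical performance.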

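Proposed Method

Define three classes of biased compressors, B^1(α,β), B^2(γ,β), and B^3(δ), and relate them to the unbiased class U(ζ).
Prove equivalence and scaling properties among the compressor classes (Theorem 6).
Derive convergence rates for gradient descent in the single-node setting for each class (Theorems 17–19, Table 1).
Analyze how scaling affects the convergence rate, and compare biased and unbiased performance under assumptions on the data distribution.
Extend the analysis to distributed SGD with error feedback and provide ergodic convergence rates under various schedules (Theorem 21, Table 2).
Catalog a wide range of biased and unbiased compressors by sorting them into the three classes (Table 3).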
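
To make the notion of a biased compressor concrete, here is a minimal sketch, not the paper's code. It compares Top-k (keep the k largest-magnitude entries) with an unscaled random-k sparsifier on a synthetic Gaussian vector. The function names `top_k`, `rand_k_unscaled`, and `relative_error` are illustrative. The contraction-type inequality E||C(x) - x||^2 <= (1 - 1/δ)||x||^2 with δ = d/k is assumed here as the reference bound; this review does not spell out the class definitions, so treat the link to B^3(δ) as an assumption.

```python
import numpy as np

def top_k(x, k):
    """Biased compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def rand_k_unscaled(x, k, rng):
    """Keep k uniformly random entries without the d/k rescaling (hence biased)."""
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx]
    return out

def relative_error(x, cx):
    """||C(x) - x||^2 / ||x||^2 for one compressed vector cx."""
    return float(np.sum((cx - x) ** 2) / np.sum(x ** 2))

rng = np.random.default_rng(0)
d, k = 1000, 100
x = rng.standard_normal(d)  # synthetic stand-in for a gradient vector

# ASSUMPTION (not stated in the review): reference bound
# E||C(x) - x||^2 <= (1 - 1/delta) * ||x||^2 with delta = d / k.
bound = 1.0 - k / d

topk_err = relative_error(x, top_k(x, k))
# Average over random draws to estimate the expectation for random-k.
randk_err = float(np.mean(
    [relative_error(x, rand_k_unscaled(x, k, rng)) for _ in range(200)]
))

print(f"reference bound 1 - k/d : {bound:.3f}")
print(f"Top-k relative error    : {topk_err:.3f}")
print(f"random-k relative error : {randk_err:.3f}")
```

Top-k discards the d - k smallest squared entries, so its error can never exceed the (1 - k/d) fraction that random selection loses on average. On vectors whose energy is concentrated in a few coordinates the gap widens, which is one intuition for why the distribution of gradient entries matters.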

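Experimental Results

Research Questions

RQ1: Can biased compression operators achieve linear convergence for SGD/gradient methods in the single-node and distributed settings?
RQ2: How do biased compressors compare with unbiased ones under different statistical distributions of gradient entries?
RQ3: Under standard smoothness and strong-convexity assumptions, what are the concrete convergence rates and complexities of biased compressors?
RQ4: How does error feedback enable stable convergence of biased compressors in distributed learning?
RQ5: Can new biased compressors be designed that combine theoretical guarantees with practical effectiveness?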

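Key Findings

Biased compressors can achieve linear convergence in both the single-node and distributed settings, with error feedback used in the distributed analysis.
Three classes of biased compressors are defined, and the theorems give exact convergence rates; Table 1 summarizes the complexity of CGD under each class.
Distributed SGD with error feedback attains an ergodic convergence rate that depends on δ, μ, L, and K (Table 2).
The equivalence results show how the biased classes relate to, and can mimic, unbiased compressors, which guides parameter selection and scaling (Theorem 6).
Several new biased compressors are proposed and classified (Table 3), offering theoretical guarantees and practical alternatives.
The analysis suggests that, under certain gradient distributions, biased compressors can hold an empirical advantage over their unbiased variants.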
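
The error-feedback mechanism mentioned in the findings above can be sketched as follows. This is a minimal illustration in the commonly used form of error feedback (each node adds its stored compression error to the next scaled gradient before compressing); the paper's exact algorithm and step-size schedule may differ. The function `ef_distributed_gd`, the least-squares test problem, and all parameter values are assumptions chosen for illustration. Full gradients are used (C = 0 in the abstract's notation) and all nodes share one minimizer (D = 0), the regime in which the abstract's rate reduces to the linear term.

```python
import numpy as np

def top_k(x, k):
    """Biased compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def ef_distributed_gd(A_list, b_list, k, lr, steps):
    """Distributed gradient descent with Top-k compression and error feedback.

    Node i holds f_i(x) = 0.5 * ||A_i x - b_i||^2 and a local error buffer.
    """
    n = len(A_list)
    d = A_list[0].shape[1]
    x = np.zeros(d)
    errors = [np.zeros(d) for _ in range(n)]
    for _ in range(steps):
        aggregate = np.zeros(d)
        for i in range(n):
            grad = A_list[i].T @ (A_list[i] @ x - b_list[i])  # full local gradient
            corrected = errors[i] + lr * grad   # add back what was dropped earlier
            message = top_k(corrected, k)       # only this sparse vector is sent
            errors[i] = corrected - message     # remember what was dropped now
            aggregate += message
        x = x - aggregate / n                   # server averages the messages
    return x

# Hypothetical test problem: every node shares the minimizer x_star, so the
# local gradients all vanish at the optimum.
rng = np.random.default_rng(0)
n_nodes, rows, d = 4, 200, 20
x_star = rng.standard_normal(d)
A_list = [rng.standard_normal((rows, d)) / np.sqrt(rows) for _ in range(n_nodes)]
b_list = [A @ x_star for A in A_list]

x_hat = ef_distributed_gd(A_list, b_list, k=5, lr=0.02, steps=5000)
rel_dist = float(np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
print(f"relative distance to optimum: {rel_dist:.2e}")
```

The error buffer is what keeps the bias from accumulating: coordinates that Top-k drops in one round are carried forward and transmitted later, once they grow large enough to be selected.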

This review was generated by AI and checked by a human editor.