QUICK REVIEW

[Paper Review] Natural Compression for Distributed Deep Learning

Samuel Horváth, Chen-Yu Ho|arXiv (Cornell University)|May 27, 2019

Stochastic Gradient Optimization Techniques | References: 44 | Citations: 69

One-line summary

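The paper introduces natural compression C_nat, which applies randomized rounding of each update entry to the nearest power of two and thereby cuts communication substantially with almost no effect on convergence. For more aggressive compression it generalizes the idea to natural dithering, which achieves an exponential improvement over standard dithering.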

ABSTRACT

Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: natural compression (NC). Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by not more than the tiny factor $\frac{9}{8}$, which means that the effect of NC on the convergence speed of popular training algorithms, such as distributed SGD, is negligible. However, the communications savings enabled by NC are substantial, leading to $3$-$4\times$ improvement in overall theoretical running time. For applications requiring more aggressive compression, we generalize NC to natural dithering, which we prove is exponentially better than the common random dithering technique. Our compression operators can be used on their own or in combination with existing operators for a more aggressive combined effect and offer new state-of-the-art both in theory and practice.
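To make the phrase "ignoring the mantissa" concrete, here is a minimal sketch for float32 values. It is not the authors' implementation: the function names `sign_exponent_code` and `decode_sign_exponent` are hypothetical, inputs are assumed to be finite normal floats, and the code shows only the deterministic truncation step (rounding toward zero), not the randomized rounding that makes the operator unbiased.

```python
import numpy as np

def sign_exponent_code(x):
    """Keep only the sign bit and the 8 exponent bits of each float32 (9 bits total).

    Hypothetical helper for illustration; assumes finite, normal inputs.
    """
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    # float32 layout: 1 sign bit | 8 exponent bits | 23 mantissa bits.
    return (bits >> 23).astype(np.uint16)

def decode_sign_exponent(code):
    """Rebuild a float32 with a zero mantissa, i.e. a signed power of two."""
    bits = np.asarray(code).astype(np.uint32) << 23
    return bits.view(np.float32)

x = np.array([0.75, -6.0, 1.0, 3.0], dtype=np.float32)
codes = sign_exponent_code(x)
decoded = decode_sign_exponent(codes)  # each entry truncated toward zero to a power of two
```

Sending 9 bits instead of 32 gives 32/9 ≈ 3.56, which matches the float32 figure quoted in this review.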

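Motivation and objectives

Motivate and address the communication bottleneck in data-parallel distributed deep learning.
Propose a simple compression operator with provably low variance.
Show that compression substantially reduces communication volume while causing almost no slowdown in convergence.
Introduce natural dithering for more aggressive compression and analyze its theoretical advantages.
Demonstrate practical performance gains and compatibility with existing compression methods.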

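Proposed method

Define and implement natural compression C_nat, which maps each real-valued update entry to a power of two at random using unbiased rounding.
Prove that C_nat belongs to B(1/8), the class of unbiased operators with bounded second moment, and conclude that its effect on convergence is negligible (Theorem 2.3).
Show how natural compression reduces communication by encoding only the sign and exponent bits of the IEEE 754 format (3.56x fewer bits for float32, 5.82x for float64).
Introduce natural dithering D_nat^{p,s} as an exponential improvement over standard dithering, and prove its variance and compression properties (Theorems 3.2 and 3.3).
Develop, as Algorithm 1, a bidirectional compression framework for distributed SGD in which the master and the workers use compression operators from B(ω) to achieve a speedup.
Demonstrate compatibility with existing compression operators via a composition rule (Theorem 2.5).
Provide a proof-of-concept system and experiments that verify reduced training time and scalability (results for ResNet110 and AlexNet on CIFAR-10, and for ImageNet).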
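The unbiased power-of-two rounding described at the top of this list can be sketched with NumPy as follows. This is an illustrative sketch rather than the paper's code: the function name `natural_compress` and the `rng` parameter are assumptions, and inputs are assumed to be finite.

```python
import numpy as np

def natural_compress(x, rng=None):
    """Randomly round each entry to a neighbouring power of two so that E[C(x)] = x.

    Illustrative sketch (hypothetical name); assumes finite inputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0                      # zeros are passed through unchanged
    mag = np.abs(x[nz])
    # np.frexp returns mag = m * 2**e with m in [0.5, 1), so 2**(e-1) <= mag < 2**e.
    _, e = np.frexp(mag)
    low = np.ldexp(1.0, e - 1)       # power of two at or below mag
    high = np.ldexp(1.0, e)          # next power of two above mag
    # Round up with probability proportional to the distance from `low`,
    # which makes the expected output equal to the input.
    p_up = (mag - low) / (high - low)
    rounded = np.where(rng.random(mag.shape) < p_up, high, low)
    out[nz] = np.sign(x[nz]) * rounded
    return out
```

For example, an entry equal to 0.75 is sent as 0.5 or 1.0 with probability 1/2 each, so its expectation is preserved.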

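Experimental results

Research questions

RQ1: How much does natural compression increase the second moment of the update vector, and does this meaningfully affect convergence?
RQ2: Does bidirectional compression with C_nat and natural dithering deliver practical speedups for distributed SGD while preserving accuracy?
RQ3: What theoretical guarantees and practical benefits arise when natural compression is combined with existing compression techniques?
RQ4: How does natural dithering compare with standard dithering in variance and efficiency under a fixed communication budget?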

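Key findings

C_nat increases the second moment of an update by a factor of at most 9/8, so its effect on the convergence of SGD-based methods is negligible.
With one-sided or bidirectional compression, C_nat reduces per-iteration communication by 3.2× to 3.6×.
At the same variance level, natural dithering D_nat^{p,s} is exponentially better than standard dithering.
Combined with sparsification or other operators, natural compression yields a larger overall speedup than the standard approaches (see the discussion of Table 1).
Empirical results show substantial reductions in training time (for example, about 26% for ResNet110 and about 66% for AlexNet on CIFAR-10) with no loss in final accuracy, and show scalability to large-scale models such as those trained on ImageNet.
The proposed operators are compatible with SwitchML-style in-network aggregation and support the broad family of compression operators in B(ω).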
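The 9/8 factor in the first finding can be checked with a short calculation. The notation below ($a$, $\lambda$, $t$) is introduced here for illustration and is not taken from the paper; it assumes a positive entry $t$ lying between two consecutive powers of two and rounded without bias.

```latex
% Write t = 2^a (1 + \lambda) with \lambda \in [0, 1].
% Unbiased rounding: C(t) = 2^{a+1} with probability \lambda, and 2^a otherwise.
\begin{align*}
\mathbb{E}[C(t)]   &= (1-\lambda)\,2^{a} + \lambda\,2^{a+1} = 2^{a}(1+\lambda) = t, \\
\mathbb{E}[C(t)^2] &= (1-\lambda)\,2^{2a} + \lambda\,2^{2a+2} = 2^{2a}(1+3\lambda), \\
\frac{\mathbb{E}[C(t)^2]}{t^2} &= \frac{1+3\lambda}{(1+\lambda)^2}
  \;\le\; \frac{1+3\cdot\tfrac13}{\left(1+\tfrac13\right)^2} = \frac{2}{16/9} = \frac{9}{8}.
\end{align*}
% The ratio is maximized at \lambda = 1/3, where its derivative
% (1 - 3\lambda)/(1+\lambda)^3 vanishes.
```

The worst case is an entry one third of the way between two powers of two. Entries that are already powers of two ($\lambda = 0$ or $\lambda = 1$) are reproduced exactly.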


This review was generated by AI and checked by a human editor.