QUICK REVIEW

[Paper Review] Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Zhize Li, Dmitry Kovalev|arXiv (Cornell University)|Feb 26, 2020

Stochastic Gradient Optimization Techniques|References: 31|Citations: 37

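One-line summary

This paper proposes accelerated compressed gradient descent (ACGD) for single-machine problems and ADIANA, its distributed variant for federated and distributed optimization. Combining gradient compression with acceleration improves both the convergence rate and communication efficiency.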

ABSTRACT

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of iterations, there are no methods which combine the benefits of both gradient compression and acceleration. In this paper, we remedy this situation and propose the first accelerated compressed gradient descent (ACGD) methods. In the single machine regime, we prove that ACGD enjoys the rate $O\Big((1+ω)\sqrt{\frac{L}μ}\log \frac{1}ε\Big)$ for $μ$-strongly convex problems and $O\Big((1+ω)\sqrt{\frac{L}ε}\Big)$ for convex problems, respectively, where $ω$ is the compression parameter. Our results improve upon the existing non-accelerated rates $O\Big((1+ω)\frac{L}μ\log \frac{1}ε\Big)$ and $O\Big((1+ω)\frac{L}ε\Big)$, respectively, and recover the optimal rates of accelerated gradient descent as a special case when no compression ($ω=0$) is applied. We further propose a distributed variant of ACGD (called ADIANA) and prove the convergence rate $\widetilde{O}\Big(ω+\sqrt{\frac{L}μ}+\sqrt{\big(\fracω{n}+\sqrt{\fracω{n}}\big)\frac{ωL}μ}\Big)$, where $n$ is the number of devices/workers and $\widetilde{O}$ hides the logarithmic factor $\log \frac{1}ε$. This improves upon the previous best result $\widetilde{O}\Big(ω+ \frac{L}μ+\frac{ωL}{nμ} \Big)$ achieved by the DIANA method of Mishchenko et al. (2019). Finally, we conduct several experiments on real-world datasets which corroborate our theoretical results and confirm the practical superiority of our accelerated methods.
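To get a feel for the rates quoted in the abstract, the sketch below evaluates the four expressions with all constants and the $\log \frac{1}{ε}$ factor dropped. The function names and the shorthand `kappa` for $L/μ$ are illustrative assumptions, not notation from the paper, and the numbers are only order-of-magnitude comparisons.

```python
import math


def cgd_rate(kappa, omega):
    """Non-accelerated single-machine rate (1 + omega) * L/mu (constants and log dropped)."""
    return (1 + omega) * kappa


def acgd_rate(kappa, omega):
    """ACGD strongly convex rate (1 + omega) * sqrt(L/mu) (constants and log dropped)."""
    return (1 + omega) * math.sqrt(kappa)


def diana_rate(kappa, omega, n):
    """DIANA rate omega + L/mu + omega * L / (n * mu) (constants and log dropped)."""
    return omega + kappa + omega * kappa / n


def adiana_rate(kappa, omega, n):
    """ADIANA rate omega + sqrt(L/mu) + sqrt((omega/n + sqrt(omega/n)) * omega * L/mu)."""
    r = omega / n
    return omega + math.sqrt(kappa) + math.sqrt((r + math.sqrt(r)) * omega * kappa)


if __name__ == "__main__":
    # Hypothetical problem: condition number 1e4, compression parameter 10, 1000 workers.
    kappa, omega, n = 1e4, 10.0, 1000
    print("CGD   :", cgd_rate(kappa, omega))
    print("ACGD  :", acgd_rate(kappa, omega))
    print("DIANA :", diana_rate(kappa, omega, n))
    print("ADIANA:", adiana_rate(kappa, omega, n))
```

Setting `omega = 0` in `acgd_rate` or `adiana_rate` leaves only the `sqrt(kappa)` term, which is the accelerated gradient descent rate the abstract says is recovered without compression.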

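Motivation and objectives

Ease the communication bottleneck in distributed/federated optimization by combining gradient compression with acceleration.
Develop a theoretical framework and algorithms that achieve accelerated convergence under compressed communication.
Provide an analysis showing improved iteration and communication-round complexity relative to non-accelerated compressed baselines.
Demonstrate the practical speedup of the proposed methods on real-world datasets.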

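Proposed method

Define randomized compression operators that are unbiased and have bounded variance (Definition 1).
Propose ACGD, an accelerated compressed gradient descent method for smooth single-machine optimization (Algorithm 1).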
Prove convergence rates for ACGD in both the convex and the strongly convex case; with no compression (ω=0) these recover the rates of accelerated gradient descent.
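Propose ADIANA, an accelerated variant of DIANA for distributed optimization, which combines distributed computation with variance reduction to control compression noise (Algorithm 2).
Derive convergence guarantees that improve on DIANA, characterize the behavior in the different regimes of ω and n, and give parameter choices that achieve the stated rates.
Validate the methods experimentally on standard datasets with several compression operators (random sparsification, random dithering, natural compression).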
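As an illustration of the compression operators mentioned above, here is a minimal sketch of random-k sparsification, assuming the usual formulation of the two properties: unbiasedness, E[C(x)] = x, and bounded variance, E‖C(x) − x‖² ≤ ω‖x‖². For this operator ω = d/k − 1, where d is the dimension. The helper names (`rand_k`, `omega_rand_k`, `empirical_check`) are hypothetical and not taken from the paper.

```python
import random


def rand_k(x, k, rng):
    """Random-k sparsification: keep k uniformly chosen coordinates, scaled by d/k."""
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k  # rescaling makes the operator unbiased
    return [scale * v if i in keep else 0.0 for i, v in enumerate(x)]


def omega_rand_k(d, k):
    """Compression parameter of random-k: E||C(x) - x||^2 = (d/k - 1) * ||x||^2."""
    return d / k - 1.0


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(a * a for a in v)


def empirical_check(x, k, trials, seed=0):
    """Monte Carlo estimates of E[C(x)] and E||C(x) - x||^2."""
    rng = random.Random(seed)
    d = len(x)
    mean = [0.0] * d
    err = 0.0
    for _ in range(trials):
        c = rand_k(x, k, rng)
        for i in range(d):
            mean[i] += c[i] / trials
        err += sq_norm([ci - xi for ci, xi in zip(c, x)]) / trials
    return mean, err


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 4.0]
    mean, err = empirical_check(x, k=2, trials=20000)
    print("empirical mean of C(x):", mean)
    print("empirical variance    :", err)
    print("omega * ||x||^2       :", omega_rand_k(len(x), 2) * sq_norm(x))
```

A smaller k sends fewer coordinates per round but raises ω, which is the trade-off the (1+ω) factors in the rates quantify.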

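Experimental results

Research questions

RQ1: Can gradient compression be combined with acceleration while preserving accelerated convergence rates?
RQ2: What are the iteration and communication complexities of accelerated compressed gradient methods in the single-machine and distributed/federated settings?
RQ3: How do accelerated compressed methods compare with non-accelerated ones (e.g., CGD, DIANA) in each regime of the compression parameter ω and the number of devices n?
RQ4: Do experiments on real-world datasets support the theoretical improvements and the claimed communication efficiency?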

Main findings

ACGD achieves accelerated rates under compression: O((1+ω)√(L/μ) log(1/ε)) for μ-strongly convex problems and O((1+ω)√(L/ε)) for convex problems.
In the distributed setting, ADIANA improves on DIANA's rate: Õ(ω(1+√(L/(nμ)))) when ω ≥ n and Õ(ω+√(L/μ)+√(√(ω/n)·ωL/μ)) when ω < n, where Õ hides the log(1/ε) factor.
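When ω ≤ min{n^{1/3}, √(L/μ)}, ADIANA matches uncompressed accelerated gradient descent in the number of communication rounds, so compression comes without degrading the convergence rate.
In experiments on real data with three compression schemes, ADIANA in most cases converges faster and uses fewer communication bits than DIANA and the uncompressed baselines.
Among the compressors considered, ADIANA with random dithering and natural compression shows notable communication-efficiency gains over DIANA and DCGD.
Together, the results demonstrate both the theoretical improvement and the practicality of accelerated compressed methods in federated/distributed optimization.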
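The regime split and the n^{1/3} threshold in the findings above follow from the single rate stated in the abstract. A short derivation, assuming n ≥ 1 and writing κ := L/μ (a shorthand introduced here, not used in the source):

```latex
\begin{align*}
\text{Rate:}\quad
  & \widetilde{O}\Big(\omega + \sqrt{\kappa}
    + \sqrt{\big(\tfrac{\omega}{n} + \sqrt{\tfrac{\omega}{n}}\big)\,\omega\kappa}\Big),
    \qquad \kappa := \tfrac{L}{\mu}. \\[4pt]
\omega \ge n:\quad
  & \sqrt{\tfrac{\omega}{n}} \le \tfrac{\omega}{n}
    \;\Rightarrow\;
    \sqrt{\big(\tfrac{\omega}{n} + \sqrt{\tfrac{\omega}{n}}\big)\,\omega\kappa}
    \le \sqrt{2}\,\omega\sqrt{\tfrac{\kappa}{n}},
    \qquad
    \sqrt{\kappa} \le \omega\sqrt{\tfrac{\kappa}{n}}
    \ \ (\text{since } \omega \ge n \ge \sqrt{n}), \\
  & \text{so the rate is }
    \widetilde{O}\Big(\omega\big(1 + \sqrt{\tfrac{L}{n\mu}}\big)\Big). \\[4pt]
\omega < n:\quad
  & \tfrac{\omega}{n} < \sqrt{\tfrac{\omega}{n}}
    \;\Rightarrow\;
    \sqrt{\big(\tfrac{\omega}{n} + \sqrt{\tfrac{\omega}{n}}\big)\,\omega\kappa}
    \le \sqrt{2}\,\sqrt{\sqrt{\tfrac{\omega}{n}}\;\omega\kappa}
    = \sqrt{2}\,\Big(\tfrac{\omega^{3}}{n}\Big)^{1/4}\sqrt{\kappa}, \\
  & \text{so the rate is }
    \widetilde{O}\Big(\omega + \sqrt{\kappa}
    + \sqrt{\sqrt{\tfrac{\omega}{n}}\;\omega\kappa}\Big). \\[4pt]
\omega \le n^{1/3}:\quad
  & \Big(\tfrac{\omega^{3}}{n}\Big)^{1/4} \le 1
    \;\Rightarrow\;
    \widetilde{O}\big(\omega + \sqrt{\kappa}\big),
    \quad\text{which is }
    \widetilde{O}\big(\sqrt{\kappa}\big)
    \text{ once } \omega \le \sqrt{\kappa}.
\end{align*}
```

The last line is the uncompressed accelerated gradient descent rate, which is why the condition ω ≤ min{n^{1/3}, √(L/μ)} appears.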

This review was generated by AI and checked by a human editor.