QUICK REVIEW

[Paper Review] Collaborative Deep Learning in Fixed Topology Networks

Zhanhong Jiang, Aditya Balu|arXiv (Cornell University)|Jun 23, 2017

Distributed Control Multi-Agent Systems | References: 17 | Citations: 77

One-Sentence Summary

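This paper proposes consensus-based distributed SGD (CDSGD) and its momentum variant (CDMSGD) for fixed-topology graphs, enabling data parallelism and decentralized computation. It provides convergence guarantees for strongly convex and nonconvex objectives and validates the algorithms empirically on CIFAR-10/100 against centralized SGD and FedAvg.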

ABSTRACT

There is significant recent interest to parallelize deep learning algorithms in order to handle the enormous growth in data and model sizes. While most advances focus on model parallelization and engaging multiple computing agents via using a central parameter server, aspect of data parallelization along with decentralized computation has not been explored sufficiently. In this context, this paper presents a new consensus-based distributed SGD (CDSGD) (and its momentum variant, CDMSGD) algorithm for collaborative deep learning over fixed topology networks that enables data parallelization as well as decentralized computation. Such a framework can be extremely useful for learning agents with access to only local/private data in a communication constrained environment. We analyze the convergence properties of the proposed algorithm with strongly convex and nonconvex objective functions with fixed and diminishing step sizes using concepts of Lyapunov function construction. We demonstrate the efficacy of our algorithms in comparison with the baseline centralized SGD and the recently proposed federated averaging algorithm (that also enables data parallelism) based on benchmark datasets such as MNIST, CIFAR-10 and CIFAR-100.

Motivation and Objectives

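Motivate scalable distributed deep learning in settings where each agent holds local, private data and communication is constrained by a fixed topology.
Develop CDSGD (and CDMSGD) to achieve decentralized computation and data parallelism under network constraints.
Provide a convergence analysis for strongly convex and nonconvex objectives using a Lyapunov function construction.
Compare against centralized SGD and Federated Averaging to evaluate convergence speed, accuracy, and generalization.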

Proposed Method

Define a distributed empirical risk minimization problem over a fixed undirected graph, using a doubly stochastic interaction matrix $\Pi$.
Propose CDSGD: for each agent $j$, $x_{k+1}^j = \sum_{l \in Nb(j)} \pi_{jl} x_k^l - \alpha g_j(x_k^j)$.
Introduce the Lyapunov function $V(x, \alpha) = \frac{N}{n} \mathbf{1}^T F(x) + \frac{1}{2\alpha} \|x\|_{I-\Pi}^2$ to analyze convergence.
Establish consensus results showing $\mathbb{E}[\|x_k^j - s_k\|] \le \frac{\alpha L}{1 - \lambda_2(\Pi)}$.
Provide convergence theorems under Assumptions 1–3: linear convergence to a neighborhood of the optimum in the strongly convex case, and bounded gradient sums in the nonconvex case.
Extend the analysis to the momentum variant (CDMSGD) and to diminishing step sizes (covered in the supplementary material).
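
The CDSGD update above can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation: the ring topology, the self-weight of 0.5, the quadratic local losses f_j(x) = 0.5 * ||x - c_j||^2, the Gaussian gradient noise, and all function names are assumptions made for this example.

```python
import numpy as np

def ring_mixing_matrix(num_agents, self_weight=0.5):
    """Symmetric, doubly stochastic Pi for a ring (illustrative choice)."""
    pi = np.zeros((num_agents, num_agents))
    side = (1.0 - self_weight) / 2.0
    for j in range(num_agents):
        pi[j, j] = self_weight
        pi[j, (j - 1) % num_agents] += side  # left neighbor
        pi[j, (j + 1) % num_agents] += side  # right neighbor
    return pi

def cdsgd_step(x, pi, grads, alpha):
    """One CDSGD update for all agents at once.

    Row j of x holds agent j's parameters, so pi @ x computes
    sum_l pi_jl * x^l for every agent j.
    """
    return pi @ x - alpha * grads

def run_toy_cdsgd(num_agents=5, dim=3, alpha=0.05, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed private data: agent j only knows its own target c_j.
    targets = rng.normal(size=(num_agents, dim))
    pi = ring_mixing_matrix(num_agents)
    x = rng.normal(size=(num_agents, dim))
    for _ in range(steps):
        # Noisy local gradient of 0.5 * ||x^j - c_j||^2 (assumed loss).
        noise = 0.01 * rng.normal(size=x.shape)
        grads = (x - targets) + noise
        x = cdsgd_step(x, pi, grads, alpha)
    return x, targets

x_final, targets = run_toy_cdsgd()
# Largest distance between any agent and the network average s_k.
consensus_error = np.linalg.norm(x_final - x_final.mean(axis=0), axis=1).max()
# The minimizer of the summed toy losses is the mean of the targets.
distance_to_optimum = np.linalg.norm(x_final.mean(axis=0) - targets.mean(axis=0))
print(consensus_error, distance_to_optimum)
```

In this toy setting the agents do not agree exactly: with a fixed step size they settle a small distance apart, and rerunning with a smaller `alpha` shrinks that distance, which is the qualitative behavior the consensus bound describes.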

Experimental Results

Research Questions

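RQ1: Can CDSGD reach consensus and converge when data is distributed across agents connected by a fixed topology?
RQ2: What convergence rates and steady-state accuracy are achievable for strongly convex and nonconvex objectives under CDSGD/CDMSGD?
RQ3: How does network topology (spectral gap) affect convergence, consensus, and final accuracy relative to centralized SGD and FedAvg?
RQ4: Do CDSGD/CDMSGD improve generalization (the gap between training and validation) relative to the centralized baseline and FedAvg?
RQ5: How do fixed and diminishing step sizes affect convergence behavior and practical performance?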

Key Findings

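CDSGD achieves consensus among agents with a fixed step size; the consensus error is bounded in terms of $\alpha$ and the spectral gap of the graph, $1 - \lambda_2(\Pi)$.
For strongly convex objectives, CDSGD converges linearly to a neighborhood of the optimum. The neighborhood shrinks with smaller step sizes and improves as the spectral gap grows.
For nonconvex objectives, the sum of gradient norms over iterations is bounded under CDSGD, which in practice implies convergence toward a stationary point.
CDMSGD can outperform FedAvg in steady-state accuracy while keeping computation decentralized, and it approaches the performance of centralized SGD given enough epochs.
Empirical results on CIFAR-10/100 show that CDSGD reaches final accuracy comparable to or better than centralized SGD or FedAvg, with a smaller generalization gap. Network size and topology both affect consensus stability and learning dynamics.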
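
To make the role of the spectral gap concrete, the sketch below compares the factor alpha * L / (1 - lambda_2(Pi)) from the consensus bound across a few topologies. The specific mixing matrices (a ring with self-weight 0.5 and uniform averaging on a complete graph), the value used for the constant L, and the helper names are assumptions for illustration; L is simply treated as a given constant here, and its precise meaning comes from the paper's assumptions.

```python
import numpy as np

def ring_pi(num_agents, self_weight=0.5):
    """Symmetric, doubly stochastic matrix for a ring (illustrative)."""
    pi = np.zeros((num_agents, num_agents))
    side = (1.0 - self_weight) / 2.0
    for j in range(num_agents):
        pi[j, j] = self_weight
        pi[j, (j - 1) % num_agents] += side
        pi[j, (j + 1) % num_agents] += side
    return pi

def complete_pi(num_agents):
    """Uniform averaging over all agents (fully connected graph)."""
    return np.full((num_agents, num_agents), 1.0 / num_agents)

def second_largest_eigenvalue(pi):
    """lambda_2 of a symmetric mixing matrix."""
    eigenvalues = np.sort(np.linalg.eigvalsh(pi))[::-1]
    return eigenvalues[1]

def consensus_bound(pi, alpha, grad_const):
    """alpha * L / (1 - lambda_2(Pi)), with L passed in as grad_const."""
    return alpha * grad_const / (1.0 - second_largest_eigenvalue(pi))

alpha, grad_const = 0.01, 1.0  # assumed values, for comparison only
bounds = {
    "complete, N=5": consensus_bound(complete_pi(5), alpha, grad_const),
    "ring, N=5": consensus_bound(ring_pi(5), alpha, grad_const),
    "ring, N=15": consensus_bound(ring_pi(15), alpha, grad_const),
}
for name, value in bounds.items():
    print(f"{name}: bound = {value:.4f}")
```

Under these assumptions, sparser or larger rings have a second eigenvalue closer to 1, hence a smaller spectral gap and a looser consensus bound, while the fully connected case gives the tightest one.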


This review was generated by AI and checked by a human editor.