QUICK REVIEW

[Paper Review] RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Liping Li, Wei Xu|arXiv (Cornell University)|Nov 9, 2018

Stochastic Gradient Optimization Techniques|References: 24|Cited by: 19

One-Line Summary

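This paper proposes RSA (Byzantine-Robust Stochastic Aggregation), a new class of robust stochastic subgradient methods for distributed learning under Byzantine attacks, in which some workers may send arbitrary malicious updates. RSA adds an ℓp-norm regularization term to the objective function. With this term, RSA converges to a near-optimal solution whose learning error depends on the number of Byzantine workers. Even under attack and on non-i.i.d. data, its convergence rate matches that of attack-free standard SGD. It requires neither an i.i.d. assumption nor a complex gradient-selection subroutine.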
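The following block shows the regularized objective and its ℓ1 update rules as we read them. The notation is ours, not necessarily the paper's:

- ℛ and ℬ are the sets of regular and Byzantine workers.
- x_0 is the master model and x_i is worker i's local model.
- F(x_i; ξ_i) is worker i's stochastic loss.
- λ > 0 is the penalty weight and α^k is the step size.
- z_j^k is whatever arbitrary message Byzantine worker j sends.

The paper's exact formulation (for example, any extra regularizer on x_0) may differ.

```latex
\min_{x_0,\ \{x_i\}_{i\in\mathcal{R}}}\ \sum_{i\in\mathcal{R}} \Big( \mathbb{E}_{\xi_i}\big[F(x_i;\xi_i)\big] + \lambda\,\|x_i - x_0\|_p \Big)

% l1 instance (p = 1): stochastic subgradient updates at iteration k
x_i^{k+1} = x_i^k - \alpha^k \Big( \nabla F(x_i^k;\xi_i^k) + \lambda\,\operatorname{sign}(x_i^k - x_0^k) \Big), \qquad i \in \mathcal{R}

x_0^{k+1} = x_0^k - \alpha^k \lambda \Big( \sum_{i\in\mathcal{R}} \operatorname{sign}(x_0^k - x_i^k) + \sum_{j\in\mathcal{B}} \operatorname{sign}(x_0^k - z_j^k) \Big)
```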

ABSTRACT

In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. The Byzantine workers, during the learning process, may send arbitrary incorrect messages to the master due to data corruptions, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated with the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most of the existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) on the workers, and hence fits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method, which is free of Byzantine attacks. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and a complexity reduction compared to the state-of-the-art alternatives.

Research Motivation and Objectives

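To address Byzantine failures, a key challenge in distributed machine learning in which some workers may send arbitrary or corrupted updates.
To develop a robust learning framework that does not rely on the i.i.d. data assumption, which often fails to hold in real-world federated learning settings.
To guarantee convergence to a near-optimal solution even when an unknown number of workers are Byzantine, with the performance degradation growing with the number of faulty workers.
To match the convergence rate of standard SGD even under Byzantine attacks, strengthening robustness without sacrificing efficiency.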

Proposed Method

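Introduces a regularized objective that adds an ℓp-norm term penalizing each worker model's deviation from the master model. This term mitigates the influence of Byzantine updates.
RSA uses stochastic subgradient descent: the master combines the workers' messages through the subgradient of the robustifying regularization term rather than by simple averaging.
This aggregation term is derived from the subgradient of the ℓp-norm distance between the master and worker models. Because that subgradient is bounded, each message, including an arbitrary Byzantine one, has only a bounded effect on the master update.
It is designed as a computationally efficient algorithm that avoids costly gradient-selection procedures such as the geometric median or Krum.
The convergence analysis relies on upper-bounding the expected subgradient norm and assumes strong convexity and Lipschitz continuity of the objective.
It generalizes to variants based on different ℓp-norms, such as ℓ1 and ℓ2, each offering a different trade-off between robustness and sparsity.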
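The update rules described above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation:

- Models are lists of floats.
- `worker_step`, `master_step`, `l1_subgrad`, and `l2_subgrad` are hypothetical names.
- The ℓ1 case uses the elementwise sign.
- The ℓ2 case, our own illustration of another ℓp choice, uses the normalized difference.

Both subgradients have bounded size no matter what vector a worker sends.

```python
import math
from typing import Callable, List

Vector = List[float]


def l1_subgrad(diff: Vector) -> Vector:
    """Subgradient of ||diff||_1: elementwise sign, each entry in {-1.0, 0.0, 1.0}."""
    return [float((d > 0) - (d < 0)) for d in diff]


def l2_subgrad(diff: Vector) -> Vector:
    """Subgradient of ||diff||_2: diff / ||diff||_2, or the zero vector at diff == 0."""
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff] if norm > 0 else [0.0] * len(diff)


def worker_step(x_i: Vector, grad_i: Vector, x0: Vector, lr: float, lam: float,
                subgrad: Callable[[Vector], Vector] = l1_subgrad) -> Vector:
    """Honest worker: local stochastic gradient plus lam * subgradient of ||x_i - x0||."""
    pen = subgrad([a - b for a, b in zip(x_i, x0)])
    return [x - lr * (g + lam * p) for x, g, p in zip(x_i, grad_i, pen)]


def master_step(x0: Vector, received: List[Vector], lr: float, lam: float,
                subgrad: Callable[[Vector], Vector] = l1_subgrad) -> Vector:
    """Master: sum lam * subgradient of ||x0 - z|| over every received model z.

    Each term is bounded regardless of z, so a single (possibly Byzantine)
    message moves x0 by at most lr * lam per coordinate (l1) or in norm (l2).
    """
    total = [0.0] * len(x0)
    for z in received:  # one pass over m vectors of length d: O(m * d) work
        for j, p in enumerate(subgrad([a - b for a, b in zip(x0, z)])):
            total[j] += p
    return [x - lr * lam * t for x, t in zip(x0, total)]
```

Usage: `master_step([0.0, 0.0], [[1.0, -1.0], [2.0, 3.0], [1e9, 1e9]], lr=0.1, lam=0.5)` returns the same vector as when the last (Byzantine) model is `[0.5, 0.5]`. Under ℓ1, only the sign of each difference matters. The master makes a single pass over the m received d-dimensional vectors. By comparison, Krum computes distances between all pairs of received vectors.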

Experimental Results

Research Questions

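RQ1: Can a distributed learning algorithm maintain convergence and performance under Byzantine attacks without assuming i.i.d. data?
RQ2: How can distributed learning be made robust when the number of faulty workers is unknown and their updates may be arbitrarily corrupted?
RQ3: Can robust learning under Byzantine attacks match the convergence rate of standard, attack-free SGD?
RQ4: In the non-i.i.d. setting, how does the learning error depend on the number of Byzantine workers?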

Key Findings

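RSA converges to a near-optimal solution, with an error bounded by a term that depends on the number of Byzantine workers.
Even under Byzantine attacks, it achieves the same convergence rate as attack-free standard stochastic gradient descent (SGD), so efficiency is preserved.
Because it does not require the i.i.d. data assumption, it is applicable to real-world federated learning with heterogeneous data distributions.
In numerical experiments on real data, it performs on par with or better than state-of-the-art robust methods at lower computational complexity.
The theoretical analysis shows that the error introduced by Byzantine workers is bounded and depends only on the number of faulty workers, not on their behavior.
Under mild regularity conditions, the algorithm remains stable and convergent even when a constant fraction of the workers is Byzantine.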
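A hypothetical scalar toy problem illustrates two findings: the error grows with the number of Byzantine workers, and it does not depend on what they send. This is not the paper's experiment. Our assumptions are:

- Each honest worker i minimizes 0.5·(x − a_i)² using exact gradients, so the attack-free optimum is the mean of the a_i.
- Every Byzantine worker always reports the same constant.
- The ℓ1 penalty is used with a small constant step size, so the iterates only settle into a small neighborhood. The function therefore returns the master model averaged over the second half of the run.

```python
from typing import List


def sign(v: float) -> float:
    """Subgradient of |v|: -1.0, 0.0, or 1.0."""
    return float((v > 0) - (v < 0))


def run_rsa_toy(targets: List[float], num_byzantine: int, byz_value: float,
                lam: float = 3.0, lr: float = 1e-3, iters: int = 20000) -> float:
    """Hypothetical scalar toy of l1-regularized RSA with a constant step size.

    Honest worker i minimizes 0.5 * (x - targets[i]) ** 2; each of the
    `num_byzantine` Byzantine workers always reports `byz_value`.
    Returns the master model averaged over the second half of the run.
    """
    x0 = 0.0                     # master model
    xs = [0.0] * len(targets)    # honest workers' local models
    tail_sum, tail_count = 0.0, 0
    for k in range(iters):
        # Honest workers: local gradient (x - a) plus lam * sign(x_i - x0).
        new_xs = [x - lr * ((x - a) + lam * sign(x - x0)) for x, a in zip(xs, targets)]
        # Master: lam * sign(x0 - z) summed over honest and Byzantine messages.
        received = xs + [byz_value] * num_byzantine
        x0 -= lr * lam * sum(sign(x0 - z) for z in received)
        xs = new_xs
        if k >= iters // 2:
            tail_sum += x0
            tail_count += 1
    return tail_sum / tail_count


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]  # heterogeneous local optima; attack-free optimum is 2.5
    for q in (0, 1, 2):
        print(q, round(run_rsa_toy(a, q, byz_value=1e3), 3))
```

Expected behavior with a = [1.0, 2.0, 3.0, 4.0] and λ = 3:

- A short fixed-point calculation for this toy gives mean(a) + qλ/n, so the master should land near 2.5, 3.25, and 4.0 for q = 0, 1, and 2 Byzantine workers.
- Changing `byz_value` from 1e3 to 1e6 leaves the trajectory unchanged, because the master only sees the sign of each difference.

The linear offset qλ/n is specific to this toy; it is not the paper's error bound.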


This review was generated by AI and checked by a human editor.