QUICK REVIEW

[Paper Review] Polynomial codes: an optimal design for high-dimensional coded matrix multiplication

Qian Yu, Mohammad Ali Maddah-Ali|arXiv (Cornell University)|Dec 4, 2017

Stochastic Gradient Optimization Techniques|References: 17|Citations: 367

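One-line summary

This paper proposes polynomial codes, a new coding-theoretic strategy for distributed matrix multiplication that achieves the optimal recovery threshold (the minimum number of workers the master must wait for to reconstruct the output). Mapping the computation to polynomial interpolation makes decoding efficient. In the presence of stragglers, the scheme improves on prior work in recovery threshold and is also optimal in computation latency and communication load.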

ABSTRACT

We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices. We propose a computation strategy that leverages ideas from coding theory to design intermediate computations at the worker nodes, in order to optimally deal with straggling workers. The proposed strategy, named as polynomial codes, achieves the optimum recovery threshold, defined as the minimum number of workers that the master needs to wait for in order to compute the output. This is the first code that achieves the optimal utilization of redundancy for tolerating stragglers or failures in distributed matrix multiplication. Furthermore, by leveraging the algebraic structure of polynomial codes, we can map the reconstruction problem of the final output to a polynomial interpolation problem, which can be solved efficiently. Polynomial codes provide order-wise improvement over the state of the art in terms of recovery threshold, and are also optimal in terms of several other metrics including computation latency and communication load. Moreover, we extend this code to distributed convolution and show its order-wise optimality.

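Motivation and objectives

Address the problem of straggling workers in large-scale distributed matrix multiplication systems.
Reduce computation latency by designing a coding strategy that minimizes the number of workers the master node must wait for (the recovery threshold).
Achieve optimal use of redundancy for fault tolerance in distributed computing environments.
Extend the coding framework to distributed convolution, with theoretical optimality guarantees.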

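Proposed method

Use algebraic coding theory to design the intermediate computations at the worker nodes as polynomial evaluations.
Map the reconstruction of the final matrix product to a polynomial interpolation problem, enabling efficient decoding.
Construct the coded inputs by evaluating polynomials over a finite field, with blocks of the input matrices as coefficients.
Guarantee that the master node can reconstruct the output from any subset of workers whose size equals the recovery threshold.
Use structured redundancy to minimize communication load and computation overhead.
Extend the framework to distributed convolution by adapting the polynomial evaluation and interpolation steps.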
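The encode, compute, and interpolate steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes real-valued matrices and floating-point arithmetic in place of a finite field, computes C = A B with A split into m row blocks and B into n column blocks, and uses evaluation points chosen freely for the example. The function names `encode`, `worker`, and `decode` are hypothetical.

```python
import numpy as np

def encode(A, B, m, n, points):
    """Build one coded pair (A_tilde, B_tilde) per worker.

    Assumes A.shape[0] is divisible by m and B.shape[1] by n.
    """
    A_blocks = np.split(A, m, axis=0)  # m row blocks A_0 .. A_{m-1}
    B_blocks = np.split(B, n, axis=1)  # n column blocks B_0 .. B_{n-1}
    shares = []
    for x in points:
        # A_tilde(x) = sum_j A_j x^j, B_tilde(x) = sum_k B_k x^(k*m)
        A_tilde = sum(Aj * x**j for j, Aj in enumerate(A_blocks))
        B_tilde = sum(Bk * x**(k * m) for k, Bk in enumerate(B_blocks))
        shares.append((A_tilde, B_tilde))
    return shares

def worker(share):
    """Each worker multiplies its two coded blocks."""
    A_tilde, B_tilde = share
    return A_tilde @ B_tilde

def decode(results, xs, m, n):
    """Interpolate from exactly m*n worker results at distinct points xs.

    The product A_tilde(x) @ B_tilde(x) is a polynomial of degree m*n - 1
    whose coefficient of x^(j + k*m) is the block A_j @ B_k.
    """
    K = m * n
    xs = np.asarray(xs, dtype=float)
    V = np.vander(xs, K, increasing=True)  # V[i, d] = xs[i] ** d
    stacked = np.stack(results)            # shape (K, rows, cols)
    coeffs = np.linalg.solve(V, stacked.reshape(K, -1)).reshape(stacked.shape)
    rows = [np.hstack([coeffs[j + k * m] for k in range(n)]) for j in range(m)]
    return np.vstack(rows)
```

With N workers, the master calls `decode` on the first m*n results to arrive, whichever workers they come from. Over the reals the Vandermonde system becomes ill-conditioned as m*n grows, which is one reason to treat this floating-point version only as an illustration.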

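Experimental results

Research questions

RQ1: In a distributed matrix multiplication system, what is the minimum number of workers the master node must wait for in order to reconstruct the output?
RQ2: Can a coding scheme be designed that meets the theoretical lower bound on the recovery threshold?
RQ3: Can algebraic structure make the reconstruction process efficient and scalable?
RQ4: Can the proposed coding strategy be extended to other linear-algebra operations, such as distributed convolution?
RQ5: How much do latency and communication load improve compared with previous methods?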

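Key findings

Polynomial codes achieve the theoretically optimal recovery threshold, that is, the minimum number of workers needed to reconstruct the output.
Particularly in high-dimensional settings, the recovery threshold is an order-wise improvement over the best prior schemes.
Reconstruction reduces to polynomial interpolation, for which efficient algorithms exist, so decoding complexity stays low.
The scheme is optimal in terms of computation latency and communication load.
The framework extends to distributed convolution, where it remains order-wise optimal in recovery threshold.
The algebraic structure of polynomial codes allows a systematic and scalable implementation in distributed systems.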
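The same idea carries over to convolution, as the following sketch suggests. It is an illustration under stated assumptions rather than the paper's exact construction: the vectors are real-valued, both are split into blocks of one common length L, and the hypothetical function `coded_convolution` simulates the workers in-process. Because the product polynomial has degree m + n - 2, results from m + n - 1 distinct evaluation points are enough in this sketch.

```python
import numpy as np

def coded_convolution(a, b, m, n, xs):
    """Recover np.convolve(a, b) from m + n - 1 coded worker results.

    Assumes len(a) == m * L and len(b) == n * L for a common block length L,
    and that xs holds m + n - 1 distinct evaluation points.
    """
    a_blocks = np.split(a, m)
    b_blocks = np.split(b, n)
    L = len(a_blocks[0])
    assert len(b_blocks[0]) == L, "sketch assumes equal block lengths"
    K = m + n - 1
    xs = np.asarray(xs, dtype=float)
    assert len(xs) == K

    # Worker at point x convolves its two coded blocks.
    results = []
    for x in xs:
        a_tilde = sum(aj * x**j for j, aj in enumerate(a_blocks))
        b_tilde = sum(bk * x**k for k, bk in enumerate(b_blocks))
        results.append(np.convolve(a_tilde, b_tilde))  # length 2L - 1

    # Interpolate: row d of coeffs is the sum of a_j * b_k over j + k == d.
    V = np.vander(xs, K, increasing=True)
    coeffs = np.linalg.solve(V, np.stack(results))

    # Overlap-add: the degree-d coefficient is shifted by d * L samples.
    out = np.zeros(len(a) + len(b) - 1)
    for d in range(K):
        out[d * L : d * L + 2 * L - 1] += coeffs[d]
    return out
```

The equal block length is what lets the shift depend only on j + k, so the master needs the m + n - 1 grouped sums rather than all m * n individual block convolutions.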

This review was generated by AI and checked by a human editor.