QUICK REVIEW

[Paper Review] numpywren: serverless linear algebra

Vaishaal Shankar, Karl Krauth | arXiv (Cornell University) | Oct 23, 2018

Cloud Computing and Resource Management | References: 28 | Citations: 67

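One-sentence summary

numpywren makes large-scale linear algebra practical on serverless compute: on key algorithms its completion time comes close to ScaLAPACK and its compute efficiency is substantially better, while the work also exposes the locality limitations of serverless runtimes.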

ABSTRACT

Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of storage and compute resources in so-called "serverless" environments, combined with compute-intensive workload characteristics, can be exploited to achieve elastic scalability and ease of management. We present numpywren, a system for linear algebra built on a serverless architecture. We also introduce LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting. We show that, for certain linear algebra algorithms such as matrix multiply, singular value decomposition, and Cholesky decomposition, numpywren's performance (completion time) is within 33% of ScaLAPACK, and its compute efficiency (total CPU-hours) is up to 240% better due to elasticity, while providing an easier to use interface and better fault tolerance. At the same time, we show that the inability of serverless runtimes to exploit locality across the cores in a machine fundamentally limits their network efficiency, which limits performance on other algorithms such as QR factorization. This highlights how cloud providers could better support these types of computations through small changes in their infrastructure.

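Motivation and objectives

Motivate the need for linear algebra that scales beyond a single machine, and reduce the complexity of provisioning clusters.
Propose a serverless architecture that achieves elastic scalability by separating storage from compute.
Introduce LAmbdaPACK, a DSL for expressing parallel linear algebra over tiled matrices in a stateless setting.
Demonstrate performance and fault-tolerance advantages over traditional HPC and fault-tolerant data-parallel systems.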

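Proposed method

Develop a serverless system (numpywren) that runs linear algebra tasks as stateless functions and keeps intermediate state in a distributed object store.
Introduce LAmbdaPACK, a domain-specific language that expresses tiled linear algebra algorithms as DAG-like dependency graphs.
Use distributed dependency analysis to generate executable task graphs from LAmbdaPACK programs.
Implement a fault-tolerant execution model that manages workers using leases and an elastic provisioner.
Evaluate end-to-end performance on GEMM, QR, SVD, and Cholesky against ScaLAPACK and Dask.
Discuss the limitations caused by serverless runtimes' inability to exploit locality, and potential small changes to infrastructure.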
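
To make the idea of stateless tasks over tiled matrices concrete, here is a minimal single-process sketch of a tiled Cholesky factorization. It is not numpywren's API and not LAmbdaPACK syntax: the `store` dictionary is a hypothetical stand-in for the distributed object store, and the task names `chol`, `trsm`, and `update` are illustrative. Each task only reads tiles from the store and writes one tile back, which is the property that would let independent workers run them.

```python
import numpy as np


def tiled_cholesky(a, block):
    """Lower Cholesky factor of a symmetric positive definite matrix, tile by tile.

    Assumes `block` divides the matrix size evenly.
    """
    n = a.shape[0] // block  # number of tiles per side

    # Hypothetical stand-in for an object store: (row, col) tile key -> ndarray.
    # Only the lower-triangular tiles are needed.
    store = {
        (i, j): a[i * block:(i + 1) * block, j * block:(j + 1) * block].copy()
        for i in range(n) for j in range(i + 1)
    }

    def chol(j):
        # Factor the diagonal tile.
        store[j, j] = np.linalg.cholesky(store[j, j])

    def trsm(i, j):
        # Solve X @ L_jj.T = A_ij for X (general solver used for brevity).
        store[i, j] = np.linalg.solve(store[j, j], store[i, j].T).T

    def update(i, k, j):
        # Trailing update: A_ik -= L_ij @ L_kj.T
        store[i, k] = store[i, k] - store[i, j] @ store[k, j].T

    # Sequential loop nest; the data dependencies between these calls form the DAG.
    for j in range(n):
        chol(j)
        for i in range(j + 1, n):
            trsm(i, j)
        for i in range(j + 1, n):
            for k in range(j + 1, i + 1):
                update(i, k, j)

    # Assemble the lower-triangular result from the stored tiles.
    out = np.zeros_like(a)
    for (i, j), tile in store.items():
        out[i * block:(i + 1) * block, j * block:(j + 1) * block] = tile
    return out
```

In this sketch the tasks run in program order on one machine. In a serverless setting, any task whose input tiles are already final could be handed to a separate stateless worker, with the store as the only shared state.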

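Experimental results

Research questions

RQ1: Can a serverless runtime with disaggregated storage run large-scale linear algebra efficiently?
RQ2: How close can serverless linear algebra get to traditional HPC libraries in completion time and CPU time?
RQ3: What trade-offs does a stateless task design impose on network traffic and fault tolerance in linear algebra algorithms?
RQ4: How does LAmbdaPACK enable compact representation and scalable scheduling of complex linear algebra DAGs?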

Key findings

numpywren's performance (completion time) for matrix multiply, SVD, and Cholesky decomposition is within 33% of ScaLAPACK.
Compute efficiency is up to 240% better due to elasticity.
For Cholesky on a 1M x 1M matrix, numpywren is within 36% of ScaLAPACK completion time and can use 33% fewer CPU-hours.
Compared to Dask, a fault-tolerant data-parallel system, numpywren can be up to 320% faster.
Serverless locality limitations reduce network efficiency for certain algorithms such as QR factorization, highlighting infrastructure design opportunities for providers.
LAmbdaPACK enables compact, large-scale DAG representations (millions of nodes in ~2 KB) and supports key algorithms like Cholesky, TSQR, LU, and SVD.
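
The compactness claim can be illustrated with a rough count. The sketch below is plain Python, not LAmbdaPACK: a short loop nest lazily enumerates the tasks of a tiled Cholesky, so the program text stays constant in size while the number of DAG nodes grows cubically with the number of tiles per side. The tile size of 4096 and the reading of "1M" as 2**20 are assumptions made only for this example.

```python
def cholesky_tasks(n):
    """Lazily yield the tasks of a tiled Cholesky on an n x n grid of tiles."""
    for j in range(n):
        yield ("chol", j)
        for i in range(j + 1, n):
            yield ("trsm", i, j)
        for i in range(j + 1, n):
            for k in range(j + 1, i + 1):
                yield ("update", i, k, j)


def task_count(n):
    """Closed form for the number of tasks yielded by cholesky_tasks(n)."""
    return n + n * (n - 1) // 2 + (n - 1) * n * (n + 1) // 6


# Assumed tiling: a 2**20 x 2**20 matrix cut into 4096 x 4096 tiles.
tiles_per_side = 2**20 // 4096  # 256
print(task_count(tiles_per_side))  # 2829056 tasks from a ten-line loop nest
```

Because the generator never materializes the full task list, the graph can be described and walked without storing millions of nodes up front.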

This review was generated by AI and checked by a human editor.