QUICK REVIEW

[Paper Review] A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

Ahmad Abdelfattah, Hartwig Anzt|arXiv (Cornell University)|Jul 13, 2020

Numerical Methods and Algorithms | References: 94 | Citations: 24

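One-Line Summary

This survey gives an integrated review of state-of-the-art numerical methods that use mixed-precision arithmetic on low-precision hardware such as NVIDIA Tensor Cores. It reports that combining low-precision computation with high-precision correction techniques can deliver speedups of up to 10x while maintaining numerical accuracy, particularly in dense and sparse linear algebra, Krylov subspace solvers, and preconditioning.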

ABSTRACT

Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between the compute power on the one hand and the memory bandwidth on the other hand keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered "mature technology," but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves but focus on how mixed- and multiprecision technology can help improving the performance of these methods and present highlights of application significantly outperforming the traditional fixed precision methods.

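Motivation and Objectives

Analyze and consolidate existing knowledge of mixed-precision numerical algorithms in scientific computing.
Identify and evaluate emerging multiprecision techniques with practical potential on exascale systems.
Bridge the gap between hardware advances in low-precision arithmetic and software algorithm design in numerical linear algebra.
Provide guidance for the Exascale Computing Project in developing robust, portable, and efficient multiprecision algorithms.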

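Proposed Methods

Survey existing numerical linear algebra algorithms and adapt them for mixed-precision execution on low-precision arithmetic units.
Refine low-precision solutions to high precision using iterative correction strategies such as classical iterative refinement and GMRES-IR.
Introduce quantized integer LU factorization and mixed-precision Cholesky and related factorization methods to reduce data movement and improve performance.
Design data compression and communication techniques, including mixed-precision MPI and approximate FFTs, to relieve bandwidth bottlenecks.
Use probabilistic rounding error analysis to provide theoretical guarantees for low-precision computation.
Integrate multiprecision capabilities into major HPC software stacks such as PETSc, Trilinos, Ginkgo, and hypre using template-based scalar types and runtime precision control.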
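
The iterative refinement strategy listed above can be illustrated with a minimal NumPy sketch. This is not code from the survey: the function name `mixed_precision_refine`, the test matrix, and the tolerance are assumptions chosen for illustration, and float32/float64 stand in for the half/double pairing used on Tensor Core hardware. The idea shown is only the general pattern: solve cheaply in low precision, compute the residual in high precision, and correct.

```python
import numpy as np

def mixed_precision_refine(A, b, max_iters=20, tol=1e-12):
    """Solve A x = b using float32 solves and float64 residual corrections.

    Hypothetical helper for illustration; returns the solution and the
    history of relative residuals.
    """
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)  # low-precision copy used for every solve

    # Initial solve entirely in float32.
    # NOTE: np.linalg.solve refactors A32 on each call. A practical code
    # would factor once and reuse the factors; this keeps the sketch short.
    x = np.linalg.solve(A32, b64.astype(np.float32)).astype(np.float64)

    history = []
    for _ in range(max_iters):
        r = b64 - A64 @ x  # residual computed in float64
        rel = np.linalg.norm(r) / np.linalg.norm(b64)
        history.append(rel)
        if rel <= tol:
            break
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction in float32
        x = x + d.astype(np.float64)  # update accumulated in float64
    return x, history

# Well-conditioned test problem (an assumption; refinement can fail to
# converge when the matrix is too ill-conditioned for the low precision).
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

x_refined, history = mixed_precision_refine(A, b)
x_single = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))

err_refined = np.linalg.norm(x_refined - x_true) / np.linalg.norm(x_true)
err_single = np.linalg.norm(x_single - x_true) / np.linalg.norm(x_true)
print(f"float32 only: {err_single:.2e}, refined: {err_refined:.2e}, "
      f"refinement steps: {len(history)}")
```

GMRES-IR follows the same outer loop but replaces the low-precision correction solve with a GMRES iteration preconditioned by the low-precision factors, which is intended to extend the approach to less well-conditioned problems.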

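Experimental Results

Research Questions

RQ1: Can mixed-precision arithmetic be applied effectively to dense and sparse linear algebra problems to improve performance without sacrificing accuracy?
RQ2: Among algorithmic strategies such as iterative refinement and preconditioning, which are most effective at preserving accuracy when low-precision arithmetic is used?
RQ3: How can data compression and communication overhead be reduced in multiprecision algorithms, particularly on distributed-memory systems?
RQ4: What are the theoretical and practical limits of applying probabilistic rounding error analysis to low-precision numerical computation?
RQ5: How can existing HPC software frameworks support mixed-precision computation natively without significantly affecting performance or portability?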

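Key Findings

Mixed-precision GEMM (HGEMM) using Tensor Cores on Summit achieved more than a 10x speedup over double-precision GEMM.
GMRES-IR, which combines half-precision matrix-vector products with double-precision corrections, converged at a rate comparable to full double-precision GMRES while delivering notable performance gains.
Quantized integer LU factorization reduced memory traffic and enabled faster factorization, especially for matrices with structural features.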
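Probabilistic rounding error analysis showed that error bounds grow on the order of √(n log n) rather than nu (the problem size n times the unit roundoff u), supporting the stability of large-scale low-precision computation.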
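Integrating mixed-precision support into PETSc, Trilinos, and Ginkgo yielded flexible, high-performance solvers with minimal code changes.
Approximate FFTs with dynamic splitting and precision control achieved speedups in spectral methods while keeping errors at an acceptable level.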
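
The gap between worst-case and probabilistic rounding error growth mentioned above can be observed empirically. The sketch below is an illustration only, not an experiment from the survey: it sums random float32 values left to right and compares the observed relative error against the worst-case scale n*u and the √(n log n)*u scale quoted in this review. The data distribution, problem sizes, and seed are assumptions.

```python
import numpy as np

# Unit roundoff of float32 (2**-24); eps is the spacing at 1.0, i.e. 2*u.
u = float(np.finfo(np.float32).eps) / 2
rng = np.random.default_rng(0)

results = []
for n in (10**3, 10**4, 10**5, 10**6):
    x32 = rng.random(n, dtype=np.float32)          # data generated in float32
    s_ref = float(np.sum(x32.astype(np.float64)))  # float64 reference sum
    # np.cumsum accumulates left to right in float32 (recursive summation);
    # np.sum is avoided here because it uses pairwise summation.
    s32 = float(np.cumsum(x32)[-1])
    # All x_i are nonnegative, so s_ref equals sum |x_i| and this is the
    # usual relative error measure for summation.
    rel_err = abs(s32 - s_ref) / s_ref
    results.append((n, rel_err, float(np.sqrt(n * np.log(n))) * u, n * u))

for n, err, prob_scale, worst_scale in results:
    print(f"n={n:>8d}  observed={err:.2e}  "
          f"sqrt(n log n)*u={prob_scale:.2e}  n*u={worst_scale:.2e}")
```

The worst-case scale assumes every rounding error has the same sign and maximal size, while probabilistic analysis treats the errors as random, which is why observed errors tend to sit far below n*u as n grows.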


This review was generated by AI and checked by a human editor.