QUICK REVIEW

[Paper Review] DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Shuaiwen Leon Song, Bonnie Kruft | arXiv (Cornell University) | Oct 6, 2023

Machine Learning in Materials Science | Citations: 8

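One-line summary

This paper introduces the DeepSpeed4Science initiative and details the AI system technologies built on DeepSpeed to accelerate large-scale, deep-learning-driven scientific discovery. It presents two structural biology case studies and a plan for broader scientific collaboration.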

ABSTRACT

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.

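Motivation and objectives

Motivate the need for AI system technologies tailored to scientific domains, beyond what is used to accelerate generic LLMs.
Explain the DeepSpeed4Science approach and how it builds on DeepSpeed's three pillars (training, inference, and compression).
Show how DS4Sci addresses two system challenges in structural biology: the memory explosion in Evoformer attention and long-sequence support for GenSLM.
Outline a collaboration model and the potential of a platform for sharing AI system technologies for scientific discovery.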

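Proposed approach

Develop a customized, memory-efficient EvoformerAttention kernel that eliminates the memory explosion.
Integrate long-sequence support into the Megatron-DeepSpeed framework through memory optimizations for attention masks and position embeddings.
Enable training and inference of genome-scale foundation models on very long sequences by rebasing and optimizing Megatron-DeepSpeed.
Reduce peak memory while preserving accuracy through kernel fusion, tiling, on-the-fly broadcasting, and FP32-safe gradient handling.
Extend the sequence length substantially by combining sequence parallelism, tensor/pipeline parallelism, and model/data offloading.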
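
To make the tiling and on-the-fly broadcasting ideas concrete, here is a minimal NumPy sketch of attention with an additive bias, computed one key/value tile at a time with a running softmax. It is not the DS4Sci_EvoformerAttention kernel: the function names, the `tile` parameter, and the single-head 2D shapes are assumptions made for illustration, and only the forward pass is covered (the FP32 gradient handling is not shown).

```python
import math
import numpy as np

def naive_attention(q, k, v, bias):
    """Reference version: materializes the full (n_q, n_k) score matrix."""
    scores = q @ k.T / math.sqrt(q.shape[-1]) + bias
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v

def tiled_attention(q, k, v, bias, tile=64):
    """Illustrative tiled version: only an (n_q, tile) score block exists at once.

    `bias` may have shape (1, n_k); slicing it per tile and letting NumPy
    broadcast avoids expanding it to (n_q, n_k) up front.
    """
    n_q, d = q.shape
    scale = 1.0 / math.sqrt(d)
    out = np.zeros((n_q, v.shape[-1]), dtype=np.float32)
    row_max = np.full((n_q, 1), -np.inf, dtype=np.float32)  # running max per query
    row_sum = np.zeros((n_q, 1), dtype=np.float32)          # running softmax denominator
    for start in range(0, k.shape[0], tile):
        end = start + tile
        s = (q @ k[start:end].T) * scale + bias[:, start:end]
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescales earlier partial results
        p = np.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v[start:end]
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16)).astype(np.float32)
k = rng.standard_normal((200, 16)).astype(np.float32)
v = rng.standard_normal((200, 16)).astype(np.float32)
bias = rng.standard_normal((1, 200)).astype(np.float32)  # broadcast over queries
same = np.allclose(naive_attention(q, k, v, bias),
                   tiled_attention(q, k, v, bias), atol=1e-4)
print("tiled matches naive:", same)
```

The two functions should agree up to floating-point error; the difference is that the tiled version's peak intermediate size scales with `tile` rather than with the full key length.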

Figure 1: DeepSpeed4Science approach: developing a new set of AI system technologies that are beyond generic large language model support, tailored for accelerating scientific discoveries and addressing their complexity.

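Experimental results

Research questions

RQ1: How can AI system technologies be specialized for the memory and sequence-length challenges of science-focused models?
RQ2: Can customized kernels and a framework rebase substantially lengthen the context size of genome-scale and Evoformer-based models without loss of accuracy?
RQ3: How much performance and throughput improvement do the DS4Sci optimizations deliver when applied to structural biology and GenSLM-style models?
RQ4: Can DS4Sci broadly promote the sharing of, and collaboration on, advanced AI system technologies for scientific discovery?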

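Key findings

The DS4Sci_EvoformerAttention kernel reduces OpenFold's peak memory by 13x on Evoformer attention variants, with no loss of accuracy.
The new Megatron-DeepSpeed framework enables GenSLM training on long sequences, with reported sequences up to 13x longer on average and up to 2x higher throughput in specific cases.
The Megatron-DeepSpeed rebase adds rotary position embedding, FlashAttention v1/v2, and new fused kernels, improving long-sequence training and inference.
Memory optimizations for attention masks and position embeddings, together with sequence parallelism, greatly extend the feasible sequence length for GenSLM (for example, 512K for the 25B GenSLM), exceeding the previous limit.
The DS4Sci effort positions DeepSpeed4Science as a platform and repository for sharing advanced AI system technologies for science.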
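
A rough back-of-the-envelope calculation suggests why attention-mask memory matters at these lengths. The sketch below assumes a dense square mask stored at one byte per element and reads "512K" as 512 × 1024 tokens; both are assumptions for illustration, and the review does not say how the framework actually represents or optimizes the mask.

```python
GIB = 1024 ** 3

def dense_mask_bytes(seq_len: int, bytes_per_element: int = 1) -> int:
    """Size of a hypothetical dense (seq_len x seq_len) attention mask."""
    return seq_len * seq_len * bytes_per_element

for seq_len in (32 * 1024, 128 * 1024, 512 * 1024):
    size_gib = dense_mask_bytes(seq_len) / GIB
    print(f"{seq_len:>7} tokens: {size_gib:.0f} GiB")
```

Under these assumptions the mask alone grows from 1 GiB at 32K tokens to 256 GiB at 512K tokens, since its size is quadratic in the sequence length.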

Figure 2: Peak memory requirement for training variants of the MSA attention kernels (with bias) with the maximum possible training sample dimension in OpenFold. (Left) The original OpenFold implementation with EvoformerAttention used in AlphaFold2. The memory explosion problems in training/inferenc


This review was generated by AI and checked by a human editor.