QUICK REVIEW

[Paper Review] Scaling Up Models and Data with t5x and seqio

Adam Roberts, Hyung Won Chung | arXiv (Cornell University) | Mar 31, 2022

Topic Modeling | Citations: 48

One-Line Summary

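This paper presents two open-source libraries, t5x and seqio, that simplify scaling Transformer models and their data pipelines, enabling large-scale training on TPUs and other hardware as well as reproducible evaluation.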

ABSTRACT

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. t5x and seqio are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.

Motivation and Objectives

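Motivates the need for scalable, reproducible training of large language models.
Introduces t5x, a JAX-based library for building, training, evaluating, and running inference with Transformer models at scale.
Introduces seqio, a task-based data pipeline API for efficient, deterministic, and reproducible data processing.
Shows how these libraries support encoder-decoder and decoder-only architectures and integrate with existing frameworks.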
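To make the idea of a task-based data API concrete, here is a minimal pure-Python sketch of a task registry and a rate-weighted multi-task mixture. Everything in it (`Task`, `register`, `sample_mixture`, `TASK_REGISTRY`, the toy tasks) is a hypothetical stand-in, not seqio's actual classes or signatures; real seqio tasks are built on tf.data and also handle vocabularies, output features, and metrics. What it shows is the shape of the idea: a named task bundles a data source with an ordered list of preprocessors, and a mixture sampled with a fixed seed is reproducible.

```python
import itertools
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical, simplified stand-ins for the Task / Mixture idea.
# These are NOT the real seqio classes, registries, or signatures.


@dataclass
class Task:
    name: str
    source: Callable[[], List[dict]]  # returns raw examples
    preprocessors: List[Callable[[dict], dict]] = field(default_factory=list)

    def get_dataset(self) -> List[dict]:
        """Apply every preprocessor, in order, to every raw example."""
        examples = []
        for ex in self.source():
            for fn in self.preprocessors:
                ex = fn(ex)
            examples.append(ex)
        return examples


TASK_REGISTRY: Dict[str, Task] = {}


def register(task: Task) -> Task:
    """Add a task to the global registry under a unique name."""
    if task.name in TASK_REGISTRY:
        raise ValueError(f"task {task.name!r} is already registered")
    TASK_REGISTRY[task.name] = task
    return task


def sample_mixture(rates: Dict[str, float], n: int, seed: int) -> List[dict]:
    """Draw n examples across tasks, picking each task with probability
    proportional to its rate; a fixed seed makes the order reproducible."""
    rng = random.Random(seed)
    names = sorted(rates)  # sorted so the result does not depend on dict order
    weights = [rates[name] for name in names]
    # Cycle each task's examples so small toy tasks never run out.
    streams = {name: itertools.cycle(TASK_REGISTRY[name].get_dataset()) for name in names}
    mixed = []
    for _ in range(n):
        name = rng.choices(names, weights=weights, k=1)[0]
        mixed.append({"task": name, **next(streams[name])})
    return mixed


def add_prefix(ex: dict) -> dict:
    """Toy preprocessor: prepend a task prefix to the inputs."""
    return {**ex, "inputs": "translate: " + ex["inputs"]}


register(Task("translate_toy", lambda: [{"inputs": "hello", "targets": "hallo"}], [add_prefix]))
register(Task("summarize_toy", lambda: [{"inputs": "a long text", "targets": "short"}]))

rates = {"translate_toy": 3.0, "summarize_toy": 1.0}
run_a = sample_mixture(rates, n=8, seed=0)
run_b = sample_mixture(rates, n=8, seed=0)  # same seed, same mixture
```

Keeping preprocessing inside the task definition means every consumer of a task name (training, evaluation, another mixture) sees identical examples, which is the property the text attributes to a task-based API.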

Proposed Approach

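Describes t5x's modular architecture and how it wraps jax.pjit, which is backed by XLA GSPMD, to partition model parameters, data, and activations.
Explains the partitioning options, such as data vs. model partitioning and 1D/2D parameter and activation partitioning, and relates them to known schemes such as ZeRO-3 and Megatron.
Details how Flax and Gin-config are used for model implementation and configuration, including compatibility with legacy T5 and Mesh TensorFlow models.
Presents seqio as a task-based API built on tf.data that provides scalable, deterministic data pipelines and multi-task mixtures.
Outlines the deterministic pipeline features (reproducibility, recoverability, sharding, and global shuffling), implemented with Apache Beam.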
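The following sketch illustrates, under simplifying assumptions, why a deterministic pipeline makes runs reproducible and recoverable: the data are globally shuffled once with a fixed seed, each host reads a fixed, disjoint shard, and resuming at step k simply replays from the k-th batch. The functions (`global_shuffle`, `host_shard`, `batches_from_step`) are hypothetical; in seqio the global shuffle is done with Apache Beam and the training-time side is a tf.data pipeline, neither of which is modeled here.

```python
import hashlib
from typing import Iterator, List, Tuple

# Hypothetical sketch of a deterministic input pipeline (not seqio's API).


def _sort_key(example_id: str, seed: int) -> str:
    """Stable pseudo-random key: the same (id, seed) gives the same key on any host."""
    return hashlib.sha256(f"{seed}:{example_id}".encode("utf-8")).hexdigest()


def global_shuffle(example_ids: List[str], seed: int) -> List[str]:
    """Offline step: one global shuffle, independent of the input order."""
    return sorted(example_ids, key=lambda eid: _sort_key(eid, seed))


def host_shard(shuffled: List[str], host_id: int, num_hosts: int) -> List[str]:
    """Round-robin split so each host reads a fixed, disjoint subset."""
    return shuffled[host_id::num_hosts]


def batches_from_step(shard: List[str], batch_size: int,
                      start_step: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (step, batch) from start_step onward; a trailing partial batch is dropped.
    Restarting at step k replays exactly what the original run saw from step k."""
    step = start_step
    while (step + 1) * batch_size <= len(shard):
        yield step, shard[step * batch_size:(step + 1) * batch_size]
        step += 1


ids = [f"ex{i:03d}" for i in range(20)]
shuffled = global_shuffle(ids, seed=42)
shard0 = host_shard(shuffled, host_id=0, num_hosts=2)
shard1 = host_shard(shuffled, host_id=1, num_hosts=2)
full_run = list(batches_from_step(shard0, batch_size=2, start_step=0))
resumed = list(batches_from_step(shard0, batch_size=2, start_step=3))  # e.g. after a crash
```

A cryptographic hash is used for the sort key rather than Python's built-in `hash()`, because CPython salts string hashing per process, which would give each host a different order.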

Results

Research Questions

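RQ1: How can high-level abstractions simplify scaling Transformer models across both the data-parallel and model-parallel axes?
RQ2: How can data pipelines be made deterministic and reproducible, so that experiments can be compared fairly and debugged efficiently?
RQ3: What are practical configurations and workflows for encoder-decoder and decoder-only architectures in t5x and seqio?
RQ4: How do t5x and seqio integrate with existing model-implementation and training ecosystems (Flax, TensorFlow, PyTorch)?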

Key Findings

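t5x provides a high-level interface to JAX/XLA GSPMD, enabling flexible partitioning of data, parameters, and activations for large Transformer models.
seqio enables task-based data pipelines with deterministic processing, reproducibility, recoverability, and efficient distributed reads, supporting large-scale training and evaluation.
The libraries support encoder-decoder and decoder-only model configurations and provide compatibility with, and conversion paths from, the legacy T5 codebase and Mesh TensorFlow models.
The open-source release, with configurations and guidance for T5-like and GPT-like architectures, facilitates rapid experimentation and scaling across TPU environments.
Adoption by teams within Google and by external researchers demonstrates that the libraries are usable and research-friendly for large-scale language modeling.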
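As a rough illustration of the parameter partitioning described above, the sketch below computes the per-device block shape of a weight whose dimensions carry logical axis names, on an assumed 4 x 2 ("data", "model") mesh of 8 devices. The rule tables `RULES_1D` and `RULES_2D` and the helpers `shard_shape` and `per_device_count` are hypothetical simplifications of the logical-axis-rule idea, not t5x's actual defaults or API: the 1D rules keep the "embed" dimension replicated and split "mlp" over "model", while the 2D rules additionally split "embed" over "data".

```python
from math import prod
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical model of logical-axis partitioning on a 2D device mesh.
# Rule tables are illustrative assumptions, not copied from t5x.

MESH: Dict[str, int] = {"data": 4, "model": 2}  # assumed 8-device mesh

RULES_1D: Dict[str, Optional[str]] = {
    "embed": None,    # replicated
    "mlp": "model",
    "heads": "model",
    "vocab": "model",
}
# 2D parameter partitioning: also shard "embed" across the data axis.
RULES_2D: Dict[str, Optional[str]] = {**RULES_1D, "embed": "data"}


def shard_shape(shape: Sequence[int], logical_axes: Sequence[str],
                rules: Dict[str, Optional[str]],
                mesh: Dict[str, int] = MESH) -> Tuple[int, ...]:
    """Per-device block shape of an array whose dims carry logical axis names."""
    if len(shape) != len(logical_axes):
        raise ValueError("shape and logical_axes must have the same length")
    used = set()
    block = []
    for size, axis in zip(shape, logical_axes):
        mesh_axis = rules.get(axis)
        if mesh_axis is None:
            block.append(size)  # not split: every device holds the full dim
            continue
        if mesh_axis in used:
            raise ValueError(f"mesh axis {mesh_axis!r} used for two dims")
        n = mesh[mesh_axis]
        if size % n:
            raise ValueError(f"dim {axis}={size} not divisible by {mesh_axis}={n}")
        used.add(mesh_axis)
        block.append(size // n)
    return tuple(block)


def per_device_count(shape, logical_axes, rules, mesh=MESH) -> int:
    """Number of parameter values each device stores."""
    return prod(shard_shape(shape, logical_axes, rules, mesh))


KERNEL_SHAPE = (1024, 4096)        # e.g. an MLP input kernel
KERNEL_AXES = ("embed", "mlp")
replicated = per_device_count(KERNEL_SHAPE, KERNEL_AXES, rules={})
one_d = per_device_count(KERNEL_SHAPE, KERNEL_AXES, RULES_1D)
two_d = per_device_count(KERNEL_SHAPE, KERNEL_AXES, RULES_2D)
```

Under these assumptions the 1024 x 4096 kernel costs 4,194,304 values per device when fully replicated, 2,097,152 with the 1D rules, and 524,288 with the 2D rules, i.e. one eighth of the total, since both mesh axes now split it.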


This review was generated by AI and checked by a human editor.