QUICK REVIEW

[Paper Review] Efficiently Modeling Long Sequences with Structured State Spaces

Albert Gu, Karan Goel|arXiv (Cornell University)|Oct 31, 2021

Topic Modeling | References: 41 | Citations: 481

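One-sentence summary

S4 introduces a structured state space sequence model that handles very long sequences efficiently and achieves state-of-the-art results on long-range dependency benchmarks, including solving Path-X, while generating substantially faster than Transformers.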

ABSTRACT

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of $10000$ or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) $ x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t) $, and showed that for appropriate choices of the state matrix $ A $, this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4) based on a new parameterization for the SSM, and show that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths. Our technique involves conditioning $ A $ with a low-rank correction, allowing it to be diagonalized stably and reducing the SSM to the well-studied computation of a Cauchy kernel. S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91\% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation $60\times$ faster (iii) SoTA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
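The state space model in the abstract can be run either as a step-by-step recurrence or as a single long convolution once it is discretized. The following is a minimal sketch of that equivalence, not the authors' implementation: it assumes a bilinear discretization, drops the $Du(t)$ term, and uses small random matrices (`A`, `B`, `C`) and a hypothetical step size of 0.1 purely for illustration.

```python
import numpy as np

def discretize_bilinear(A, B, step):
    """Bilinear discretization of x'(t) = A x(t) + B u(t) with step size `step`."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    return inv @ (I + (step / 2.0) * A), inv @ (step * B)

def run_recurrence(Ab, Bb, C, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k, y_k = C x_k, zero initial state."""
    x = np.zeros((Ab.shape[0], 1))
    ys = []
    for u_k in u:
        x = Ab @ x + Bb * u_k
        ys.append((C @ x).item())
    return np.array(ys)

def ssm_kernel(Ab, Bb, C, L):
    """Naive convolution kernel K = (C Bb, C Ab Bb, ..., C Ab^(L-1) Bb)."""
    K, x = [], Bb
    for _ in range(L):
        K.append((C @ x).item())
        x = Ab @ x
    return np.array(K)

def causal_conv_fft(K, u):
    """Causal convolution of K with u via zero-padded FFT."""
    L = len(u)
    n = 2 * L  # padding avoids circular wrap-around in the first L outputs
    return np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)[:L]

# Toy system: illustrative values only, not taken from the paper.
rng = np.random.default_rng(0)
N, L = 4, 64
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)

Ab, Bb = discretize_bilinear(A, B, step=0.1)
y_rec = run_recurrence(Ab, Bb, C, u)
y_conv = causal_conv_fft(ssm_kernel(Ab, Bb, C, L), u)
max_gap = np.max(np.abs(y_rec - y_conv))  # the two views should agree
```

The recurrence needs only the current state per step, which is what makes autoregressive generation cheap, while the convolution processes the whole sequence at once during training. The naive kernel loop here is the expensive part that S4 is designed to avoid.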

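Motivation and objectives

Motivate the need for a single model that handles long-range dependencies across modalities and tasks.
Propose a practical, efficient SSM-based sequence model that scales to very long sequences.
Show that the Normal Plus Low-Rank (NPLR) parameterization enables fast computation and stable training.
Demonstrate S4's performance on image, text, and speech benchmarks and show that it is competitive with Transformers.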

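Proposed method

Reparameterize the state matrix A in Normal Plus Low-Rank (NPLR) form so that it can be diagonalized stably.
Conjugate the normal part to diagonal form via its eigendecomposition and apply the Woodbury identity to the low-rank correction, so that the discrete SSM kernel can be computed efficiently.
Express the SSM convolution kernel in terms of a Cauchy kernel: evaluate the truncated generating function at the roots of unity, then apply an inverse FFT.
Use the HiPPO theory of continuous-time memorization to address long-range dependencies.
Handle multi-feature inputs with H independent copies of the SSM, one per feature, in a broadcast-style architecture similar to a depthwise-separable convolution.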
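The NPLR and Woodbury steps can be checked numerically on a small example. The sketch below assumes the HiPPO-LegS form of the state matrix and a rank-1 correction `P` with entries sqrt(n + 1/2); the function names and the test point `z` are hypothetical choices for illustration. It verifies that A plus the correction is a normal matrix and that the resolvent of A can be recovered from a diagonal inverse plus a Woodbury correction.

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS state matrix (lower triangular), in the convention assumed here."""
    n = np.arange(N)
    r = np.sqrt(2.0 * n + 1.0)
    return -np.tril(np.outer(r, r), k=-1) - np.diag(n + 1.0)

def nplr_decompose(A):
    """Write A = V diag(Lam) V^H - P P^T with V unitary (valid for hippo_legs)."""
    N = A.shape[0]
    P = np.sqrt(np.arange(N) + 0.5)
    normal = A + np.outer(P, P)        # equals -1/2 I + S with S skew-symmetric
    S = normal + 0.5 * np.eye(N)
    w, V = np.linalg.eigh(1j * S)      # 1j * S is Hermitian, so V is unitary
    Lam = -0.5 - 1j * w                # eigenvalues of the normal part
    return Lam, V, P, normal

def resolvent_woodbury(z, Lam, V, P):
    """(zI - A)^{-1} from a diagonal inverse plus a rank-1 Woodbury correction."""
    Pt = V.conj().T @ P                # low-rank factor in the eigenbasis
    d = 1.0 / (z - Lam)                # diagonal part: Cauchy-style terms
    denom = 1.0 + np.sum(Pt.conj() * d * Pt)
    corr = np.outer(d * Pt, Pt.conj() * d) / denom
    return V @ (np.diag(d) - corr) @ V.conj().T

N = 8
A = hippo_legs(N)
Lam, V, P, normal = nplr_decompose(A)

z = 0.3 + 1.0j                         # arbitrary point away from the spectrum of A
R_fast = resolvent_woodbury(z, Lam, V, P)
R_direct = np.linalg.inv(z * np.eye(N) - A)
resolvent_gap = np.max(np.abs(R_fast - R_direct))
```

The terms `1 / (z - Lam)` are where the Cauchy-kernel structure comes from: once the normal part is diagonal, products with the resolvent reduce to sums of the form c_i b_i / (z - λ_i). This sketch still forms dense matrices for clarity, so it shows the algebra rather than the speedup.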

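Experimental results

Research questions

RQ1: Can SSMs with the S4 parameterization efficiently model very long sequences (L of 16k or more) while matching or exceeding Transformer performance on standard benchmarks?
RQ2: How close can attention-free or low-attention models come to Transformers on language and image modeling while offering faster generation?
RQ3: Can SSM-based models generalize across domains (image, text, speech) with minimal architectural changes?
RQ4: What theoretical and computational guarantees, such as complexity and stability, does the NPLR S4 parameterization provide for the recurrent and convolutional representations?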
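For the convolutional representation in RQ4, the cost question comes down to how the length-L kernel is obtained without repeated matrix powers. One ingredient is the truncated generating function: evaluated at the L-th roots of unity, it gives the discrete Fourier transform of the kernel, so an inverse FFT recovers the kernel itself. The sketch below illustrates only that identity on a toy discrete system. The matrices are random, scaled to spectral radius 0.9 as an assumption for stability, and a dense linear solve stands in for the structured Cauchy-kernel computation.

```python
import numpy as np

def naive_kernel(Ab, Bb, C, L):
    """Reference kernel K_k = C Ab^k Bb computed by repeated multiplication."""
    K, x = [], Bb
    for _ in range(L):
        K.append((C @ x).item())
        x = Ab @ x
    return np.array(K)

def kernel_from_generating_function(Ab, Bb, C, L):
    """Evaluate sum_{k<L} C Ab^k Bb z^k at the L-th roots of unity, then invert.

    Since z^L = 1 at a root of unity, the truncated sum equals
    C (I - Ab^L) (I - z Ab)^{-1} Bb.
    """
    I = np.eye(Ab.shape[0])
    C_tilde = C @ (I - np.linalg.matrix_power(Ab, L))
    omega = np.exp(-2j * np.pi * np.arange(L) / L)  # matches numpy's forward FFT sign
    K_hat = np.array(
        [(C_tilde @ np.linalg.solve(I - z * Ab, Bb)).item() for z in omega]
    )
    return np.fft.ifft(K_hat).real

# Toy discrete system: illustrative values only.
rng = np.random.default_rng(1)
N, L = 4, 32
M = rng.standard_normal((N, N))
Ab = 0.9 * M / np.max(np.abs(np.linalg.eigvals(M)))  # spectral radius 0.9
Bb = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))

K_ref = naive_kernel(Ab, Bb, C, L)
K_gf = kernel_from_generating_function(Ab, Bb, C, L)
kernel_gap = np.max(np.abs(K_ref - K_gf))
```

The matrix power appears once, folded into `C_tilde`, rather than once per kernel entry. Each per-frequency solve is what a diagonal-plus-low-rank structure makes cheap.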

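Key findings

S4 reaches 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet.
S4 substantially closes the gap to Transformers on image and language modeling tasks while generating about 60× faster.
S4 sets the state of the art on every Long Range Arena task and solves Path-X (length 16k) with 88% accuracy, where prior work did no better than random guessing.
On speech classification with length-16000 sequences, S4 halves the test error of specialized Speech CNNs, reaching 1.7%, whereas the RNN and Transformer baselines fail to learn.
On WikiText-103 language modeling, S4 comes within 0.8 perplexity of the Transformer baseline, showing that it is competitive without attention.
S4 also offers fast autoregressive generation, applies across domains (image, text, speech), and is robust to changes in sampling rate without retraining.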
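The sampling-rate finding follows from the model being defined in continuous time: the same (A, B, C) can be re-discretized with a different step size. The toy sketch below is not the paper's experiment. It assumes a hand-picked two-state diagonal system and a sine input, and compares three runs: the original rate, half the rate with the step size doubled, and half the rate with the old step size left unchanged.

```python
import numpy as np

def discretize_bilinear(A, B, step):
    """Bilinear discretization of x'(t) = A x(t) + B u(t)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    return inv @ (I + (step / 2.0) * A), inv @ (step * B)

def simulate(A, B, C, u, step):
    """Run the discretized recurrence over samples u taken every `step` seconds."""
    Ab, Bb = discretize_bilinear(A, B, step)
    x = np.zeros((A.shape[0], 1))
    ys = []
    for u_k in u:
        x = Ab @ x + Bb * u_k
        ys.append((C @ x).item())
    return np.array(ys)

# Hypothetical continuous-time parameters, fixed across all three runs.
A = np.diag([-1.0, -2.0])
B = np.ones((2, 1))
C = np.ones((1, 2))

step = 0.01
t_fine = np.arange(0.0, 20.0, step)
t_coarse = t_fine[::2]                 # same signal at half the sampling rate

y_fine = simulate(A, B, C, np.sin(t_fine), step)
y_rescaled = simulate(A, B, C, np.sin(t_coarse), 2 * step)  # step adapted to new rate
y_stale = simulate(A, B, C, np.sin(t_coarse), step)         # step left unchanged

gap_rescaled = np.max(np.abs(y_rescaled - y_fine[::2]))
gap_stale = np.max(np.abs(y_stale - y_fine[::2]))
```

With the step size doubled, the half-rate output stays close to the original at matching time points. With the step left unchanged, the system effectively sees a time-stretched input and the outputs diverge.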

This review was generated by AI and checked by a human editor.