QUICK REVIEW

[Paper Review] Long Range Arena: A Benchmark for Efficient Transformers

Yi Tay, Mostafa Dehghani|arXiv (Cornell University)|Nov 8, 2020

Advanced Neural Network Applications|References: 47|Citations: 195

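One-Line Summary

This paper presents Long Range Arena (LRA), a benchmark for evaluating efficient Transformers on long-context tasks in a unified way. It compares ten models on sequences of 1K to 16K tokens, analyzes accuracy, speed, and memory across diverse data types and tasks, and highlights the trade-offs among them. It shows that no single "best" solution exists.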

ABSTRACT

Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity. In the recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To this date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets makes it difficult to assess relative model quality amongst many models. This paper proposes a systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens, encompassing a wide range of data types and modalities such as text, natural, synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models (Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers) on our newly proposed benchmark suite. LRA paves the way towards better understanding this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle. Our benchmark code will be released at https://github.com/google-research/long-range-arena.
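
To make the quadratic cost mentioned in the abstract concrete, the following minimal sketch counts the entries of a full self-attention score matrix at several sequence lengths. The list of lengths and the float32 (4-byte) storage are illustrative assumptions, not settings taken from the paper.

```python
# Sketch: size of one full self-attention score matrix (one head, one example).
# SEQ_LENS and the 4-byte float32 assumption are illustrative choices.
SEQ_LENS = [1_024, 2_048, 4_096, 8_192, 16_384]  # spans the 1K-16K range


def attention_scores(n: int) -> int:
    """Number of entries in an n x n attention matrix."""
    return n * n


def score_bytes(n: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store the matrix, assuming float32 values."""
    return attention_scores(n) * bytes_per_value


for n in SEQ_LENS:
    mib = score_bytes(n) / 2**20
    print(f"n={n:>6}: {attention_scores(n):>12,} scores, {mib:>7.1f} MiB")
```

Under these assumptions, growing the sequence by 16x (1K to 16K) grows the score matrix by 256x, which is the scaling problem the efficient variants try to avoid.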

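Motivation and Objectives

Establish a unified, general benchmark for long-range Transformer models that spans multiple data modalities.
Evaluate a broad range of efficient Transformer architectures on long-context tasks.
Provide a comprehensive efficiency analysis (speed and memory) to guide model selection and future research.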

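Proposed Method

Design a suite of long-context tasks: ListOps, Byte-level Text Classification, Byte-level Document Retrieval, Image Classification on pixel sequences, Pathfinder, and Pathfinder-X.
Evaluate the vanilla Transformer and nine efficient variants (Reformer, Linformer, Linear Transformers, Sparse Transformers, Longformer, Sinkhorn Transformers, Synthesizers, BigBird, and Performers) on these tasks; the results table also includes a Local Attention baseline.
Quantify the required attention span of each task and report per-task and overall performance.
Release open-source benchmark code in JAX/Flax to improve reproducibility and extensibility.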
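
The byte-level and pixel-sequence tasks listed above feed raw bytes or flattened pixels to the model, which is what makes the inputs long. The sketch below illustrates the idea in plain Python. The function names, the `max_len` default of 4096, and the 32x32 and 128x128 image sizes are assumptions chosen for illustration (the image sizes are picked so the flattened lengths land on 1K and 16K); this is not the benchmark's actual input pipeline.

```python
# Sketch: turning text and images into long 1D sequences.
# Names, max_len, and image sizes are hypothetical, for illustration only.

def text_to_byte_ids(text: str, max_len: int = 4096) -> list[int]:
    """Encode text as UTF-8 bytes (ids 0-255) and truncate to max_len."""
    return list(text.encode("utf-8"))[:max_len]


def image_to_pixel_sequence(image: list[list[int]]) -> list[int]:
    """Flatten a 2D grayscale image row by row into a 1D pixel sequence."""
    return [pixel for row in image for pixel in row]


byte_ids = text_to_byte_ids("Long Range Arena")
print(len(byte_ids), byte_ids[:4])

# Blank placeholder images; only the resulting sequence length matters here.
small = image_to_pixel_sequence([[0] * 32 for _ in range(32)])
large = image_to_pixel_sequence([[0] * 128 for _ in range(128)])
print(len(small), len(large))
```

Working at the byte or pixel level avoids any tokenizer or patch embedding, so a model must compose low-level symbols over long distances to solve the task.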

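Experimental Results

Research Questions

RQ1: How do different efficient Transformer architectures perform on long-range tasks spanning text, images, and synthetic data?
RQ2: What are the speed and memory trade-offs among these architectures at long sequence lengths?
RQ3: Does a single model consistently excel on all long-range tasks, or do trade-offs dominate?
RQ4: How does increasing the sequence length (e.g., Pathfinder-X) affect the ability to learn the task?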

Key Findings

Model	ListOps	Text	Retrieval	Image	Pathfinder	Path-X	Avg
Transformer	36.37	64.27	57.46	42.44	71.40	FAIL	54.39
Local Attention	15.82	52.98	53.39	41.46	66.63	FAIL	46.06
Sparse Trans.	17.07	63.58	59.59	44.24	71.71	FAIL	51.24
Longformer	35.63	62.85	56.89	42.22	69.71	FAIL	53.46
Linformer	35.70	53.94	52.27	38.56	76.34	FAIL	51.36
Reformer	37.27	56.10	53.40	38.07	68.50	FAIL	50.67
Sinkhorn Trans.	33.67	61.20	53.83	41.23	67.45	FAIL	51.39
Synthesizer	36.99	61.68	54.67	41.61	69.45	FAIL	52.88
BigBird	36.05	64.02	59.29	40.83	74.87	FAIL	55.01
Linear Trans.	16.13	65.90	53.09	42.34	75.30	FAIL	50.55
Performer	18.01	65.40	53.82	42.77	77.05	FAIL	51.41
Task Avg (Std)	29 (9.7)	61 (4.6)	55 (2.6)	41 (1.8)	72 (3.7)	FAIL	52 (2.4)

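All LRA tasks are difficult for current models, and on several tasks a large gap to optimal performance remains.
BigBird achieves the highest overall average (55.01) by balancing performance across tasks, although it does not top any individual task in the table.
Kernel-based variants such as Performer and Linear Transformers offer a strong speed and memory-efficiency trade-off while staying competitive on most tasks, although both score far below the vanilla Transformer on ListOps (18.01 and 16.13 versus 36.37).
At the extreme length of Path-X, every model fails to learn the task, which shows the limits of current architectures on ultra-long sequences.
There is no one-size-fits-all solution; the trade-offs among accuracy, speed, and memory depend on the task and the model.
Memory footprints vary widely. Linformer uses roughly 1 GB per device at 4K, while the vanilla Transformer can require about 9.48 GB per device at 4K, which underlines the efficiency gap.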
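
The "no single best model" finding can be checked directly against the results table. The sketch below copies the five scored task columns (Path-X is omitted because every entry is FAIL) and finds the top model per task and by mean score. It assumes the Avg column is the mean over these five tasks, which matches the reported BigBird value; the helper names are illustrative.

```python
# Sketch: per-task winners versus best average, using scores from the table.
TASKS = ["ListOps", "Text", "Retrieval", "Image", "Pathfinder"]
SCORES = {
    "Transformer":     [36.37, 64.27, 57.46, 42.44, 71.40],
    "Local Attention": [15.82, 52.98, 53.39, 41.46, 66.63],
    "Sparse Trans.":   [17.07, 63.58, 59.59, 44.24, 71.71],
    "Longformer":      [35.63, 62.85, 56.89, 42.22, 69.71],
    "Linformer":       [35.70, 53.94, 52.27, 38.56, 76.34],
    "Reformer":        [37.27, 56.10, 53.40, 38.07, 68.50],
    "Sinkhorn Trans.": [33.67, 61.20, 53.83, 41.23, 67.45],
    "Synthesizer":     [36.99, 61.68, 54.67, 41.61, 69.45],
    "BigBird":         [36.05, 64.02, 59.29, 40.83, 74.87],
    "Linear Trans.":   [16.13, 65.90, 53.09, 42.34, 75.30],
    "Performer":       [18.01, 65.40, 53.82, 42.77, 77.05],
}


def mean_score(values: list[float]) -> float:
    """Mean over the five scored tasks (assumption: Path-X is excluded)."""
    return sum(values) / len(values)


def best_per_task(scores: dict[str, list[float]]) -> dict[str, str]:
    """Map each task to the model with the highest score on it."""
    return {
        task: max(scores, key=lambda model: scores[model][i])
        for i, task in enumerate(TASKS)
    }


winners = best_per_task(SCORES)
best_overall = max(SCORES, key=lambda model: mean_score(SCORES[model]))

for task, model in winners.items():
    print(f"{task:<10} -> {model}")
print(f"Best mean  -> {best_overall} ({mean_score(SCORES[best_overall]):.2f})")
```

Four different models win the five tasks, and the model with the best mean wins none of them, which is the trade-off pattern the findings describe.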


This review was generated by AI and checked by a human editor.