QUICK REVIEW

[Paper Review] Flowformer: Linearizing Transformers with Conservation Flows

Haixu Wu, Jialong Wu|arXiv (Cornell University)|Feb 13, 2022

Neural Networks and Reservoir Computing | Citations: 31

One-Line Summary

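Flowformer linearizes the Transformer attention mechanism by introducing Flow-Attention, which is built on the principle of flow conservation. It achieves linear-time complexity with competitive performance across long sequences, language, vision, time series, and reinforcement learning.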

ABSTRACT

Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has a quadratic complexity, significantly impeding Transformers from dealing with numerous tokens and scaling up to bigger models. Previous methods mainly utilize the similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid degeneration of attention to a trivial distribution by reintroducing inductive biases such as the locality, thereby at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on the flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the property of flow conservation into attention and propose the Flow-Attention mechanism of linear complexity. By respectively conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attentions without using specific inductive biases. Empowered by the Flow-Attention, Flowformer yields strong performance in linear time for wide areas, including long sequence, time series, vision, natural language, and reinforcement learning. The code and settings are available at this repository: https://github.com/thuml/Flowformer.

Motivation and Objectives

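Introduce an attention mechanism that does not depend on specific inductive biases.
Develop Flow-Attention, which realizes source competition and sink allocation under flow conservation.
Demonstrate linear-time attention that preserves performance across diverse domains.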

Proposed Method

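Reformulate attention as information flowing from sources (values) to sinks (results) through learned flow capacities (attentions).
Apply flow conservation to induce competition among sources and allocation to sinks, without relying on locality biases.
Define Flow-Attention through competition, aggregation, and allocation steps, using a non-negative nonlinear projection φ(·) to obtain the flow capacities.
Enforce flow conservation by normalizing φ(K) with the outgoing flow and φ(Q) with the incoming flow (Eq. 5).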
Compute the conserved incoming and outgoing flows (Ĩ and Ô) and derive Flow-Attention (Eq. 8): Competition: V̂ = Softmax(Ô) ⊙ V; Aggregation: A = (φ(Q)/I)(φ(K)ᵀ V̂); Allocation: Sigmoid(Ĩ) ⊙ A.
Replacing the standard attention in Transformers with Flow-Attention yields Flowformer, which runs in linear time.
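The steps above can be sketched for a single head in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: φ is assumed to be an elementwise sigmoid, the learned query/key/value projections are omitted (inputs are used directly as Q, K, V), and any extra scaling or stabilizing constants the official repository may use are left out. All helper names (`flow_attention`, `phi`, `col_sum`, etc.) are hypothetical. The incoming flow of sink i is I_i = φ(Q_i)·Σ_j φ(K_j) and the outgoing flow of source j is O_j = φ(K_j)·Σ_i φ(Q_i); each sum over tokens is formed once, so the cost stays linear in the number of tokens.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def phi(m: Matrix) -> Matrix:
    """Non-negative projection phi(.); assumed here to be an elementwise sigmoid."""
    return [[sigmoid(x) for x in row] for row in m]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def col_sum(m: Matrix) -> List[float]:
    """Sum of the rows of m, i.e. the sum over tokens."""
    return [sum(col) for col in zip(*m)]


def softmax(xs: List[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def flow_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, List[float], List[float]]:
    """Hypothetical single-head Flow-Attention sketch.

    q: n x d (sinks); k: m x d and v: m x d_v (sources).
    Returns the n x d_v output plus the conserved incoming (I~) and outgoing (O^) flows.
    """
    pq, pk = phi(q), phi(k)
    sum_k, sum_q = col_sum(pk), col_sum(pq)
    # Incoming flow of sink i: I_i = phi(Q_i) . sum_j phi(K_j)
    incoming = [dot(row, sum_k) for row in pq]
    # Outgoing flow of source j: O_j = phi(K_j) . sum_i phi(Q_i)
    outgoing = [dot(row, sum_q) for row in pk]
    # Conserved incoming flow I~: normalize phi(K) by the outgoing flow (each source now emits 1)
    k_conserved = col_sum([[x / o for x in row] for row, o in zip(pk, outgoing)])
    incoming_c = [dot(row, k_conserved) for row in pq]
    # Conserved outgoing flow O^: normalize phi(Q) by the incoming flow (each sink now receives 1)
    q_conserved = col_sum([[x / i for x in row] for row, i in zip(pq, incoming)])
    outgoing_c = [dot(row, q_conserved) for row in pk]
    # Competition: V^ = Softmax(O^) over sources, applied row-wise to V
    v_hat = [[w * x for x in row] for w, row in zip(softmax(outgoing_c), v)]
    # Aggregation: A = (phi(Q) / I)(phi(K)^T V^); phi(K)^T V^ is d x d_v and built once
    d, dv = len(pk[0]), len(v[0])
    kv = [[sum(pk[j][a] * v_hat[j][b] for j in range(len(pk))) for b in range(dv)]
          for a in range(d)]
    agg = [[dot(row, [kv[a][b] for a in range(d)]) / i for b in range(dv)]
           for row, i in zip(pq, incoming)]
    # Allocation: Sigmoid(I~) gates each sink's aggregated result
    out = [[sigmoid(ic) * x for x in row] for ic, row in zip(incoming_c, agg)]
    return out, incoming_c, outgoing_c


# Toy example: n = 3 sinks, m = 4 sources, d = d_v = 2
q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.1], [0.0, 0.3]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, i_c, o_c = flow_attention(q, k, v)
print(round(sum(o_c), 6), round(sum(i_c), 6))  # 3.0 4.0, i.e. n and m
```

The printed sums are a quick sanity check that follows from the definitions: because every sink's normalized incoming flow is 1, the conserved outgoing flows Ô sum to n; symmetrically, the conserved incoming flows Ĩ sum to m.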

Experimental Results

Research Questions

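RQ1: Can attention be kept non-trivial and fair while achieving linear complexity, without fixed inductive biases?
RQ2: Does flow-conservation-based Flow-Attention deliver competitive performance on long sequences, language, vision, time series, and reinforcement learning?
RQ3: How do the competition and allocation components affect attention quality and downstream tasks?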

Key Findings

Long-Range Arena (LRA) results (accuracy, higher is better)
Model	ListOps ↑	Text ↑	Retrieval ↑	Image ↑	Pathfinder ↑	Avg ↑
Flowformer	38.70	64.29	62.24	43.20	73.95	56.48
Flowformer w/o Allocation	37.00	63.78	61.33	42.52	73.26	55.58
Flowformer w/o Competition	36.80	63.48	61.66	42.39	71.90	55.25
Transformer (Vaswani et al., 2017)	36.37	64.27	57.46	42.44	71.40	54.39
BigBird (Zaheer et al., 2020)	36.05	64.02	59.29	40.83	74.87	55.01
cosFormer (Zhen et al., 2022)	37.90	63.41	61.36	43.17	70.33	55.23
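The Avg column and the ablation gains can be reproduced from the per-task scores. This small arithmetic check assumes Avg is the unweighted mean of the five LRA tasks (which matches every row of the table) and measures each component's contribution as the drop in Avg when that component is removed.

```python
# LRA scores from the table: ListOps, Text, Retrieval, Image, Pathfinder
scores = {
    "Flowformer": [38.70, 64.29, 62.24, 43.20, 73.95],
    "w/o Allocation": [37.00, 63.78, 61.33, 42.52, 73.26],
    "w/o Competition": [36.80, 63.48, 61.66, 42.39, 71.90],
}


def lra_average(values):
    """Unweighted mean over the five LRA tasks, rounded to two decimals as in the table."""
    return round(sum(values) / len(values), 2)


avg = {name: lra_average(vals) for name, vals in scores.items()}
# Contribution of a component = Avg of the full model minus Avg without that component
gain_competition = round(avg["Flowformer"] - avg["w/o Competition"], 2)
gain_allocation = round(avg["Flowformer"] - avg["w/o Allocation"], 2)
print(avg)
print(gain_competition, gain_allocation)  # 1.23 0.9
```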

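Flowformer matches or outperforms strong baselines on long-sequence, language, vision, time-series, and offline RL benchmarks.
On Long-Range Arena, Flowformer reaches an average accuracy of 56.48, outperforming the original Transformer and many efficient attention models.
Ablations show that both components contribute: on LRA, adding competition raises the average by ≈1.23 and adding allocation by ≈0.90.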
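On language modeling (WikiText-103), Flowformer reaches a perplexity of 30.8, lower (better) than the baselines and the ablated variants (w/o Competition 31.2, w/o Allocation 32.2).
On ImageNet-1K, Flowformer matches or exceeds linear-attention baselines and approaches, and in some cases surpasses, certain full-attention models in Top-1/Top-5 accuracy.
Overall, Flowformer pairs linear complexity with competitive accuracy, and its efficiency advantage grows with sequence length.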
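Why the efficiency advantage grows with sequence length can be seen from a back-of-the-envelope count. This sketch assumes the dominant cost of softmax attention is two n×n×d products (QKᵀ and the weighted sum over V) and that of a linear attention such as Flow-Attention is two n×d×d products (φ(K)ᵀV̂, then φ(Q) times that result). Constant factors, multiple heads, and the extra O(nd) flow computations are ignored, and the head size of 64 is an illustrative assumption; these are estimates, not measurements from the paper.

```python
def softmax_attention_macs(n: int, d: int) -> int:
    """Approximate multiply-accumulates for quadratic attention: Q K^T and (attn) V."""
    return 2 * n * n * d


def linear_attention_macs(n: int, d: int) -> int:
    """Approximate multiply-accumulates for linear attention: phi(K)^T V and phi(Q) (...)."""
    return 2 * n * d * d


d = 64  # assumed per-head dimension, for illustration only
for n in (1024, 4096, 16384):
    ratio = softmax_attention_macs(n, d) / linear_attention_macs(n, d)
    print(n, ratio)  # the ratio equals n / d
```

Under these assumptions the cost ratio is n/d, so it grows linearly with sequence length: 16× at n = 1024 and 256× at n = 16384 for d = 64.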


This review was generated by AI and checked by a human editor.