QUICK REVIEW

[Paper Review] Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Meng-Hao Guo, Zheng-Ning Liu|ArXiv.org|May 5, 2021

Advanced Neural Network Applications | References: 98 | Citations: 79

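One-line summary

Proposes an external attention mechanism built on two external memories, implemented as small learnable linear layers with linear complexity. It achieves competitive results across vision tasks and includes an all-MLP variant, EAMLP.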

ABSTRACT

Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis reveal that our method provides results comparable or superior to the self-attention mechanism and some of its variants, with much lower computational and memory costs.

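Motivation and objectives

Address two limitations of self-attention in visual tasks: its quadratic complexity and its neglect of correlations between different samples.
Introduce external attention, which uses small shared memory units to capture dataset-level correlations.
Show that external attention can replace self-attention in popular architectures while reducing computational and memory costs.
Demonstrate the generality of external attention across image classification, detection, segmentation, generation, and 3D point cloud tasks.
Propose multi-head external attention and use it to build an all-MLP architecture (EAMLP) with competitive performance.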

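Proposed method

Define external attention with two external memory units, M_k and M_v, serving as the key memory and the value memory.
Compute attention as A = Norm(F M_k^T) and F_out = A M_v, where M_k and M_v are each implemented as a linear layer.
Use double normalization to stabilize the attention scores across rows and columns.
Extend to multi-head external attention for richer representations.
Integrate external attention into existing architectures and build an all-MLP model (EAMLP).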
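
The computation listed above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the authors' implementation: the function name `external_attention`, the `eps` guard, and the random inputs are assumptions made here, and the normalization order (softmax over the N positions, then an L1 normalization over the S memory slots) is one reading of "double normalization" that the summary does not spell out.

```python
import numpy as np


def external_attention(F, M_k, M_v, eps=1e-9):
    """Single-head external attention sketch.

    F:   (N, d) input features, one row per position.
    M_k: (S, d) key memory (in practice the weight of a linear layer).
    M_v: (S, d) value memory (in practice the weight of a linear layer).
    Returns (F_out, A) with shapes (N, d) and (N, S).
    """
    logits = F @ M_k.T  # (N, S) affinities between positions and memory slots

    # Double normalization, step 1 (assumed): softmax over the N positions.
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A = A / A.sum(axis=0, keepdims=True)

    # Double normalization, step 2 (assumed): L1-normalize over the S slots.
    A = A / (A.sum(axis=1, keepdims=True) + eps)

    F_out = A @ M_v  # (N, d)
    return F_out, A


# Hypothetical sizes: 196 positions, 32 channels, 64 memory slots.
rng = np.random.default_rng(0)
F = 0.1 * rng.standard_normal((196, 32))
M_k = rng.standard_normal((64, 32))
M_v = rng.standard_normal((64, 32))
out, A = external_attention(F, M_k, M_v)
```

Because M_k and M_v do not depend on the input, they are shared across all samples, which is how the memories can pick up dataset-level correlations during training.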

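Experimental results

Research questions

RQ1: Can external attention replace self-attention in vision architectures at linear computational cost?
RQ2: Does incorporating dataset-level external memory improve generalization and performance across vision tasks?
RQ3: How does multi-head external attention (MEA) compare with self-attention and other attention variants in accuracy and efficiency?
RQ4: Can external attention yield an all-MLP vision model that performs on par with CNNs and Transformers on ImageNet?
RQ5: How does the normalization strategy affect the stability and performance of external attention?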

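Key findings

External attention matches or exceeds self-attention across tasks while keeping computation and memory usage low.
With a small shared memory (e.g., S ≈ 64), complexity is linear in the input size: O(dSN), where N is the number of positions and d is the feature dimension.
Multi-head external attention enables an all-MLP architecture (EAMLP) with competitive ImageNet accuracy, reaching up to 79.4% Top-1 in the reported settings.
Replacing self-attention with external attention improves segmentation and detection metrics on several benchmarks (e.g., VOC, COCO), and it is effective when integrated into backbone networks.
External attention yields interpretable attention maps that focus on meaningful objects and regions, with different heads attending to different regions.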
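
To make the O(dSN) claim concrete, the rough count below compares multiply-adds for the two matrix products in each mechanism. It is a back-of-the-envelope sketch under stated assumptions: the helper names and the example sizes (a 64 x 64 feature map with 512 channels) are hypothetical, and normalization and any input or output projections are ignored.

```python
def self_attention_cost(n, d):
    """Multiply-adds for the N x N affinity matrix plus the weighted sum."""
    return 2 * n * n * d


def external_attention_cost(n, d, s=64):
    """Multiply-adds for F @ M_k.T plus A @ M_v with S memory slots."""
    return 2 * n * s * d


n, d, s = 64 * 64, 512, 64  # hypothetical feature-map size and channel count
ratio = self_attention_cost(n, d) / external_attention_cost(n, d, s)
# Under this count the ratio simplifies to N / S, here 4096 / 64 = 64.
```

Doubling N doubles the external attention cost but quadruples the self-attention cost, which is why the gap widens on high-resolution inputs.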

This review was generated by AI and checked by a human editor.