QUICK REVIEW

[Paper Review] Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions

Yichao Liu, Zongru Shao|arXiv (Cornell University)|Dec 10, 2021

Advanced Neural Network Applications|Cited by 412

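One-Sentence Summary

This paper proposes the Global Attention Mechanism (GAM), which uses 3D permutation with an MLP and a convolutional spatial submodule to retain information across the channel and spatial dimensions, and reports consistent gains over prior attention modules on CIFAR-100 and ImageNet-1K with ResNet and MobileNet backbones.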

ABSTRACT

A variety of attention mechanisms have been studied to improve the performance of various computer vision tasks. However, the prior methods overlooked the significance of retaining the information on both channel and spatial aspects to enhance the cross-dimension interactions. Therefore, we propose a global attention mechanism that boosts the performance of deep neural networks by reducing information reduction and magnifying the global interactive representations. We introduce 3D-permutation with multilayer-perceptron for channel attention alongside a convolutional spatial attention submodule. The evaluation of the proposed mechanism for the image classification task on CIFAR-100 and ImageNet-1K indicates that our method stably outperforms several recent attention mechanisms with both ResNet and lightweight MobileNet.

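Motivation and Objectives

Motivate the need to retain information across the channel and spatial dimensions in order to strengthen cross-dimension interactions in attention.
Propose GAM, which amplifies global channel-spatial dependencies while reducing information loss.
Evaluate GAM on standard benchmarks (CIFAR-100 and ImageNet-1K) across multiple architectures (ResNet and MobileNet) against existing attention modules.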

Proposed Method

Channel attention submodule uses 3D permutation and a two-layer MLP to capture cross-dimension information.
Spatial attention submodule uses two convolutional layers with no pooling to preserve information, with optional group convolution and channel shuffle to reduce parameters.
GAM applies channel attention first, followed by spatial attention, with element-wise multiplication to form final feature maps.
Comparisons are made against SE, BAM, CBAM, TAM, and ABN under the same training conditions.
Ablation studies examine the contributions of channel vs. spatial attention and the effect of max-pooling in CBAM/GAM.
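
The steps listed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the reduction ratio `r=4`, the kernel size `k=7`, the random initialisation, and all function names (`channel_attention`, `spatial_attention`, `gam`, `init_params`) are assumptions made for this example, and batch normalisation, group convolution, and channel shuffle are left out for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, b1, w2, b2):
    """x: (C, H, W). Returns a (C, H, W) map of gates in (0, 1)."""
    # 3D permutation: move channels last so the two-layer MLP mixes channels
    # at every spatial position, without pooling H or W away.
    t = np.transpose(x, (1, 2, 0))                 # (H, W, C)
    hidden = np.maximum(t @ w1 + b1, 0.0)          # (H, W, C // r), ReLU
    out = hidden @ w2 + b2                         # (H, W, C)
    return sigmoid(np.transpose(out, (2, 0, 1)))   # back to (C, H, W)


def conv2d_same(x, kernel, bias):
    """Naive 'same' convolution. x: (C_in, H, W); kernel: (C_out, C_in, k, k)."""
    c_out, _, k, _ = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.empty((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + k, j:j + k]        # (C_in, k, k)
            out[:, i, j] = np.tensordot(kernel, patch, axes=3) + bias
    return out


def spatial_attention(x, k1, b1, k2, b2):
    """Two convolutions and no pooling: C -> C // r -> C, then a sigmoid gate."""
    hidden = np.maximum(conv2d_same(x, k1, b1), 0.0)   # (C // r, H, W)
    return sigmoid(conv2d_same(hidden, k2, b2))        # (C, H, W)


def gam(x, params):
    """Channel attention first, then spatial attention, each applied by
    element-wise multiplication."""
    x = x * channel_attention(x, *params["channel"])
    return x * spatial_attention(x, *params["spatial"])


def init_params(c, r=4, k=7, seed=0):
    """Random weights for illustration only (assumed shapes and scales)."""
    rng = np.random.default_rng(seed)
    h = c // r
    channel = (rng.normal(0, 0.1, (c, h)), np.zeros(h),
               rng.normal(0, 0.1, (h, c)), np.zeros(c))
    spatial = (rng.normal(0, 0.1, (h, c, k, k)), np.zeros(h),
               rng.normal(0, 0.1, (c, h, k, k)), np.zeros(c))
    return {"channel": channel, "spatial": spatial}


if __name__ == "__main__":
    features = np.random.default_rng(1).normal(size=(8, 5, 6))
    refined = gam(features, init_params(8))
    print(refined.shape)  # same shape as the input feature map
```

Because both attention maps keep the full (C, H, W) shape, the output has the same shape as the input and the module can in principle be dropped between existing layers of a backbone.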

Experimental Results

Research Questions

RQ1: Does GAM provide consistent performance gains over existing attention modules across datasets and architectures?
RQ2: How do channel and spatial attention components contribute to GAM’s performance?
RQ3: What is the impact of design choices (e.g., pooling, group convolution) on GAM’s efficiency and accuracy?
RQ4: Can GAM scale effectively to large datasets and different model depths (ResNet18/50, MobileNetV2)?

Key Findings

Dataset	Architecture	Parameters	FLOPs	Top-1 error (%)	Top-5 error (%)
CIFAR-100	ResNet 50	23.71M	1.3G	22.74	6.37
CIFAR-100	ResNet 50 + SE	26.22M	1.31G	20.29	5.18
CIFAR-100	ResNet 50 + BAM	24.06M	1.33G	19.97	5.03
CIFAR-100	ResNet 50 + CBAM	26.24M	1.31G	19.44	4.66
CIFAR-100	ResNet 50 + GAM	149.47M	8.02G	18.67	4.54
CIFAR-100	ResNet 50 + GAM (gc)	57.05M	3.08G	18.99	4.87
ImageNet-1K	ResNet 18	—	—	30.91	11.12
ImageNet-1K	ResNet 18 + SE	—	—	30.07	10.59
ImageNet-1K	ResNet 18 + BAM	—	—	30.18	10.77
ImageNet-1K	ResNet 18 + CBAM	—	—	29.89	10.53
ImageNet-1K	ResNet 18 + TAM	—	—	30.00	10.64
ImageNet-1K	ResNet 18 + ABN	—	—	29.40	10.34
ImageNet-1K	ResNet 18 + GAM	—	—	29.34	10.23
ImageNet-1K	ResNet 50 + ABN	—	—	23.43	6.92
ImageNet-1K	ResNet 50 + GAM	—	—	22.78	6.43
ImageNet-1K	ResNet 50 + GAM (gc)	—	—	23.01	6.52
ImageNet-1K	MobileNet V2	—	—	30.52	11.20
ImageNet-1K	MobileNet V2 + SE	—	—	29.77	10.65
ImageNet-1K	MobileNet V2 + BAM	—	—	29.91	10.80
ImageNet-1K	MobileNet V2 + CBAM	—	—	29.74	10.66
ImageNet-1K	MobileNet V2 + GAM	—	—	29.31	10.43
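
To make the accuracy-versus-cost trade-off in the table easier to read, the short script below recomputes, for the ResNet-50 rows that report parameter counts, the Top-1 error reduction relative to the plain backbone and the parameter count as a multiple of the baseline. The numbers are copied from the table; the helper name `compare_to_baseline` is hypothetical and only serves this example.

```python
# Rows copied from the table: (name, parameters in millions, Top-1 error %).
ROWS = [
    ("ResNet 50", 23.71, 22.74),
    ("ResNet 50 + SE", 26.22, 20.29),
    ("ResNet 50 + BAM", 24.06, 19.97),
    ("ResNet 50 + CBAM", 26.24, 19.44),
    ("ResNet 50 + GAM", 149.47, 18.67),
    ("ResNet 50 + GAM (gc)", 57.05, 18.99),
]


def compare_to_baseline(rows):
    """Return Top-1 gain (percentage points) and parameter ratio vs. rows[0]."""
    _, base_params, base_top1 = rows[0]
    summary = {}
    for name, params, top1 in rows[1:]:
        summary[name] = {
            "top1_gain": round(base_top1 - top1, 2),
            "param_ratio": round(params / base_params, 2),
        }
    return summary


if __name__ == "__main__":
    for name, stats in compare_to_baseline(ROWS).items():
        print(f"{name}: -{stats['top1_gain']} pts Top-1, "
              f"{stats['param_ratio']}x parameters")
```

Read this way, full GAM lowers Top-1 error by 4.07 points at roughly 6.3 times the baseline parameter count, while the group-convolution variant recovers 3.75 points at roughly 2.41 times.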

GAM outperforms SE, BAM, and CBAM on CIFAR-100 with ResNet-50, including variants with group convolution.
On ImageNet-1K, GAM consistently improves Top-1 and Top-5 errors across ResNet-18, ResNet-50, and MobileNetV2 compared to baseline and other attention modules.
GAM achieves better accuracy with fewer parameters than some competitors (e.g., ResNet-18 with GAM vs ABN).
Ablations show that both spatial and channel attention contribute to performance gains, and their combination yields the best results.
Without max-pooling, both CBAM and GAM still perform strongly, and GAM often keeps its advantage.
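
The parameter saving from group convolution mentioned in the findings follows from a standard counting rule: a k×k convolution with g groups connects each output channel to only C_in / g input channels. The sketch below applies that rule to a two-convolution spatial submodule. The channel width of 256, the reduction ratio `r=4`, the kernel size `k=7`, and the group count of 4 are assumptions chosen for illustration, and biases and normalisation layers are ignored.

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias excluded)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def spatial_submodule_weights(c, r=4, k=7, groups=1):
    """Two stacked convolutions: C -> C // r -> C."""
    hidden = c // r
    return (conv_weights(c, hidden, k, groups)
            + conv_weights(hidden, c, k, groups))


if __name__ == "__main__":
    dense = spatial_submodule_weights(256)
    grouped = spatial_submodule_weights(256, groups=4)
    print(dense, grouped, dense / grouped)  # 1605632 401408 4.0
```

Under these assumptions a single 256-channel spatial submodule holds about 1.6M weights, and using 4 groups cuts that by a factor of 4, which is consistent in direction with the smaller parameter count reported for the GAM (gc) variant.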

This review was generated by AI and checked by a human editor.