QUICK REVIEW

[Paper Review] Dual Attention Network for Scene Segmentation

Jun Fu, Jing Liu|arXiv (Cornell University)|Sep 9, 2018

Advanced Neural Network Applications | References: 38 | Citations: 211

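One-line summary

DANet adds spatial (position) attention and channel attention modules on top of a dilated convolutional network to capture global dependencies, and reports state-of-the-art results on the Cityscapes, PASCAL Context, COCO Stuff, and PASCAL VOC 2012 datasets.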

ABSTRACT

In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale features fusion, we propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of traditional dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attention module selectively aggregates the features at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset. In particular, a Mean IoU score of 81.5% on Cityscapes test set is achieved without using coarse data. We make the code and trained model publicly available at https://github.com/junfu1115/DANet

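Motivation and objectives

Motivates improving scene segmentation by modeling long-range contextual dependencies, going beyond multi-scale feature fusion.
Proposes two complementary self-attention modules (position and channel) that capture spatial relations and inter-class relations, respectively.
Shows that fusing the outputs of the two attention modules strengthens the feature representation used for pixel-level prediction.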

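Proposed method

Two parallel self-attention modules are added on top of a dilated convolutional network backbone.
Position Attention Module: given the backbone feature A (reshaped to C×N, where N = H×W), computes a spatial attention map S (N×N) and produces E = alpha * D * S^T + A, that is, E_j = alpha * sum_i(s_ji * D_i) + A_j, where D is a convolutional projection of A. The scale alpha is initialized to 0 and learned during training.
Channel Attention Module: computes a channel attention map X (C×C) directly from A and produces E = beta * X * A + A, that is, E_j = beta * sum_i(x_ji * A_i) + A_j. The scale beta is initialized to 0 and learned during training.
The two attention-enhanced features are each passed through a convolutional embedding and fused by element-wise sum; a final convolution then generates the prediction map.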
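The two update rules above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the projection matrices `W_b`, `W_c`, and `W_d` are hypothetical random stand-ins for the convolution layers that produce B, C, and D, the tensor sizes are arbitrary, and the convolutional embeddings applied before the final sum are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(A, W_b, W_c, W_d, alpha):
    """Position attention on a feature map A of shape (C, H, W).

    W_b, W_c (C', C) and W_d (C, C) are hypothetical stand-ins for the
    convolution layers that produce B, C and D from A.
    """
    C, H, W = A.shape
    A_flat = A.reshape(C, H * W)         # (C, N)
    B = W_b @ A_flat                     # (C', N)
    C_feat = W_c @ A_flat                # (C', N)
    D = W_d @ A_flat                     # (C, N)
    # S[j, i]: softmax over i of <B_i, C_j>, the impact of position i on j.
    S = softmax(C_feat.T @ B, axis=-1)   # (N, N), each row sums to 1
    E = alpha * (D @ S.T) + A_flat       # E_j = alpha * sum_i s_ji D_i + A_j
    return E.reshape(C, H, W), S

def channel_attention(A, beta):
    """Channel attention computed directly from A of shape (C, H, W)."""
    C, H, W = A.shape
    A_flat = A.reshape(C, H * W)         # (C, N)
    # X[j, i]: softmax over i of <A_i, A_j>, the impact of channel i on j.
    X = softmax(A_flat @ A_flat.T, axis=-1)  # (C, C), each row sums to 1
    E = beta * (X @ A_flat) + A_flat     # E_j = beta * sum_i x_ji A_i + A_j
    return E.reshape(C, H, W), X

# Toy input: 8 channels on a 4x5 grid (sizes chosen only for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4, 5))
W_b = rng.standard_normal((2, 8)) * 0.1
W_c = rng.standard_normal((2, 8)) * 0.1
W_d = rng.standard_normal((8, 8)) * 0.1

# With alpha = beta = 0 (their initial value) both modules return A unchanged.
E_pos_init, S = position_attention(A, W_b, W_c, W_d, alpha=0.0)
E_ch_init, X = channel_attention(A, beta=0.0)

# Nonzero scales mix in the attention-weighted features.
E_pos, _ = position_attention(A, W_b, W_c, W_d, alpha=0.5)
E_ch, _ = channel_attention(A, beta=0.5)
fused = E_pos + E_ch                     # simplified fusion: plain element-wise sum
```

Because alpha and beta start at 0, each module initially behaves as an identity mapping on A, so the network can begin from the plain dilated FCN features and learn how much attention to mix in.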

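Experimental results

Research questions

RQ1: Can self-attention modeling of spatial relations (position attention) capture long-range dependencies and improve pixel-wise segmentation?
RQ2: Can modeling dependencies between feature channels (channel attention) improve the discriminability of semantic classes?
RQ3: Does combining position attention and channel attention improve performance over using either module alone?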

Key findings

Method	BaseNet	PAM	CAM	Mean IoU%
Dilated FCN	Res50			70.03
DANet	Res50	✓		75.74
DANet	Res50		✓	74.28
DANet	Res50	✓	✓	76.34
DANet	Res101	✓	✓	77.57
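For readers unfamiliar with the metric in the last column, Mean IoU averages the per-class intersection-over-union between predicted and ground-truth label maps. The sketch below uses the standard confusion-matrix formulation; the function name `mean_iou` and the toy labels are illustrative, and details of the benchmark's own evaluation (such as handling of ignored labels) are not modeled.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that appear in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[t, p] counts pixels with true class t predicted as class p.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0                    # skip classes absent from both maps
    return float((intersection[valid] / union[valid]).mean())

# Toy example with 3 classes and 6 pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"Mean IoU: {score:.2%}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.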

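Position attention alone raises Mean IoU from 70.03% (ResNet-50 Dilated FCN baseline) to 75.74%, a gain of 5.71 points.
Channel attention alone raises Mean IoU to 74.28%, a gain of 4.25 points over the baseline.
Combining PAM and CAM yields a Mean IoU of 76.34% with ResNet-50, 6.31 points above the baseline.
With ResNet-101 as the backbone and both modules enabled, DANet reaches 77.57% Mean IoU on Cityscapes val.
With data augmentation, multi-grid, and multi-scale inference applied, DANet-101 reaches 81.50% Mean IoU on Cityscapes val, outperforming methods such as DeepLabv3 in the paper's comparison table.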
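The gains quoted in the findings follow directly from the ablation table. A short check, using only the numbers reported there (the variable names are illustrative):

```python
# Mean IoU (%) on Cityscapes val, copied from the ablation table above.
baseline = 70.03  # Dilated FCN, ResNet-50
res50 = {"PAM only": 75.74, "CAM only": 74.28, "PAM + CAM": 76.34}
res101_both = 77.57  # DANet, ResNet-101, PAM + CAM

# Absolute gain of each ResNet-50 variant over the baseline, in points.
gains = {name: round(miou - baseline, 2) for name, miou in res50.items()}
# Additional gain from switching the backbone to ResNet-101.
backbone_gain = round(res101_both - res50["PAM + CAM"], 2)

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} points over baseline")
print(f"ResNet-101 vs ResNet-50 (PAM + CAM): +{backbone_gain:.2f} points")
```

The combined gain (6.31 points) is smaller than the sum of the two individual gains (5.71 + 4.25), which suggests the modules overlap in what they capture while still being complementary.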


This review was generated by AI and checked by a human editor.