QUICK REVIEW

[Paper Review] GFF: Gated Fully Fusion for Semantic Segmentation

Xiangtai Li, Houlong Zhao|arXiv (Cornell University)|Apr 3, 2019

Advanced Neural Network Applications | References: 56 | Citations: 35

One-line summary

Introduces Gated Fully Fusion (GFF), which uses gates to selectively fuse multi-level features. Enhanced by a Dense Feature Pyramid, it achieves state-of-the-art semantic segmentation on Cityscapes, Pascal Context, COCO-stuff, and ADE20K.

ABSTRACT

Semantic segmentation generates comprehensive understanding of scenes through densely predicting the category for each pixel. High-level features from Deep Convolutional Neural Networks already demonstrate their effectiveness in semantic segmentation tasks, however the coarse resolution of high-level features often leads to inferior results for small/thin objects where detailed information is important. It is natural to consider importing low level features to compensate for the lost detailed information in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information which significantly reduces the noises during fusion. We achieve the state of the art results on four challenging scene parsing datasets including Cityscapes, Pascal Context, COCO-stuff and ADE20K.

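Motivation and objectives

Improve semantic segmentation by exploiting both high-level semantic information and high-resolution detail.
Develop a fusion mechanism that selectively propagates information among multi-level features.
Incorporate context modeling to strengthen the semantic representation of multi-level features.
Demonstrate state-of-the-art performance on several standard scene parsing benchmarks.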

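Proposed method

Proposes Gated Fully Fusion (GFF), which fuses multi-level features through per-pixel gates that control information propagation.
Defines the gate maps G_l as sigmoid(w_l * X_l), which regulate both sender and receiver information during fusion.
Defines fusion at each level l as a gated addition: X̃_l = (1+G_l)·X_l + (1−G_l)·∑_{i≠l} G_i·X_i.
Introduces a Dense Feature Pyramid (DFP), which densely connects PSPNet-style context to every feature map to enrich contextual cues.
Trains end to end with the main segmentation loss plus auxiliary losses on intermediate ResNet stages to stabilize optimization.
Optionally extends backbone fusion with a top-down gated path and evaluates accuracy against computational cost.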
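
The gated addition above can be sketched in a few lines of NumPy. This is only an illustration of the formula X̃_l = (1+G_l)·X_l + (1−G_l)·∑_{i≠l} G_i·X_i, not the authors' implementation. It assumes that all feature maps have already been resized to a common shape (C, H, W), and it models each w_l as a 1x1 convolution that maps C channels to a single-channel gate. The names gated_fully_fusion, features, and gate_weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fully_fusion(features, gate_weights):
    """Fuse L feature maps with per-pixel gates.

    features:     list of L arrays, each of shape (C, H, W).
                  Assumption: already resized to a common resolution.
    gate_weights: list of L arrays, each of shape (C,).
                  Assumption: w_l acts as a 1x1 conv producing one gate channel.
    Returns a list of L fused arrays, each of shape (C, H, W).
    """
    # G_l = sigmoid(w_l * X_l): contract the channel axis, one gate value per pixel.
    gates = [
        sigmoid(np.tensordot(w, x, axes=([0], [0])))[None, :, :]
        for w, x in zip(gate_weights, features)
    ]
    # Sender side: each level offers G_i * X_i to the other levels.
    gated = [g * x for g, x in zip(gates, features)]
    total = sum(gated)

    fused = []
    for x, g, gx in zip(features, gates, gated):
        others = total - gx  # sum over i != l of G_i * X_i
        # Receiver side: (1 - G_l) limits how much is accepted from other levels.
        fused.append((1.0 + g) * x + (1.0 - g) * others)
    return fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.normal(size=(8, 4, 4)) for _ in range(4)]
    weights = [rng.normal(size=8) for _ in range(4)]
    out = gated_fully_fusion(feats, weights)
    print([o.shape for o in out])
```

Where a level's own gate is close to 1, the pixel keeps (and amplifies) its own feature and accepts little from the other levels. Where the gate is close to 0, the pixel relies mostly on what the other levels choose to send.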

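Experimental results

Research questions

RQ1: Does gated, fully connected multi-level feature fusion improve segmentation accuracy over conventional local or top-down fusion methods?
RQ2: Can per-pixel gates effectively suppress noisy information during fusion and preserve the details of small, thin objects?
RQ3: How does adding the Dense Feature Pyramid (DFP) affect context modeling and final segmentation performance?
RQ4: Are the improvements from GFF and DFP consistent across datasets and backbones?
RQ5: What is the trade-off between accuracy gains and computational cost when GFF and DFP are integrated into existing architectures?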

Key findings

Method	Cityscapes mIoU (%)
PSPNet (Baseline)	78.6
PSPNet + Concat	78.8
PSPNet + Addition	78.7
PSPNet + FPN	79.3
PSPNet + Gated FPN	79.4
PSPNet + GFF	80.4

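On the Cityscapes validation set, GFF consistently outperforms the PSPNet baseline and the other fusion variants, with PSPNet + GFF reaching 80.4% mIoU.
Adding DFP raises performance to 81.2% mIoU for GFF + DFP, and to 81.8% with multi-scale (MS) inference.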
With a ResNet101 backbone, the method reaches 82.3% mIoU on the Cityscapes test set, and 83.3% mIoU with WiderResNet when trained only on the finely annotated data.
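GFF handles small, thin objects and boundaries better than conventional fusion methods such as concatenation, addition, and FPN.
Across Pascal Context, COCO-stuff, and ADE20K, GFFNet variants achieve top or near-top results, indicating that the method generalizes across datasets and backbones.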
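
All of the figures above are reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition, computed from a confusion matrix. It is not the evaluation script used by the paper or by any benchmark. The function name mean_iou is hypothetical, and the sketch omits ignore-label handling, which real benchmark evaluation typically includes.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: integer arrays of the same shape with values in [0, num_classes).
    Assumption: no ignore label is present.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    index = target * num_classes + pred
    confusion = np.bincount(index, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both pred and target
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    prediction = np.array([[0, 0], [1, 1]])
    ground_truth = np.array([[0, 1], [1, 1]])
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

Because each class contributes equally to the average regardless of its pixel count, errors on small or thin classes affect mIoU as much as errors on large ones, which is why the metric is sensitive to the kind of detail GFF targets.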

This review was generated by AI and checked by a human editor.