QUICK REVIEW

[Paper Review] Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks

Xiang Li, Xiaolin Hu|arXiv (Cornell University)|May 23, 2019

Advanced Neural Network Applications | References: 44 | Citations: 178

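One-line summary

SGE introduces a lightweight spatial group-wise attention mechanism that enhances the semantic sub-features within each channel group. It generates an attention factor for every spatial location from the similarity between the group's global and local descriptors, improving accuracy with almost no extra overhead. It improves image classification with backbones such as ResNet and object detection on COCO.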

ABSTRACT

The Convolutional Neural Networks (CNNs) generate the feature representation of complex objects by collecting hierarchical and different parts of semantic sub-features. These sub-features can usually be distributed in grouped form in the feature vector of each layer, representing various semantic entities. However, the activation of these sub-features is often spatially affected by similar patterns and noisy backgrounds, resulting in erroneous localization and identification. We propose a Spatial Group-wise Enhance (SGE) module that can adjust the importance of each sub-feature by generating an attention factor for each spatial location in each semantic group, so that every individual group can autonomously enhance its learnt expression and suppress possible noise. The attention factors are only guided by the similarities between the global and local feature descriptors inside each group, thus the design of SGE module is extremely lightweight with almost no extra parameters and calculations. Despite being trained with only category supervisions, the SGE component is extremely effective in highlighting multiple active areas with various high-order semantics (such as the dog's eyes, nose, etc.). When integrated with popular CNN backbones, SGE can significantly boost the performance of image recognition tasks. Specifically, based on ResNet50 backbones, SGE achieves 1.2% Top-1 accuracy improvement on the ImageNet benchmark and 1.0~2.0% AP gain on the COCO benchmark across a wide range of detectors (Faster/Mask/Cascade RCNN and RetinaNet). Codes and pretrained models are available at https://github.com/implus/PytorchInsight.

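Motivation and objectives

Motivate better learning of semantic sub-features within grouped CNN channels.
Propose a lightweight module that enhances the spatial distribution of each group's features without adding significant parameters.
Show that SGE improves the localization of semantic regions and reduces noise in feature maps.
Demonstrate performance gains on image classification and object detection benchmarks.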

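Proposed method

Split the feature map along the channel dimension into G groups and treat each group as a semantic group.
Compute a global group descriptor g by averaging the group's features over all spatial positions.
For each position i, compute a coefficient c_i as the dot product of g and the local feature x_i. Normalize c_i over the spatial positions of the group, then scale and shift it with the learnable parameters gamma and beta to obtain a_i.
Multiply x_i by the sigmoid gate sigma(a_i) to obtain the enhanced feature hat{x}_i within each group.
Integrate SGE into the bottleneck blocks after BatchNorm, adding almost no parameters (only a gamma and a beta per group).
Provide visualizations and ablation studies that examine the effects of normalization, the number of groups, and initialization.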
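The steps above can be sketched in plain Python for a small feature map. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sge_group` and `sge`, the list-of-lists layout (one channel vector per spatial position), the use of the population standard deviation, and the `eps` value are all choices made here for readability.

```python
import math


def sge_group(x, gamma, beta, eps=1e-5):
    """Enhance one semantic group.

    x: list of m spatial positions, each a list of the group's channel values.
    gamma, beta: the group's learnable scale and shift (plain floats here).
    eps: small constant for numerical stability (assumed value).
    """
    m = len(x)
    d = len(x[0])
    # Global descriptor g: average of the local features over all positions.
    g = [sum(x_i[k] for x_i in x) / m for k in range(d)]
    # Coefficient c_i: dot product between g and the local feature x_i.
    c = [sum(g_k * x_k for g_k, x_k in zip(g, x_i)) for x_i in x]
    # Normalize the coefficients over the spatial positions of the group.
    mu = sum(c) / m
    std = math.sqrt(sum((c_i - mu) ** 2 for c_i in c) / m)
    c_hat = [(c_i - mu) / (std + eps) for c_i in c]
    # Scale and shift, then gate each position with a sigmoid.
    a = [gamma * ch + beta for ch in c_hat]
    gate = [1.0 / (1.0 + math.exp(-a_i)) for a_i in a]
    return [[x_k * s for x_k in x_i] for x_i, s in zip(x, gate)]


def sge(x, groups, gammas, betas):
    """Split each position's channels into `groups` groups and enhance each one."""
    d = len(x[0]) // groups
    outs = [
        sge_group([x_i[j * d:(j + 1) * d] for x_i in x], gammas[j], betas[j])
        for j in range(groups)
    ]
    # Concatenate the enhanced groups back into one channel vector per position.
    return [sum((outs[j][i] for j in range(groups)), []) for i in range(len(x))]


# Four positions, two channels, one group: positions 1 and 3 match g more closely.
x = [[1.0, 0.0], [3.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
out = sge_group(x, gamma=1.0, beta=0.0)
print(out[0][0] / x[0][0] < out[1][0] / x[1][0])  # weaker gate at position 0
```

Positions whose local feature aligns with the group's global descriptor receive a gate closer to 1, while weakly aligned positions are suppressed. The values `gamma=1.0` and `beta=0.0` are illustrative only.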

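Experimental results

Research questions

RQ1 Does the Spatial Group-wise Enhance module reliably improve semantic feature learning within each group?
RQ2 Can SGE improve image classification and object detection performance with minimal parameter overhead?
RQ3 How do design choices (number of groups, normalization, initialization) affect the effectiveness of SGE?
RQ4 How does SGE compare with existing attention mechanisms in performance and efficiency?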

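Key findings

SGE improves ImageNet Top-1 accuracy by 1.2% with ResNet50 and improves AP by 1.0–2.0% across COCO detectors (Faster/Mask/Cascade RCNN and RetinaNet).
SGE consistently improves small-object detection; on RetinaNet it gains about 1% AP over SE for small objects.
SGE delivers competitive or better results than state-of-the-art attention mechanisms while using fewer parameters and less computation.
Normalization is essential for stable training and performance; removing it lowers accuracy substantially.
Varying the number of groups G reveals a sweet spot for best performance; G = 32 or 64 generally works well.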
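To make the parameter overhead concrete for these group counts, the arithmetic can be sketched as follows. The sketch assumes one SGE module per bottleneck block, one gamma and one beta per group, and the standard ResNet-50 layout of 3 + 4 + 6 + 3 bottleneck blocks; the function name `sge_extra_params` is hypothetical.

```python
def sge_extra_params(num_modules, groups):
    """Extra learnable parameters: one gamma and one beta per group per module."""
    return num_modules * groups * 2


# Assumption: one SGE module in each of ResNet-50's bottleneck blocks.
resnet50_blocks = 3 + 4 + 6 + 3

for groups in (32, 64):
    print(groups, sge_extra_params(resnet50_blocks, groups))
```

Under these assumptions the overhead is on the order of a few thousand parameters, which is negligible next to a backbone with tens of millions of weights.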

This review was generated by AI and checked by a human editor.