QUICK REVIEW

[Paper Review] Context Encoding for Semantic Segmentation

Hang Zhang, Kristin Dana|arXiv (Cornell University)|Mar 23, 2018

Advanced Neural Network Applications | References: 51 | Citations: 135

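One-Line Summary

Introduces the Context Encoding Module (EncNet), which exploits global scene context through an Encoding Layer and a Semantic Encoding Loss, improving semantic segmentation with minimal extra computation and achieving state-of-the-art results on PASCAL VOC 2012, PASCAL-Context, and ADE20K.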

ABSTRACT

Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results 51.7% mIoU on PASCAL-Context, 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on ADE20K test set, which surpass the winning entry of COCO-Place Challenge in 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for the image classification on CIFAR-10 dataset. Our 14 layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10 times more layers. The source code for the complete system are publicly available.

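Motivation and Objectives

Motivate the use of global scene context to narrow the search space of candidate categories considered during segmentation.
Develop a lightweight module that encodes global feature statistics and selectively scales class-dependent feature maps.
Regularize training with the Semantic Encoding Loss so that the network is encouraged to recognize the categories present in the scene.
Integrate the Context Encoding Module into a dilated FCN backbone (EncNet) and evaluate it on standard benchmarks.
Show the additional benefit of context encoding for shallow networks on image classification (CIFAR-10).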

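Proposed Method

Extend the Encoding Layer to capture global context statistics from dense convolutional features.
Predict per-channel scaling factors from the Encoding Layer output and apply them to the feature maps by element-wise multiplication.
Introduce the Semantic Encoding Loss (SE-loss), which regularizes training by predicting the presence of object categories in the scene.
Build EncNet by inserting the Context Encoding Module on top of a pre-trained ResNet with dilated convolutions, optionally applying SE-loss at multiple stages.
Use synchronized cross-GPU Batch Normalization for stable training with a larger effective batch size.
Evaluate on PASCAL-Context, PASCAL VOC 2012, and ADE20K with standard metrics (pixAcc, mIoU), and also test classification performance on CIFAR-10.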
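The Encoding Layer and channel-scaling steps listed above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the residual-encoding form of the Encoding Layer (each feature vector is softly assigned to a set of codewords and the weighted residuals are summed), omits the batch normalization that would normally precede the ReLU, and uses hypothetical function names (`encode`, `channel_gates`, `rescale`) and fixed toy parameters in place of learned ones.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def encode(features, codewords, smoothing):
    """Aggregate N feature vectors (each of length C) into K residual encoders.

    Each feature is softly assigned to every codeword; the assignment weight
    is a softmax over -smoothing[k] * squared distance to codeword k.
    """
    num_codes, channels = len(codewords), len(codewords[0])
    encoders = [[0.0] * channels for _ in range(num_codes)]
    for x in features:
        residuals = [[xc - dc for xc, dc in zip(x, d)] for d in codewords]
        logits = [-s * sum(r * r for r in res)
                  for s, res in zip(smoothing, residuals)]
        top = max(logits)  # subtract the max for a stable softmax
        exps = [math.exp(logit - top) for logit in logits]
        total = sum(exps)
        for k in range(num_codes):
            weight = exps[k] / total
            for c in range(channels):
                encoders[k][c] += weight * residuals[k][c]
    return encoders


def channel_gates(encoders, fc_weight, fc_bias):
    """Map the encoders to one scaling factor in (0, 1) per channel."""
    channels = len(encoders[0])
    # ReLU each encoder, then sum over codewords (batch norm omitted here).
    context = [sum(max(enc[c], 0.0) for enc in encoders)
               for c in range(channels)]
    return [sigmoid(sum(w * v for w, v in zip(row, context)) + b)
            for row, b in zip(fc_weight, fc_bias)]


def rescale(features, gates):
    """Multiply every feature vector channel-wise by the predicted gates."""
    return [[xc * g for xc, g in zip(x, gates)] for x in features]


# Toy example: 3 spatial positions, 2 channels, 2 codewords.
features = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
codewords = [[0.0, 0.0], [2.0, 2.0]]
smoothing = [1.0, 0.5]
fc_weight = [[0.1, -0.2], [0.3, 0.05]]  # hypothetical stand-ins for learned weights
fc_bias = [0.0, 0.0]

encoders = encode(features, codewords, smoothing)
gates = channel_gates(encoders, fc_weight, fc_bias)
scaled = rescale(features, gates)
```

Because the gates are computed from statistics pooled over all spatial positions, every pixel's features are rescaled by the same per-channel factors, which is how scene-level context can emphasize or suppress class-dependent feature maps.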

Experimental Results

Research Questions

RQ1: Does explicit global context modeling improve per-pixel semantic segmentation beyond receptive-field enlargement techniques?
RQ2: Can a lightweight Context Encoding Module improve segmentation without substantial computational overhead?
RQ3: How does SE-loss influence the learning of scene-level semantics and the segmentation of small objects?
RQ4: Is EncNet competitive with or superior to state-of-the-art methods on PASCAL-Context, VOC 2012, and ADE20K without COCO pre-training?
RQ5: Can the benefits of context encoding extend to shallower networks for image classification (CIFAR-10)?

Key Findings

EncNet with the Context Encoding Module yields significant gains over the FCN baseline (e.g., from 41.0% mIoU to 47.6% mIoU on a ResNet-50 baseline).
With ResNet-101, EncNet achieves 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012 (with COCO pre-training in the VOC case).
A single EncNet-101 model achieves a final score of 0.5567 on the ADE20K test set, surpassing the winning entry of the COCO-Place Challenge 2017.
An SE-loss weight of 0.2 and 32 codewords in the Encoding Layer give the best performance in the ablation study, at marginal additional computation.
On CIFAR-10, a 14-layer EncNet achieves 3.45% error, comparable with models more than 10 times deeper, demonstrating the broader utility of context encoding.
EncNet offers state-of-the-art results on major segmentation benchmarks while preserving efficiency and compatibility with existing FCN-based frameworks.
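The SE-loss weight reported in the findings above acts as a coefficient on an auxiliary loss that is added to the per-pixel segmentation loss. A minimal sketch in plain Python follows, assuming the SE-loss is a binary cross-entropy over per-class presence predictions (one sigmoid output per category, with target 1 if the category appears anywhere in the image) and that the total loss is a simple weighted sum. The function names and example numbers are hypothetical.

```python
import math


def presence_targets(label_map, num_classes):
    """Return a 0/1 vector marking which classes occur in the label map."""
    present = {label for row in label_map for label in row}
    return [1.0 if c in present else 0.0 for c in range(num_classes)]


def se_loss(presence_probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted and true class presence."""
    total = 0.0
    for p, t in zip(presence_probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def total_loss(seg_loss, aux_se_loss, se_weight=0.2):
    """Combine the per-pixel loss with the weighted SE-loss."""
    return seg_loss + se_weight * aux_se_loss


# Toy example: a 2x3 label map over 4 classes; classes 0 and 2 are present.
label_map = [[0, 0, 2], [0, 2, 2]]
targets = presence_targets(label_map, num_classes=4)
probs = [0.9, 0.1, 0.8, 0.2]  # hypothetical sigmoid outputs of the SE head
loss = total_loss(seg_loss=1.0, aux_se_loss=se_loss(probs, targets))
```

Since the presence targets ignore how many pixels a class covers, a small object contributes as much to this auxiliary loss as a large one, which is one plausible reading of why SE-loss is examined for small-object segmentation.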

This review was generated by AI and checked by a human editor.