QUICK REVIEW

[Paper Review] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou|arXiv (Cornell University)|Jun 2, 2016

Advanced Neural Network Applications|References: 85|Citations: 710

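One-sentence summary

DeepLab is a semantic segmentation system that uses atrous convolution for dense features, ASPP for multi-scale context, and a fully connected CRF for boundary refinement, achieving state-of-the-art results on PASCAL VOC 2012 and other datasets.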

ABSTRACT

In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

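Motivation and objectives

Identifies and addresses three challenges in applying DCNNs to semantic segmentation: (i) reduced feature resolution, (ii) objects that appear at multiple scales, and (iii) reduced localization accuracy.
Proposes atrous convolution to control feature resolution and enlarge the receptive field without adding parameters.
Introduces atrous spatial pyramid pooling (ASPP) to capture multi-scale context efficiently.
Applies a fully connected CRF on top of the DCNN output to improve boundary localization.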
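The claim that atrous convolution enlarges the receptive field without adding parameters can be illustrated with a minimal 1-D sketch. This is an assumption-laden illustration and not the authors' Caffe implementation: the function names `atrous_conv1d` and `effective_kernel_size` are hypothetical, the signal is 1-D where the paper works on 2-D feature maps, and borders are handled in "valid" mode for simplicity.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with 'valid' borders.

    Computes y[i] = sum_k signal[i + rate * k] * kernel[k].
    The kernel keeps len(kernel) weights for any rate; only the
    spacing between the sampled inputs changes.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    span = (len(kernel) - 1) * rate + 1  # input extent covered by one output
    out_len = max(len(signal) - span + 1, 0)
    return [
        sum(signal[i + rate * k] * w for k, w in enumerate(kernel))
        for i in range(out_len)
    ]


def effective_kernel_size(kernel_size, rate):
    """Extent of a kernel after inserting (rate - 1) zeros between taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = list(range(10))  # hypothetical input: a ramp 0..9
kernel = [1, 0, -1]       # hypothetical 3-tap filter

dense = atrous_conv1d(signal, kernel, rate=1)    # covers 3 inputs per output
dilated = atrous_conv1d(signal, kernel, rate=2)  # covers 5 inputs per output

print(dense, effective_kernel_size(3, 1))
print(dilated, effective_kernel_size(3, 2))
```

Both calls use the same three weights; with `rate=2` each output simply looks at inputs that are further apart, which is the sense in which the field of view grows at constant parameter count.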

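Proposed method

A fully convolutional network with atrous convolution computes dense feature maps at a higher resolution than a standard DCNN.
The standard network's repeated downsampling is replaced with atrous convolution, which enlarges the receptive field without adding parameters.
ASPP is implemented as parallel atrous convolutions with different rates to capture multi-scale context.
The final DCNN score map is upsampled to the original image size by bilinear interpolation and then refined with a fully connected CRF to sharpen boundaries.
ImageNet-pretrained networks (VGG-16 or ResNet-101) are fine-tuned for semantic segmentation; the CRF parameters are tuned separately on the validation set.
Code and models that extend the Caffe framework are publicly released.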
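Two of the steps listed above, parallel atrous branches (ASPP) and interpolation back to the input resolution, can be sketched in 1-D as follows. Everything here is an illustrative assumption and not the released code: the names `atrous_conv1d_same`, `aspp1d`, and `upsample_linear` are hypothetical, the branches are fused by summation, the rates 1 and 2 are toy values, and 1-D linear interpolation stands in for 2-D bilinear interpolation.

```python
def atrous_conv1d_same(signal, kernel, rate):
    """Zero-padded 1-D atrous convolution; output length equals input length."""
    center = (len(kernel) - 1) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * rate  # input index sampled by tap k
            if 0 <= j < n:               # taps outside the signal read zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp1d(signal, kernels_by_rate):
    """Run one atrous branch per rate in parallel and sum the responses."""
    fused = [0.0] * len(signal)
    for rate, kernel in kernels_by_rate.items():
        branch = atrous_conv1d_same(signal, kernel, rate)
        fused = [f + b for f, b in zip(fused, branch)]
    return fused


def upsample_linear(values, factor):
    """Linear interpolation with `factor` steps between neighboring samples."""
    if not values:
        return []
    out = []
    for a, b in zip(values, values[1:]):
        for step in range(factor):
            out.append(a + (b - a) * step / factor)
    out.append(values[-1])
    return out


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]  # hypothetical single-pixel "object"
fused = aspp1d(impulse, {1: [1, 1, 1], 2: [1, 1, 1]})
print(fused)                            # response spreads over two scales
print(upsample_linear([0.0, 8.0], 4))   # coarse scores back to a finer grid
```

The rate-1 branch responds to the immediate neighbors of the impulse and the rate-2 branch to positions two steps away, so the fused response covers both scales with the same number of weights per branch.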

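Experimental results

Research questions

RQ1: Does atrous convolution enable high-resolution dense prediction without adding parameters or computation?
RQ2: Does atrous spatial pyramid pooling capture objects at multiple scales efficiently and improve segmentation?
RQ3: Does combining the DCNN output with a fully connected CRF improve boundary localization and overall segmentation accuracy?
RQ4: How does a deeper network (ResNet-101 vs. VGG-16), combined with atrous convolution and ASPP, affect semantic segmentation performance?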

Key findings

Kernel	Rate	FOV	Params	Speed (images/sec)	Mean IOU before / after CRF (%)
7x7	4	224	134.3M	1.44	64.38 / 67.64
4x4	4	128	65.1M	2.90	59.80 / 63.74
4x4	8	224	65.1M	2.90	63.41 / 67.14
3x3	12	224	20.5M	4.84	62.25 / 67.64
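The arithmetic behind the table can be checked with a short script. It copies the kernel, rate, and mean IOU columns from the rows above and derives two quantities: the absolute gain from the CRF, and the effective kernel extent k + (k - 1)(r - 1) of a k-tap kernel at atrous rate r. The helper name `effective_kernel_size` is hypothetical; this is a sketch for reading the table, not part of the paper's code.

```python
# (kernel, rate, fov, mean_iou_before_crf, mean_iou_after_crf), copied from the table
rows = [
    (7, 4, 224, 64.38, 67.64),
    (4, 4, 128, 59.80, 63.74),
    (4, 8, 224, 63.41, 67.14),
    (3, 12, 224, 62.25, 67.64),
]


def effective_kernel_size(kernel_size, rate):
    """Extent of a kernel after inserting (rate - 1) zeros between taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# One summary tuple per row: (kernel, rate, effective extent, CRF gain)
summary = [
    (k, r, effective_kernel_size(k, r), round(after - before, 2))
    for k, r, _fov, before, after in rows
]

for k, r, extent, gain in summary:
    print(f"{k}x{k} rate {r}: effective extent {extent}, CRF gain {gain:.2f} points")
```

Under this formula the three rows with FOV 224 share the same effective extent, and the row with FOV 128 has a smaller one, which is consistent with the FOV column. The CRF gains fall between roughly 3 and 5.5 points.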

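DeepLab with atrous convolution computes denser, higher-resolution feature maps, which are brought back to the original image size by bilinear upsampling (8x).
ASPP improves the segmentation of multi-scale objects by probing features at different sampling rates.
The fully connected CRF refines boundaries and improves localization, giving absolute mean IOU gains of roughly 3 to 5 percentage points across variants.
On PASCAL VOC 2012, DeepLab-CRF-LargeFOV reaches 70.3% mean IOU on the official test set, and the best system reaches 79.7% according to the abstract; state-of-the-art results are reported on VOC 2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes.
Combining a large field of view (a small kernel with a high atrous rate) with the CRF gives the best accuracy/speed trade-off (e.g., DeepLab-LargeFOV): the 3x3, rate 12 setting matches the 7x7 setting's 67.64% after CRF with 20.5M instead of 134.3M parameters and 4.84 instead of 1.44 images/sec.
DeepLab variants that combine ASPP and the CRF outperform the results of the conference version, showing gains from deeper networks and multi-scale processing.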
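The findings above are all stated in mean IOU (mean intersection over union). As a reminder of what that number measures, here is a minimal sketch that computes it from a confusion matrix. The function name `mean_iou` and the toy two-class matrix are hypothetical, and skipping classes that never occur is an assumption of this sketch; the benchmark's official evaluation code may handle such cases differently.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[t][p] is the number of pixels whose true class is t and
    whose predicted class is p.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true c, predicted other
        fp = sum(confusion[t][c] for t in range(n)) - tp    # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:  # assumption: classes absent from both maps are skipped
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Hypothetical toy example: 20 pixels, two classes.
toy = [
    [8, 2],  # true class 0: 8 correct, 2 predicted as class 1
    [1, 9],  # true class 1: 1 predicted as class 0, 9 correct
]
print(f"mean IOU = {mean_iou(toy):.4f}")
```

Because each class contributes its own intersection-over-union ratio before averaging, a boundary error on a small object costs more than the same number of mislabeled pixels on a large one, which is one reason boundary refinement shows up clearly in this metric.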


This review was generated by AI and checked by a human editor.