QUICK REVIEW

[Paper Review] DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation

Gen Li, Inyoung Yun | arXiv (Cornell University) | Jul 26, 2019

Advanced Neural Network Applications | References: 33 | Citations: 178

One-Line Summary

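DABNet introduces the Depth-wise Asymmetric Bottleneck (DAB) module and uses it to build a lightweight network that performs real-time semantic segmentation with high accuracy and a very small parameter count. For example, on the Cityscapes test set it reaches 70.1% mIoU with 0.76M parameters, running at 27.7 FPS on a GTX 1080Ti with 1024x2048 input.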

ABSTRACT

As a pixel-level prediction task, semantic segmentation needs large computational cost with enormous parameters to obtain high performance. Recently, due to the increasing demand for autonomous systems and robots, it is significant to make a tradeoff between accuracy and inference speed. In this paper, we propose a novel Depthwise Asymmetric Bottleneck (DAB) module to address this dilemma, which efficiently adopts depth-wise asymmetric convolution and dilated convolution to build a bottleneck structure. Based on the DAB module, we design a Depth-wise Asymmetric Bottleneck Network (DABNet) especially for real-time semantic segmentation, which creates sufficient receptive field and densely utilizes the contextual information. Experiments on Cityscapes and CamVid datasets demonstrate that the proposed DABNet achieves a balance between speed and precision. Specifically, without any pretrained model and postprocessing, it achieves 70.1% Mean IoU on the Cityscapes test dataset with only 0.76 million parameters and a speed of 104 FPS on a single GTX 1080Ti card.
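
The abstract attributes DABNet's small size to depth-wise asymmetric convolution. A quick weight count makes the saving concrete. The sketch below is plain Python arithmetic, not code from the paper. The helper names `conv_params` and `compare` are hypothetical, and biases are ignored.

```python
def conv_params(c_in: int, c_out: int, kh: int, kw: int, groups: int = 1) -> int:
    """Weight count of a 2-D convolution (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kh * kw


def compare(c: int) -> dict:
    """Weights for a c -> c layer under three convolution styles."""
    standard = conv_params(c, c, 3, 3)             # full 3x3 convolution
    depthwise = conv_params(c, c, 3, 3, groups=c)  # depth-wise 3x3
    # Depth-wise asymmetric: a 3x1 followed by a 1x3, both depth-wise.
    dw_asym = conv_params(c, c, 3, 1, groups=c) + conv_params(c, c, 1, 3, groups=c)
    return {
        "standard_3x3": standard,
        "depthwise_3x3": depthwise,
        "depthwise_3x1_1x3": dw_asym,
    }


if __name__ == "__main__":
    for name, n in compare(64).items():
        print(f"{name:>20}: {n}")
```

For 64 channels, this gives 36864 weights for a standard 3x3 convolution, 576 for a depth-wise 3x3, and 384 for the depth-wise 3x1 + 1x3 pair. Dilation does not change any of these counts, because it only spaces out the kernel taps.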

Motivation and Objectives

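Develop a fast, low-parameter semantic segmentation model suited to real-time applications.
Design a bottleneck that combines depth-wise asymmetric and dilated convolutions to capture both local and contextual information.
Evaluate DABNet on the Cityscapes and CamVid datasets without pretraining or post-processing.
Show that a shallow network with few parameters can still achieve competitive accuracy.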

Proposed Method

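Introduce the Depth-wise Asymmetric Bottleneck (DAB) module, which combines depth-wise asymmetric convolution with dilated convolution.
Use a two-branch bottleneck that extracts local information (a depth-wise 3x3 convolution factorized into 3x1 and 1x3) and contextual information (the same depth-wise asymmetric convolution, dilated) separately.
Merge the branches with a 1x1 convolution, applying BatchNorm and PReLU as pre-activation. No nonlinearity is applied after the final 1x1 layer.
Build the DABNet architecture from three downsampling blocks, producing a 1/8-resolution feature map, with long-range shortcuts for feature reuse.
Omit a decoder to preserve speed, and train end-to-end without pretraining or post-processing.
Experiment with the dilation rates across DAB blocks to balance receptive field against speed.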
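
To see how these pieces fit together, here is a rough weight count for a single DAB module, alongside the 1/8 feature-map size implied by three stride-2 downsampling stages. The module layout encoded here is a reconstruction for illustration and should be checked against the paper's figure. It assumes a 3x3 convolution that halves the channels, followed by two depth-wise 3x1/1x3 branches (one of them dilated). The branches are summed, expanded back by a 1x1 convolution, and joined by a residual add. BatchNorm and PReLU parameters are ignored, and all function names are hypothetical.

```python
def dab_module_params(c: int) -> int:
    """Approximate weight count of one DAB module with c channels.

    Assumed (hypothetical) layout, BN/PReLU and biases ignored:
      3x3 conv, c -> c/2 channels
      branch 1: depth-wise 3x1 + 1x3 on c/2 channels
      branch 2: depth-wise dilated 3x1 + 1x3 on c/2 channels
      sum of branches, 1x1 conv c/2 -> c, residual add (no weights)
    """
    if c % 2:
        raise ValueError("channel count must be even")
    half = c // 2
    reduce_3x3 = c * half * 3 * 3
    branch = half * 3 + half * 3  # one depth-wise 3x1 + 1x3 pair
    expand_1x1 = half * c
    return reduce_3x3 + 2 * branch + expand_1x1


def plain_residual_params(c: int) -> int:
    """Two standard 3x3 convs (c -> c) for comparison; biases ignored."""
    return 2 * (c * c * 3 * 3)


def feature_map_size(h: int, w: int, n_downsample: int = 3) -> tuple:
    """Spatial size after n stride-2 stages (assumes h, w divisible by 2**n)."""
    factor = 2 ** n_downsample
    return h // factor, w // factor


if __name__ == "__main__":
    c = 64
    print("DAB module:", dab_module_params(c))
    print("two 3x3 convs:", plain_residual_params(c))
    print("1/8 map for 1024x2048:", feature_map_size(1024, 2048))
```

With 64 channels, this gives 20864 weights versus 73728 for two plain 3x3 convolutions, and a 128x256 feature map for a 1024x2048 input. Under this layout, most of the module's weights sit in the channel-reducing 3x3 convolution; the two depth-wise branches add only 384.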

Experimental Results

Research Questions

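RQ1: Can the depth-wise asymmetric bottleneck (DAB) deliver competitive semantic segmentation accuracy while drastically reducing the number of parameters?
RQ2: How do depth-wise asymmetric and dilated convolutions affect accuracy and inference speed in a real-time setting?
RQ3: Does an encoder-only network without a decoder achieve a state-of-the-art speed/accuracy trade-off on Cityscapes and CamVid?
RQ4: How do the dilation rates, and the absence of a context module, affect segmentation performance at high resolution?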

Key Findings

Model	Pretrained	Input size	mIoU (%)	FPS	GPU	Params (M)
DABNet (Ours)	No	1024x2048	70.1	27.7	1080Ti	0.76
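
Frame rates are easier to compare as per-frame latency and pixel throughput. The arithmetic below uses only the 27.7 FPS at 1024x2048 from the table and the 104 FPS at 512x1024 reported in this review. It is a back-of-the-envelope sketch with hypothetical helper names, not a benchmark.

```python
def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


def megapixels_per_second(fps: float, h: int, w: int) -> float:
    """Input pixels processed per second, in millions."""
    return fps * h * w / 1e6


if __name__ == "__main__":
    for fps, (h, w) in [(27.7, (1024, 2048)), (104.0, (512, 1024))]:
        print(f"{h}x{w}: {latency_ms(fps):.1f} ms/frame, "
              f"{megapixels_per_second(fps, h, w):.1f} MP/s")
```

This gives about 36.1 ms per frame at full resolution and about 9.6 ms at 512x1024. Pixel throughput is similar in both cases (roughly 58 vs. 55 megapixels per second), so the 3.75x speed-up is close to the 4x reduction in pixel count.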

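DABNet reaches 70.1% mIoU on the Cityscapes test set with 0.76M parameters, running at 27.7 FPS (1024x2048 input) on a single GTX 1080Ti.
With 512x1024 input it runs at 104 FPS, with the same very small parameter count (0.76M).
Adding a decoder or a heavy context module (e.g., SPP) did not improve accuracy and often slowed inference substantially.
Depth-wise dilated convolutions keep the speed advantage, whereas applying dilation to standard convolutions noticeably lowers FPS.
On the Cityscapes and CamVid benchmarks, DABNet achieves comparable accuracy with far fewer parameters and outperforms several real-time methods.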
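
The dilation findings concern buying receptive field without paying in parameters. A rough one-axis receptive-field count shows the effect. This sketch assumes each module is a 3x3 convolution followed by a 3-tap depth-wise kernel per axis with dilation d; the 1x1 convolution adds nothing. It also assumes a dilation schedule of 4, 4, 8, 8, 16, 16 for a six-module block at 1/8 resolution. Treat that schedule, and the function names, as hypothetical assumptions to verify against the paper. Speed is not modeled: dilation leaves the weight and multiply-add counts unchanged.

```python
def effective_kernel(k: int, d: int) -> int:
    """Span of a k-tap kernel with dilation d."""
    return d * (k - 1) + 1


def stage_receptive_field(dilations, start_rf: int = 1) -> int:
    """One-axis theoretical receptive field, in stage feature-map pixels.

    Per module (assumed layout): a 3x3 conv (dilation 1) adds 2 pixels, and the
    dilated 3-tap depth-wise kernel along this axis adds 2*d. The undilated
    branch runs in parallel, so only the larger (dilated) branch extends the span.
    """
    rf = start_rf
    for d in dilations:
        rf += effective_kernel(3, 1) - 1  # 3x3 conv
        rf += effective_kernel(3, d) - 1  # dilated 3x1 (or 1x3) along this axis
    return rf


def input_span_added(rf: int, stride: int = 8) -> int:
    """Input pixels spanned by the stage's growth when it runs at 1/stride resolution."""
    return (rf - 1) * stride


if __name__ == "__main__":
    schedules = {"assumed dilated": [4, 4, 8, 8, 16, 16], "no dilation": [1] * 6}
    for name, dil in schedules.items():
        rf = stage_receptive_field(dil)
        print(f"{name:>15}: {rf} map px, +{input_span_added(rf)} input px")
```

Under these assumptions, the dilated schedule spans 125 feature-map pixels (about 992 additional input pixels at 1/8 resolution), versus 25 (192) without dilation, at an identical weight count.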

This review was generated by AI and checked by a human editor.