QUICK REVIEW

[Paper Review] Fast-SCNN: Fast Semantic Segmentation Network

Rudra P. K. Poudel, Stephan Liwicki|arXiv (Cornell University)|Feb 12, 2019

Advanced Neural Network Applications|References: 26|Citations: 364

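One-line summary

Fast-SCNN performs above real-time semantic segmentation on high-resolution images using a shared early feature extractor, reaching 68.0% mIoU at 123.5 fps on Cityscapes with 1.11M parameters; ImageNet pre-training yields only minimal gains.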

ABSTRACT

The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our `learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.
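
The accuracy figure in the abstract is a mean intersection over union (mIoU). As a minimal sketch of how that metric is computed, the following pure-Python function averages the per-class IoU over flat label lists. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper; benchmark evaluation on Cityscapes accumulates counts over the whole dataset and ignores void labels, which this sketch does not attempt.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label lists
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is about 0.5.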

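Motivation and objectives

Address the need for real-time semantic segmentation of high-resolution images on embedded devices.
Introduce a shared early feature extractor, the learning to downsample module, to combine spatial detail and context efficiently.
Design a low-capacity network (1.11M parameters) built from depthwise separable convolutions and inverted residual blocks.
Show that ImageNet pre-training brings only limited gains for this low-capacity model.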

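Proposed method

Propose the fast segmentation convolutional neural network (Fast-SCNN), whose learning to downsample module shares the early convolutions across two resolution branches.
Use a coarse global feature extractor, built from residual bottleneck blocks, to capture context at reduced resolution.
Incorporate a feature fusion module that combines high-resolution spatial detail with low-resolution global context by simple addition.
Adopt depthwise separable convolutions and inverted residual blocks to reduce parameters and FLOPs.
Include a classification head made of a small stack of depthwise separable convolutions, with the option of softmax or argmax at inference.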
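
To make the parameter saving from depthwise separable convolutions concrete, here is a minimal sketch that counts weights for a standard convolution and for its depthwise separable counterpart (a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution). The channel sizes are illustrative assumptions, not the layer widths of Fast-SCNN, and biases and batch-norm parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For this hypothetical layer the standard convolution needs 73,728 weights and the separable version 8,768, roughly 8.4 times fewer. Savings of this kind are what make a budget near 1.11M parameters plausible for a full segmentation network.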

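Experimental results

Research questions

RQ1: How can real-time semantic segmentation of high-resolution images be achieved on embedded devices without high memory requirements?
RQ2: Does sharing the computation of the initial layers across resolution branches (learning to downsample) improve speed while maintaining accuracy?
RQ3: How much do network capacity and pre-training affect the Cityscapes performance of a lightweight model?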

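Key findings

Fast-SCNN achieves 68.0% mIoU on Cityscapes at 123.5 fps with 1024x2048 input on a Titan Xp (Pascal).
The model uses about 1.11 million parameters, far fewer than many real-time and offline methods.
The learning to downsample module and a single skip connection enable efficient multi-resolution feature sharing and preserve boundaries.
ImageNet pre-training or adding the coarsely labeled Cityscapes data brings only a marginal gain (about 0.5% mIoU) for this low-capacity network.
Lowering the input resolution raises FPS at the cost of mIoU: 1024x2048 gives 123.5 fps at 68.0% mIoU, 512x1024 gives 285.8 fps at 62.8%, and 256x512 gives 485.4 fps at 51.9%.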
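
The speed and accuracy trade-off across input resolutions can be read off the reported numbers with a little arithmetic. The sketch below takes the three (resolution, fps, mIoU) triples quoted in the findings and derives per-frame latency, speedup over the full-resolution setting, and the mIoU drop. The helper name `tradeoff` and the dictionary keys are illustrative assumptions.

```python
# (input resolution, frames per second, mIoU in percent) as reported above.
results = [
    ("1024x2048", 123.5, 68.0),
    ("512x1024", 285.8, 62.8),
    ("256x512", 485.4, 51.9),
]


def tradeoff(rows):
    """Derive latency, speedup, and mIoU drop relative to the first row."""
    _, base_fps, base_miou = rows[0]
    summary = []
    for res, fps, miou in rows:
        summary.append({
            "resolution": res,
            "latency_ms": 1000.0 / fps,      # time per frame
            "speedup": fps / base_fps,       # relative to full resolution
            "miou_drop": base_miou - miou,   # percentage points lost
        })
    return summary


for row in tradeoff(results):
    print(
        f"{row['resolution']:>9}: {row['latency_ms']:.1f} ms/frame, "
        f"{row['speedup']:.2f}x, -{row['miou_drop']:.1f} mIoU"
    )
```

Halving each side of the input gives roughly a 2.3x speedup for about 5.2 points of mIoU, while quartering it gives roughly 3.9x for about 16.1 points, so the accuracy cost grows faster than the speed gain.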

This review was generated by AI and checked by a human editor.