QUICK REVIEW

[Paper Review] Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

Chenxi Liu, Liang-Chieh Chen|arXiv (Cornell University)|Jan 10, 2019

Advanced Neural Network Applications|References: 93|Citations: 149

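One-line summary

Auto-DeepLab introduces a hierarchical neural architecture search that jointly optimizes the network-level and cell-level structure for semantic segmentation. It achieves strong results without ImageNet pretraining, and the search itself is efficient (about 3 GPU days).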

ABSTRACT

Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Auto-DeepLab, our architecture searched specifically for semantic image segmentation, attains state-of-the-art performance without any ImageNet pretraining.

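Motivation and objectives

Extend neural architecture search from image classification to dense semantic segmentation.
Propose a two-level hierarchical search space covering both network-level and cell-level architecture.
Develop a differentiable, gradient-based NAS framework that searches this hierarchy efficiently.
Demonstrate strong segmentation performance without ImageNet pretraining and compare against state-of-the-art baselines.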

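Proposed method

Define a two-level hierarchical search space: a network-level trellis that represents changes in spatial resolution, and a cell-level DAG that describes the operations within each layer.
Use a continuous, differentiable relaxation of the architecture, with alpha parameterizing the cell-level operations and beta parameterizing the network-level transitions.
Alternately optimize the network weights and the architecture parameters with gradient-based updates on two disjoint splits of the training data (trainA for the weights, trainB for alpha and beta).
Decode the discrete cell architecture greedily (keep the strongest predecessors and take the argmax operator on each retained edge), and apply Viterbi decoding to recover the network-level path.
During search, attach an Atrous Spatial Pyramid Pooling (ASPP) module to each spatial resolution, using a simplified multi-branch configuration.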
Run the architecture search on Cityscapes with 321x321 crops, train the resulting networks from scratch, and evaluate on Cityscapes, PASCAL VOC 2012, and ADE20K.
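The relaxation and decoding steps listed above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the operation labels, and the layout of `example_beta` (one dict of outgoing transition probabilities per resolution level and layer) are assumptions made for illustration. The cell-level part covers only the softmax-weighted mix and the per-edge argmax; selecting the strongest predecessors is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_edge_output(alpha_logits, candidate_outputs):
    """Continuous relaxation of one cell edge: the edge output is the
    softmax(alpha)-weighted sum of the candidate operation outputs."""
    weights = softmax(alpha_logits)
    return sum(w * out for w, out in zip(weights, candidate_outputs))


def decode_edge(alpha_logits, op_names):
    """Discrete decoding of one edge: keep the operation with the largest alpha."""
    best = max(range(len(op_names)), key=lambda i: alpha_logits[i])
    return op_names[best]


def viterbi_path(beta, start_level=0):
    """Most probable resolution path through the network-level trellis.

    beta[l][s] is a dict {t: p}: the probability p of moving from level s
    to level t between layer l and layer l + 1 (hypothetical layout).
    """
    num_levels = len(beta[0])
    neg_inf = float("-inf")
    score = [neg_inf] * num_levels  # best log-probability of reaching each level
    score[start_level] = 0.0
    back_pointers = []
    for layer in beta:
        new_score = [neg_inf] * num_levels
        prev = [None] * num_levels
        for s, transitions in enumerate(layer):
            if score[s] == neg_inf:
                continue  # level s is unreachable at this layer
            for t, p in transitions.items():
                if p <= 0.0:
                    continue
                candidate = score[s] + math.log(p)
                if candidate > new_score[t]:
                    new_score[t] = candidate
                    prev[t] = s
        back_pointers.append(prev)
        score = new_score
    # Backtrack from the best final level to the start.
    level = max(range(num_levels), key=lambda t: score[t])
    path = [level]
    for prev in reversed(back_pointers):
        level = prev[level]
        path.append(level)
    path.reverse()
    return path


# Toy trellis: 3 resolution levels, 3 transitions; each row of outgoing
# probabilities sums to 1. The numbers are made up for illustration.
example_beta = [
    [{0: 0.3, 1: 0.7}, {0: 0.2, 1: 0.5, 2: 0.3}, {1: 0.5, 2: 0.5}],
    [{0: 0.6, 1: 0.4}, {0: 0.1, 1: 0.3, 2: 0.6}, {1: 0.2, 2: 0.8}],
    [{0: 0.5, 1: 0.5}, {0: 0.3, 1: 0.4, 2: 0.3}, {1: 0.9, 2: 0.1}],
]
path = viterbi_path(example_beta)  # [0, 1, 2, 1]
```

On this toy trellis the decoded path starts at level 0, moves down two levels, and comes back up one, with joint probability 0.7 x 0.6 x 0.9 = 0.378. Working in log space avoids underflow when the trellis has many layers.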

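Experimental results

Research questions

RQ1: Can neural architecture search be effectively extended to dense image prediction tasks such as semantic segmentation?
RQ2: Does jointly searching the network-level and cell-level architecture outperform searching the cell alone?
RQ3: How efficient can differentiable NAS be for high-resolution dense prediction tasks?
RQ4: How do the Auto-DeepLab variants perform on Cityscapes, VOC 2012, and ADE20K without ImageNet pretraining?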

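Key findings

Without ImageNet pretraining, Auto-DeepLab-L outperforms FRRN-B by 8.6% and GridNet by 10.9% on the Cityscapes test set.
Auto-DeepLab matches the performance of the pretrained DeepLabv3+ while being 2.23 times faster in Multi-Adds.
The lightweight Auto-DeepLab-S reaches 80.9% on the Cityscapes test set with far fewer parameters (10.15M) and 333.25B Multi-Adds.
With the coarse annotations, Auto-DeepLab-L reaches 82.1% on the Cityscapes test set while requiring 55.2% fewer Multi-Adds than DeepLabv3+; without pretraining, the best model outperforms several Cityscapes baselines.
On PASCAL VOC 2012 and ADE20K, the best Auto-DeepLab variant outperforms several recent models trained with limited pretraining. On the VOC test set, the ImageNet/COCO pretraining variants reach up to 85.6% mIOU.
The proposed differentiable two-level NAS is roughly 1000 times faster than earlier dense-prediction NAS methods (e.g., DPC) and finds architectures that generalize across multiple datasets.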
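The "2.23 times faster" and "55.2% fewer Multi-Adds" figures above look like two expressions of the same ratio. A quick arithmetic check, assuming both compare the same pair of models (Auto-DeepLab-L against DeepLabv3+):

```python
# Assumption: "2.23 times faster in Multi-Adds" means DeepLabv3+ needs
# 2.23x the Multi-Adds of Auto-DeepLab-L.
speedup = 2.23

# Fraction of Multi-Adds saved relative to DeepLabv3+.
reduction = 1.0 - 1.0 / speedup
reduction_pct = round(100 * reduction, 1)  # 55.2
```

The result agrees with the 55.2% reduction quoted for Auto-DeepLab-L, so the two findings are consistent with each other under that assumption.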


This review was generated by AI and checked by a human editor.