QUICK REVIEW

[Paper Review] DetNAS: Backbone Search for Object Detection

Yukang Chen, Tong Yang|arXiv (Cornell University)|Mar 26, 2019

Advanced Neural Network Applications | References: 44 | Citations: 174

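One-line summary

DetNAS tailors backbones to object detectors with a three-stage search framework (supernet pre-training on ImageNet, supernet fine-tuning on detection, and evolutionary search on the trained one-shot supernet), improving COCO mmAP with fewer FLOPs than hand-crafted networks.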

ABSTRACT

Object detectors are usually equipped with backbone networks designed for image classification. It might be sub-optimal because of the gap between the tasks of image classification and object detection. In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection. It is non-trivial because detection training typically needs ImageNet pre-training while NAS systems require accuracies on the target detection task as supervisory signals. Based on the technique of one-shot supernet, which contains all possible networks in the search space, we propose a framework for backbone search on object detection. We train the supernet under the typical detector training schedule: ImageNet pre-training and detection fine-tuning. Then, the architecture search is performed on the trained supernet, using the detection task as the guidance. This framework makes NAS on backbones very efficient. In experiments, we show the effectiveness of DetNAS on various detectors, for instance, one-stage RetinaNet and the two-stage FPN. We empirically find that networks searched on object detection shows consistent superiority compared to those searched on ImageNet classification. The resulting architecture achieves superior performance than hand-crafted networks on COCO with much less FLOPs complexity.

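Motivation and objectives

Motivate the need for backbones designed specifically for object detection rather than reused from image classification.
Propose a practical NAS framework that decouples weight training from architecture search through a one-shot supernet.
Show that backbones searched on object detection outperform those searched on ImageNet classification across detectors and datasets.
Show that DetNASNet and DetNASNet (3.8) achieve strong accuracy on COCO and VOC while keeping computational cost low.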

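Proposed method

Build a one-shot supernet that contains every candidate backbone in the search space.
Pre-train the supernet on ImageNet with a path-wise sampling strategy, so that supernet accuracy reflects the relative performance of architectures.
Fine-tune the supernet on the detection dataset (COCO/VOC), using SyncBN to cope with the small batches of detection fine-tuning, which make batch statistics unreliable.
Search architectures on the trained supernet with an evolutionary algorithm under FLOPs/inference constraints.
During the search, recompute the batch statistics for each evaluated path so that its BN layers use valid statistics.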
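
The path sampling and evolutionary search steps above can be sketched in a few lines of Python. This is a minimal toy illustration, not the authors' implementation: the block count, the number of choices per block, the mutation and crossover rules, and the functions `toy_cost` and `toy_fitness` are all assumptions made for the example. In the actual framework the fitness of a path would be its detection accuracy measured on the fine-tuned supernet after recomputing BN statistics, which is replaced here by a cheap stand-in.

```python
import random

NUM_BLOCKS = 20   # assumed number of searchable blocks (illustrative)
NUM_CHOICES = 4   # assumed number of candidate ops per block (illustrative)


def sample_path(rng):
    """Uniformly sample one single-path architecture from the supernet."""
    return tuple(rng.randrange(NUM_CHOICES) for _ in range(NUM_BLOCKS))


def mutate(path, rng, prob=0.1):
    """Resample each block choice independently with probability `prob`."""
    return tuple(rng.randrange(NUM_CHOICES) if rng.random() < prob else c
                 for c in path)


def crossover(a, b, rng):
    """Take each block choice from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolutionary_search(fitness, is_feasible, rng,
                        population=20, parents=5, generations=10):
    """Return the best feasible path found under `fitness`."""
    def feasible(make):
        # Rejection sampling: retry until the constraint is satisfied.
        while True:
            cand = make()
            if is_feasible(cand):
                return cand

    pop = [feasible(lambda: sample_path(rng)) for _ in range(population)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Keep the top-scoring paths as parents of the next generation.
        top = sorted(pop, key=fitness, reverse=True)[:parents]
        children = []
        while len(children) < population:
            if rng.random() < 0.5:
                make = lambda: mutate(rng.choice(top), rng)
            else:
                make = lambda: crossover(rng.choice(top), rng.choice(top), rng)
            children.append(feasible(make))
        pop = children
        best = max(pop + [best], key=fitness)
    return best


def toy_cost(path):
    # Hypothetical cost model: a higher choice index is a more expensive op.
    return sum(c + 1 for c in path)


def toy_fitness(path):
    # Hypothetical stand-in for the detection accuracy of a supernet path.
    return sum(1 for i, c in enumerate(path) if c == i % NUM_CHOICES)


BUDGET = 55  # assumed cost budget, playing the role of a FLOPs constraint
best = evolutionary_search(toy_fitness, lambda p: toy_cost(p) <= BUDGET,
                           random.Random(0))
print(best, toy_fitness(best), toy_cost(best))
```

Because every candidate is scored with shared supernet weights, the search loop needs only inference-style evaluations, which is what keeps the search stage cheap relative to training each candidate from scratch. A real implementation would also cache fitness values, since each evaluation is far more expensive than in this toy.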

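Experimental results

Research questions

RQ1: Can backbones searched directly on object detection outperform those searched on ImageNet classification?
RQ2: Does incorporating pre-training into a one-shot NAS framework make backbone search for detectors computationally feasible?
RQ3: What architectural patterns emerge when NAS is optimized for specific object detectors (FPN, RetinaNet) and datasets (COCO, VOC)?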

Key findings

Backbone	FLOPs	ImageNet Top-1 Acc (%)	COCO mmAP	COCO AP50	COCO AP75	COCO APs	COCO APm	COCO APl	Notes
ResNet-50	3.8G	76.15	37.3	58.2	40.8	21.0	40.2	49.4	-
ResNet-101	7.6G	77.37	40.0	61.4	43.7	23.8	43.1	52.2	-
ShuffleNetv2-40	1.3G	77.18	39.2	60.8	42.4	23.6	42.3	52.2	-
ShuffleNetv2-40 (3.8)	3.8G	78.47	40.8	62.1	44.8	23.4	44.2	54.2	-
DetNASNet	1.3G	77.20	40.2	61.5	43.6	23.3	42.5	53.8	DetNAS backbone searched on detection
DetNASNet (3.8)	3.8G	78.44	42.0	63.9	45.8	24.9	45.1	56.8	DetNAS backbone enlarged for ~3.8G FLOPs

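DetNASNet reaches 40.2 mmAP on COCO at 1.3G FLOPs, outperforming ResNet-50 (37.3 mmAP at 3.8G FLOPs) under the same detector (FPN).
DetNASNet (3.8) reaches 42.0 mmAP at 3.8G FLOPs, surpassing ResNet-50 by 4.7 and ResNet-101 by 2.0 mmAP points.
Compared with the hand-crafted ShuffleNetv2-40 at the same FLOPs (1.3G), DetNASNet is 1.0 mmAP higher (40.2 vs. 39.2 in the table above).
Across detectors and datasets, networks searched on detection consistently outperform networks searched on ImageNet classification, by more than 3% on VOC and more than 1% on COCO.
The DetNAS framework takes about 44 GPU-days, roughly twice the cost of training a standard detector, which makes backbone search practical.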
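
The mmAP gaps quoted in these findings can be re-derived from the table. The short Python check below copies the FLOPs and COCO mmAP columns from the table; the dictionary layout and the helper name `gain` are just conveniences for this example.

```python
# (FLOPs in G, COCO mmAP) copied from the table above.
results = {
    "ResNet-50": (3.8, 37.3),
    "ResNet-101": (7.6, 40.0),
    "ShuffleNetv2-40": (1.3, 39.2),
    "ShuffleNetv2-40 (3.8)": (3.8, 40.8),
    "DetNASNet": (1.3, 40.2),
    "DetNASNet (3.8)": (3.8, 42.0),
}


def gain(model, baseline):
    """mmAP difference between two backbones, rounded to one decimal."""
    return round(results[model][1] - results[baseline][1], 1)


print(gain("DetNASNet", "ResNet-50"))          # 2.9
print(gain("DetNASNet (3.8)", "ResNet-50"))    # 4.7
print(gain("DetNASNet (3.8)", "ResNet-101"))   # 2.0
print(gain("DetNASNet", "ShuffleNetv2-40"))    # 1.0
```

Note that DetNASNet achieves its 2.9-point gain over ResNet-50 with roughly a third of the FLOPs (1.3G vs. 3.8G).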


This review was generated by AI and checked by a human editor.