QUICK REVIEW

[Paper Review] DSOD: Learning Deeply Supervised Object Detectors from Scratch

Zhiqiang Shen, Zhuang Liu|arXiv (Cornell University)|Aug 3, 2017

Advanced Neural Network Applications | References: 30 | Citations: 96

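One-Line Summary

DSOD trains object detectors from scratch under a proposal-free, densely connected framework inspired by DenseNets and SSD, achieving state-of-the-art results with a smaller model and real-time speed.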

ABSTRACT

We present Deeply Supervised Object Detector (DSOD), a framework that can learn object detectors from scratch. State-of-the-art object detectors rely heavily on the off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the difference on both the loss functions and the category distributions between classification and detection tasks. Model fine-tuning for the detection task could alleviate this bias to some extent but not fundamentally. Besides, transferring pre-trained models from classification to detection between discrepant domains is even more difficult (e.g. RGB to depth images). A better solution to tackle these two critical problems is to train object detectors from scratch, which motivates our proposed DSOD. Previous efforts in this direction mostly failed due to much more complicated loss functions and limited training data in object detection. In DSOD, we contribute a set of design principles for training object detectors from scratch. One of the key findings is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector. Combining with several other principles, we develop DSOD following the single-shot detection (SSD) framework. Experiments on PASCAL VOC 2007, 2012 and MS COCO datasets demonstrate that DSOD can achieve better results than the state-of-the-art solutions with much more compact models. For instance, DSOD outperforms SSD on all three benchmarks with real-time detection speed, while requiring only 1/2 the parameters of SSD and 1/10 the parameters of Faster RCNN. Our code and models are available at: https://github.com/szq0214/DSOD .

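Motivation and Objectives

Motivate training object detectors from scratch to avoid the bias introduced by pre-trained classification models.
Propose design principles for resource-efficient, high-accuracy detectors.
Build the DSOD framework on a proposal-free, single-shot detection paradigm with deep supervision.
Show that DSOD achieves state-of-the-art results on VOC 2007, VOC 2012, and MS COCO with a compact model.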

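Proposed Method

Adopt a proposal-free, single-shot detection framework based on SSD for speed.
Introduce deep supervision through dense layer-wise connections, which provides implicit auxiliary supervision signals.
Incorporate a stem block to reduce information loss from the raw input.
Use a dense prediction structure that fuses multi-layer feature maps for each prediction scale.
Include transition layers without pooling so that more dense blocks can be added without downsampling.
Train the entire network from scratch on standard detection benchmarks.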
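
The dense connections and pooling-free transitions listed above can be illustrated with simple shape bookkeeping. The following is a minimal sketch, not the authors' implementation: the helper names `dense_block` and `transition`, the 2x2 stride-2 pooling with floor rounding, and all numeric values are assumptions chosen for illustration.

```python
def dense_block(shape, num_layers, growth_rate):
    """Track a (channels, height, width) shape through a densely connected block.

    Each layer receives the concatenation of all earlier feature maps and
    contributes `growth_rate` new channels; spatial size is unchanged.
    """
    channels, height, width = shape
    layer_inputs = []
    for _ in range(num_layers):
        layer_inputs.append(channels)  # width of the concatenated input to this layer
        channels += growth_rate        # the layer's output is appended to the stack
    return (channels, height, width), layer_inputs


def transition(shape, out_channels, pool=True):
    """Transition layer: a channel projection, optionally followed by pooling.

    Assumption: pooling halves height and width (2x2, stride 2, floor rounding).
    With pool=False the spatial resolution is preserved.
    """
    _, height, width = shape
    if pool:
        height, width = height // 2, width // 2
    return (out_channels, height, width)


# Hypothetical sizes: 64 input channels on a 32x32 map, 4 layers, growth rate 16.
out_shape, inputs = dense_block((64, 32, 32), num_layers=4, growth_rate=16)
print(inputs)                                 # [64, 80, 96, 112]
print(out_shape)                              # (128, 32, 32)
print(transition(out_shape, 128, pool=True))  # (128, 16, 16)
print(transition(out_shape, 128, pool=False)) # (128, 32, 32)
```

With `pool=False`, another dense block can be stacked at the same resolution, which is the property the pooling-free transition is meant to provide.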

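Experimental Results

Research Questions

RQ1: Can object detectors be trained effectively from scratch, without pre-trained classification models?
RQ2: Which network design principles give a detector trained from scratch both high accuracy and efficiency?
RQ3: How does a dense, multi-stage prediction structure affect the accuracy and parameter efficiency of a detector trained from scratch?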

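Key Findings

Without ImageNet pre-training, DSOD achieves competitive, and sometimes superior, mAP on VOC 2007, VOC 2012, and MS COCO.
DSOD300 with plain connections reaches 77.3% mAP on the VOC 2007 test set when trained on 07+12; with dense prediction, this rises to 77.7%.
With additional COCO data (07+12+COCO), DSOD300 with dense prediction reaches 81.7% mAP on the VOC 2007 test set.
DSOD runs at real-time detection speed (e.g., 20.6 fps at 300x300 on a Titan X with the plain structure) and uses far fewer parameters than the SSD and Faster R-CNN baselines.
The stem block and the pooling-free transition layers improve accuracy substantially, while the dense prediction structure can reduce parameters and improve accuracy.
DSOD trained from scratch can match or exceed models fine-tuned from pre-trained classifiers, demonstrating the value of architecture design for detection without pre-training.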
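
To read the reported throughput in per-image terms, frames per second can be converted to average latency. This is a small sketch; the helper name `latency_ms` is hypothetical, and only the 20.6 fps figure comes from the findings above.

```python
def latency_ms(fps):
    """Convert throughput in frames per second to average milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 20.6 fps is the speed reported above for DSOD300 (plain structure) on a Titan X.
print(f"{latency_ms(20.6):.1f} ms per image")  # prints "48.5 ms per image"
```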

This review was generated by AI and checked by a human editor.