QUICK REVIEW

[Paper Review] Efficient DETR: Improving End-to-End Object Detector with Dense Prior

Zhuyu Yao, Jiangbo Ai|arXiv (Cornell University)|Apr 3, 2021

Advanced Neural Network Applications|References: 43|Citations: 157

One-Line Summary

Efficient DETR introduces dense priors to initialize the object containers, yielding a 1-decoder end-to-end detector that is competitive with 6-decoder DETR and converges faster. Results are demonstrated on COCO and CrowdHuman.

ABSTRACT

The recently proposed end-to-end transformer detectors, such as DETR and Deformable DETR, have a cascade structure of stacking 6 decoder layers to update object queries iteratively, without which their performance degrades seriously. In this paper, we investigate that the random initialization of object containers, which include object queries and reference points, is mainly responsible for the requirement of multiple iterations. Based on our findings, we propose Efficient DETR, a simple and efficient pipeline for end-to-end object detection. By taking advantage of both dense detection and sparse set detection, Efficient DETR leverages dense prior to initialize the object containers and brings the gap of the 1-decoder structure and 6-decoder structure. Experiments conducted on MS COCO show that our method, with only 3 encoder layers and 1 decoder layer, achieves competitive performance with state-of-the-art object detection methods. Efficient DETR is also robust in crowded scenes. It outperforms modern detectors on CrowdHuman dataset by a large margin.

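Motivation and Objectives

Investigate why DETR-style detectors require multiple decoder iterations.
Examine how the initialization of object containers (queries and reference points) affects performance.
Propose Efficient DETR, a dense-sparse hybrid DETR that uses dense priors to improve end-to-end detection and convergence.
Validate the method on the COCO and CrowdHuman datasets and compare it with state-of-the-art detectors.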

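Proposed Method

Analyze the roles of decoder layers and auxiliary losses in DETR's performance.
Study the initialization of object containers through reference points and object queries, including dense priors derived from region proposals.
Propose Efficient DETR, which has a dense branch and a sparse branch that share a common detection head, and which uses deformable attention.
Initialize the reference points and object queries from the top-K dense proposals, so that a single decoder layer suffices as the refinement stage.
Train with Hungarian one-to-one assignment and a unified loss spanning the dense and sparse parts, linearly decreasing the number of proposals during training.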
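
The top-K initialization step above can be illustrated with a minimal sketch. This is not the authors' code: the DenseProposal type, its field names, and the init_object_containers function are hypothetical, and the choice to use the proposal box as the reference point and the encoder feature as the object query is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DenseProposal:
    """Hypothetical container for one output of the dense branch."""
    score: float                            # confidence predicted by the dense branch
    box: Tuple[float, float, float, float]  # (cx, cy, w, h), assumed normalized to [0, 1]
    feature: List[float]                    # encoder feature at the proposal's location


def init_object_containers(proposals: List[DenseProposal], k: int):
    """Keep the k highest-scoring proposals and split them into
    reference points (boxes) and object queries (features)."""
    top = sorted(proposals, key=lambda p: p.score, reverse=True)[:k]
    reference_points = [p.box for p in top]
    object_queries = [list(p.feature) for p in top]
    return reference_points, object_queries


# Toy usage with three made-up proposals.
proposals = [
    DenseProposal(0.2, (0.1, 0.1, 0.2, 0.2), [1.0, 0.0]),
    DenseProposal(0.9, (0.5, 0.5, 0.3, 0.3), [0.0, 1.0]),
    DenseProposal(0.5, (0.7, 0.2, 0.1, 0.4), [0.5, 0.5]),
]
reference_points, object_queries = init_object_containers(proposals, k=2)
```

Because the containers start from scored proposals rather than from random values, the single decoder layer only has to refine them, which is the intuition the review attributes to the paper.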

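Experimental Results

Research Questions

RQ1: How does the initialization of object containers (queries and reference points) affect the convergence and accuracy of end-to-end DETR models?
RQ2: Can incorporating dense priors from region proposals reduce the number of cascade decoder iterations and narrow the gap between 1-decoder and 6-decoder architectures?
RQ3: How does the dense-sparse two-branch design (Efficient DETR) perform on COCO and on crowded scenes such as CrowdHuman?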

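Key Findings

Decoder auxiliary losses and cascade refinement are key to DETR's performance; in a naive setup, reducing the number of decoder layers degrades AP substantially.
Initializing with dense priors, drawn from region proposals and dense features, substantially improves 1-decoder performance and brings it close to the 6-decoder results.
Efficient DETR achieves 44.2 AP on COCO with 3 encoder layers and 1 decoder layer after 36 epochs of training, outperforming Faster R-CNN and many end-to-end detectors while using fewer parameters.
Efficient DETR is also robust in crowded scenes such as CrowdHuman, achieving competitive AP and strong generalization with 100 proposals. In some settings, increasing the number of proposals yields diminishing returns.
Linearly decreasing the number of proposals during training stabilizes learning and maintains high accuracy even with fewer proposals.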
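
The linear reduction of proposals mentioned in the last finding can be sketched as a simple schedule. The function name, the per-step granularity, and the start and end counts (300 and 100) are assumptions for illustration; the review does not state the exact schedule used.

```python
def proposals_at_step(step: int, total_steps: int,
                      start_k: int = 300, end_k: int = 100) -> int:
    """Linearly interpolate the proposal count from start_k (first step)
    to end_k (last step). Defaults are illustrative assumptions."""
    if total_steps <= 1:
        return end_k
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(start_k + (end_k - start_k) * frac)


# Proposal counts at the start, midpoint, and end of a 101-step run.
schedule = [proposals_at_step(s, 101) for s in (0, 50, 100)]
```

Starting with many proposals gives the matching step more candidates early in training, while ending at the inference-time count keeps training and inference consistent; this is one plausible reading of why such a schedule would help.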


This review was generated by AI and checked by a human editor.