QUICK REVIEW

[Paper Review] Efficient DETR: Improving End-to-End Object Detector with Dense Prior

Zhuyu Yao, Jiangbo Ai | arXiv (Cornell University) | April 3, 2021

Advanced Neural Network Applications | References: 43 | Citations: 157

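One-Line Summary

Efficient DETR introduces a dense prior to initialize the object containers, which enables an end-to-end detector with a single decoder layer that is competitive with the 6-decoder DETR while converging faster. Results are demonstrated on COCO and CrowdHuman.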

ABSTRACT

The recently proposed end-to-end transformer detectors, such as DETR and Deformable DETR, have a cascade structure of stacking 6 decoder layers to update object queries iteratively, without which their performance degrades seriously. In this paper, we investigate that the random initialization of object containers, which include object queries and reference points, is mainly responsible for the requirement of multiple iterations. Based on our findings, we propose Efficient DETR, a simple and efficient pipeline for end-to-end object detection. By taking advantage of both dense detection and sparse set detection, Efficient DETR leverages dense prior to initialize the object containers and brings the gap of the 1-decoder structure and 6-decoder structure. Experiments conducted on MS COCO show that our method, with only 3 encoder layers and 1 decoder layer, achieves competitive performance with state-of-the-art object detection methods. Efficient DETR is also robust in crowded scenes. It outperforms modern detectors on CrowdHuman dataset by a large margin.

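Motivation and Objectives

Investigate why DETR-style detectors need multiple decoder iterations.
Explore how the initialization of object containers (object queries and reference points) affects performance.
Propose Efficient DETR, a dense-sparse hybrid DETR that uses dense priors to improve the accuracy and convergence speed of end-to-end DETR.
Demonstrate the approach on the COCO and CrowdHuman datasets and compare it with recent detectors.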

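Proposed Method

Analyze the role of decoder layers and auxiliary losses in DETR performance.
Study initialization from dense priors (region proposals) and the initialization of object queries.
Introduce Efficient DETR, with a dense branch and a sparse branch that share a common detection head and use deformable attention.
Initialize the reference points and object queries from the top-K dense proposals, which makes a single-decoder refinement stage possible.
Train with Hungarian one-to-one matching and a unified loss over the dense and sparse parts, and linearly decrease the number of proposals during training.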
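
The top-K initialization step can be illustrated with a small sketch. It is not the authors' implementation: the function name `init_object_containers`, the list-based data layout, and the choice to treat the selected boxes as reference points and the features at the same positions as object queries are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def init_object_containers(
    scores: Sequence[float],
    boxes: Sequence[Box],
    features: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[Box], List[List[float]]]:
    """Pick the k highest-scoring dense proposals (hypothetical helper).

    Returns (reference_points, object_queries): the boxes of the selected
    proposals and the feature vectors at the same positions.
    """
    if not (len(scores) == len(boxes) == len(features)):
        raise ValueError("scores, boxes and features must have equal length")
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(scores))
    # Positions sorted by descending score; sorted() is stable, so ties
    # keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    reference_points = [tuple(boxes[i]) for i in order]
    object_queries = [list(features[i]) for i in order]
    return reference_points, object_queries


# Toy example: three dense proposals, keep the best two.
toy_scores = [0.1, 0.9, 0.5]
toy_boxes = [(0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 2.0, 2.0), (2.0, 2.0, 3.0, 3.0)]
toy_features = [[1.0], [2.0], [3.0]]
refs, queries = init_object_containers(toy_scores, toy_boxes, toy_features, k=2)
```

In a tensor framework the same selection would normally be one batched top-k call followed by a gather. The pure-Python version only makes the data flow explicit: the single decoder layer then refines containers that already point at likely objects instead of starting from random values.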

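Experimental Results

Research Questions

RQ1: How does the initialization of object containers (queries and reference points) affect the convergence speed and accuracy of end-to-end DETR models?
RQ2: Can introducing dense priors (priors from region proposals) reduce the need for cascaded decoder iterations and narrow the gap between the 1-decoder and 6-decoder architectures?
RQ3: How does the dense-sparse two-branch design (Efficient DETR) perform on COCO and on crowded scenes such as CrowdHuman?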

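Key Findings

Decoder auxiliary losses and cascaded refinement are central to DETR's performance; in a naive setup, reducing the number of decoder layers causes a large drop in AP.
Initializing with dense priors, via region proposals and dense features, substantially improves 1-decoder performance and brings it close to the 6-decoder result.
With 3 encoder layers and 1 decoder layer, Efficient DETR reaches 44.2 AP on COCO after 36 epochs of training, outperforming Faster R-CNN and many end-to-end detectors while using fewer parameters.
Efficient DETR is also robust in crowded scenes (CrowdHuman): it shows competitive AP and strong generalization even with 100 proposals, and adding more proposals yields diminishing returns in some settings.
Linearly decreasing the number of proposals during training improves training stability and preserves high accuracy with fewer proposals.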
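
The linear proposal schedule in the last finding can be sketched as a simple interpolation. This is a minimal illustration rather than the paper's code: the function name `num_proposals_at`, the step-based granularity, and the starting value of 300 are hypothetical. Only the end value of 100 proposals echoes a number mentioned in the review.

```python
def num_proposals_at(step: int, total_steps: int, start: int, end: int) -> int:
    """Linearly interpolate the proposal count from `start` to `end`.

    `step` is clamped to [0, total_steps - 1], so values outside the
    training range return the nearest endpoint.
    """
    if total_steps <= 1:
        return end
    step = max(0, min(step, total_steps - 1))
    fraction = step / (total_steps - 1)
    return round(start + (end - start) * fraction)


# Hypothetical schedule: 300 proposals at the first step, 100 at the last.
schedule = [num_proposals_at(s, 101, start=300, end=100) for s in (0, 50, 100)]
```

Starting with more proposals gives the matcher more candidates while the dense branch is still unreliable, and shrinking the count over time lets the final model run with the smaller budget.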


This review was generated by AI and reviewed by a human editor.