QUICK REVIEW

[Paper Review] Rank-DETR for High Quality Object Detection

Yifan Pu, Weicong Liang | arXiv (Cornell University) | 2023-10-13

Advanced Neural Network Applications | Citations: 25

One-Line Summary

Rank-DETR introduces a rank-oriented architecture, loss, and matching cost that raise the high-IoU accuracy of DETR-based detectors, improving COCO AP and AP75 over recent methods across backbones.

ABSTRACT

Modern detection transformers (DETRs) use a set of object queries to predict a list of bounding boxes, sort them by their classification confidence scores, and select the top-ranked predictions as the final detection results for the given input image. A highly performant object detector requires accurate ranking for the bounding box predictions. For DETR-based detectors, the top-ranked bounding boxes suffer from less accurate localization quality due to the misalignment between classification scores and localization accuracy, thus impeding the construction of high-quality detectors. In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs, combinedly called Rank-DETR. Our key contributions include: (i) a rank-oriented architecture design that can prompt positive predictions and suppress the negative ones to ensure lower false positive rates, as well as (ii) a rank-oriented loss function and matching cost design that prioritizes predictions of more accurate localization accuracy during ranking to boost the AP under high IoU thresholds. We apply our method to improve the recent SOTA methods (e.g., H-DETR and DINO-DETR) and report strong COCO object detection results when using different backbones such as ResNet-50, Swin-T, and Swin-L, demonstrating the effectiveness of our approach. Code is available at https://github.com/LeapLabTHU/Rank-DETR.
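
The misalignment the abstract describes can be pictured with a tiny ranking example. This is only an illustrative sketch: the query names, scores, and IoU values are made up, and `top_k_by_score` is a hypothetical helper, not part of the Rank-DETR code.

```python
def top_k_by_score(predictions, k):
    """Sort predictions by classification score (descending) and keep the top k."""
    return sorted(predictions, key=lambda p: p["score"], reverse=True)[:k]


# Hypothetical predictions for one ground-truth object.
# "iou" is the overlap of each predicted box with that object.
predictions = [
    {"name": "q0", "score": 0.92, "iou": 0.55},  # confident but loosely localized
    {"name": "q1", "score": 0.71, "iou": 0.93},  # less confident, tightly localized
    {"name": "q2", "score": 0.30, "iou": 0.10},  # background-like
]

top = top_k_by_score(predictions, k=1)
best_localized = max(predictions, key=lambda p: p["iou"])

print(top[0]["name"], best_localized["name"])
```

In this toy case the score-based ranking returns `q0`, whose box would count as a miss at an IoU threshold of 0.75, while the better-localized `q1` is ranked below it. Closing that gap is what the rank-oriented designs aim for.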

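Research Motivation and Goals

Improve DETR-based detectors so that they localize more accurately at high IoU thresholds.
Develop rank-aware components that promote true positives and suppress false positives and false negatives during training and inference.
Design a rank-adaptive classification head and a query rank layer so that ranking information is used throughout decoding.
Introduce a GIoU-aware classification loss and a high-order matching cost so that the ranking follows localization accuracy more closely.
Demonstrate the gains across different backbones and the compatibility with state-of-the-art DETR-based models (e.g., H-DETR and DINO-DETR).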

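Proposed Method

Propose a rank-adaptive classification head that adds a learnable logit bias vector to the classification scores after each decoder layer.
Add a query rank layer before each of the last L-1 Transformer decoder layers (L being the number of decoder layers); it regenerates rank-aware content and position queries.
Build the rank-aware content queries and rank-aware position queries from inputs sorted by the current ranking, combined through a fusion mechanism.
Use a GIoU-aware classification loss that supervises the classification predictions with normalized GIoU targets.
Introduce a high-order matching cost that emphasizes predictions with high IoU (through an IoU^α term) so that accurate localization is prioritized.
Demonstrate compatibility with, and improvements over, H-DETR and DINO-DETR on backbones such as ResNet-50, Swin-T, and Swin-L.
Code: https://github.com/LeapLabTHU/Rank-DETR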
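
The GIoU-aware target and the high-order matching cost listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization `(GIoU + 1) / 2`, the soft-target binary cross-entropy, the product form `score * IoU^alpha`, the value `alpha=4.0`, and the brute-force assignment (standing in for Hungarian matching) are all assumptions, and every function name is hypothetical.

```python
import math
from itertools import permutations


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two positive-area boxes in (x1, y1, x2, y2) format."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest box enclosing both a and b.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return iou, iou - (hull - union) / hull


def giou_target(pred, gt):
    """Assumed normalization: map GIoU from [-1, 1] to a [0, 1] soft label."""
    return (iou_and_giou(pred, gt)[1] + 1.0) / 2.0


def soft_bce(score, target, eps=1e-7):
    """Binary cross-entropy against a soft target (assumed form of the supervision)."""
    s = min(max(score, eps), 1.0 - eps)
    return -(target * math.log(s) + (1.0 - target) * math.log(1.0 - s))


def high_order_score(score, pred, gt, alpha=4.0):
    """Assumed matching score: classification score times IoU raised to alpha."""
    return score * iou_and_giou(pred, gt)[0] ** alpha


def match(scores, preds, gts, alpha=4.0):
    """Brute-force one-to-one assignment; returns the prediction index per ground truth."""
    best, best_total = None, -1.0
    for perm in permutations(range(len(preds)), len(gts)):
        total = sum(high_order_score(scores[p], preds[p], gts[g], alpha)
                    for g, p in enumerate(perm))
        if total > best_total:
            best, best_total = perm, total
    return best


# Hypothetical example: one object, a confident loose box and a less confident tight box.
gts = [(0.0, 0.0, 10.0, 10.0)]
preds = [(0.0, 0.0, 10.0, 12.5), (0.0, 0.0, 10.0, 10.0)]
scores = [0.9, 0.6]

first_order = match(scores, preds, gts, alpha=1.0)  # (0,): the confident loose box wins
high_order = match(scores, preds, gts, alpha=4.0)   # (1,): the tight box wins
print(first_order, high_order, giou_target(preds[0], gts[0]))
```

With `alpha=1.0` the loose box (IoU 0.8) is matched because its score dominates; raising the exponent shrinks its contribution to about 0.37, so the perfectly localized box is matched instead. A real pipeline would replace the permutation search with a Hungarian solver and express the quantity as a cost to minimize.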

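Experimental Results

Research Questions

RQ1: How can ranking information be integrated throughout DETR-style decoding to improve bounding-box quality at high IoU?
RQ2: Which architectural and optimization changes best align classification scores with localization accuracy in DETR-based detectors?
RQ3: Do the rank-oriented components generalize across DETR variants (e.g., H-DETR and DINO-DETR) and backbones?
RQ4: How do the rank-aware loss and the high-order matching cost affect AP at IoU thresholds above 0.75?
RQ5: Can the rank-aware design reduce false positives and false negatives in DETR-based detectors?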

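Key Results

Rank-DETR improves AP over strong DETR-based baselines (e.g., H-DETR) across backbones and training schedules.
With a ResNet-50 (R50) backbone trained for 12 epochs, Rank-DETR reaches 50.2 AP and 55.0 AP75 on COCO val, plus 64.0 on a third AP metric (its subscript is missing in the source; likely AP_L).
Rank-DETR shows notable gains in AP75, roughly +2.1% to +2.7% over the baseline depending on the backbone.
The method is competitive even with short training schedules (e.g., 50.2 AP after 12 epochs with R50).
Adding the rank-oriented architecture and loss components yields cumulative gains, with the full configuration giving the best results.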

This review was generated by AI and checked by a human editor.