QUICK REVIEW

[Paper Review] Oriented Object Detection with Transformer

Teli Ma, Mingyuan Mao | arXiv (Cornell University) | June 6, 2021

Advanced Neural Network Applications | Citations: 31

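One-Line Summary

This paper proposes O2DETR, an end-to-end Transformer-based detector for arbitrarily oriented objects. Its encoder replaces self-attention with depthwise separable convolutions, it achieves competitive mAP on the DOTA dataset, and a simple fine-tuning head improves its performance further.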

ABSTRACT

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection problem. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on an end-to-end network. The contributions of $\rm O^2DETR$ include: 1) we provide a new insight into oriented object detection, by applying Transformer to directly and efficiently localize objects without a tedious process of rotated anchors as in conventional detectors; 2) we design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution, which can significantly reduce the memory and computational cost of using multi-scale features in the original Transformer; 3) our $\rm O^2DETR$ can be another new benchmark in the field of oriented object detection, which achieves up to 3.85 mAP improvement over Faster R-CNN and RetinaNet. We simply fine-tune the head mounted on $\rm O^2DETR$ in a cascaded architecture and achieve a competitive performance over SOTA in the DOTA dataset.

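Motivation and Goals

Tackle oriented object detection without rotated anchors or post-hoc refinement.
Propose an end-to-end Transformer detector that predicts rotated bounding boxes together with their angles.
Improve efficiency by replacing attention in the encoder with depthwise separable convolutions.
Demonstrate competitive performance on the DOTA dataset and explore a fine-tuning head that further improves the results.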
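The efficiency goal is easiest to see with a rough cost model. The sketch below is a back-of-the-envelope estimate, not the paper's own analysis. It assumes single-head self-attention applied globally over all flattened multi-scale tokens, a 3 x 3 depthwise kernel, and ignores biases, normalization and softmax. The helper names, image size, strides and channel width are hypothetical.

```python
def attention_macs(n_tokens: int, channels: int) -> int:
    """Approximate multiply-accumulates for one single-head self-attention layer.

    Counts the Q, K, V and output projections (4 * N * C^2) plus the
    QK^T score matrix and the weighted sum over V (2 * N^2 * C).
    """
    projections = 4 * n_tokens * channels * channels
    attention = 2 * n_tokens * n_tokens * channels
    return projections + attention


def dsconv_macs(n_tokens: int, channels: int, kernel: int = 3) -> int:
    """Approximate multiply-accumulates for one depthwise separable convolution.

    Depthwise: one k x k filter per channel (k^2 * C per position).
    Pointwise: a 1 x 1 convolution that mixes channels (C^2 per position).
    """
    depthwise = kernel * kernel * channels * n_tokens
    pointwise = channels * channels * n_tokens
    return depthwise + pointwise


def multiscale_tokens(height: int, width: int, strides: list[int]) -> int:
    """Total number of spatial positions across feature maps at the given strides."""
    return sum((height // s) * (width // s) for s in strides)


if __name__ == "__main__":
    # Hypothetical setting: 800 x 800 input, strides 8/16/32/64, 256 channels.
    n = multiscale_tokens(800, 800, [8, 16, 32, 64])
    c = 256
    print(f"tokens: {n}")
    print(f"self-attention MACs ~ {attention_macs(n, c):,}")
    print(f"DSConv MACs         ~ {dsconv_macs(n, c):,}")
    print(f"attention-map entries per head: {n * n:,}")
```

The attention term grows with the square of the token count, while the depthwise separable convolution is linear in it, so the gap widens exactly when several feature levels are concatenated.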

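Proposed Method

Extends DETR to oriented boxes by adding an angle dimension to the object queries.
Replaces the self-attention in the Transformer encoder with depthwise separable convolutions to reduce memory and computation.
Uses multi-scale feature maps and cross-attention between the object queries and the encoder memory to predict (x, y, w, h, α).
In the detection head, a 3-layer MLP outputs (x_c, y_c, w, h, α) and a linear layer outputs the class scores.
Optionally, O2DETR's predictions serve as proposals for a fine-tuning head that operates on ROI-aligned features and refines the final bounding boxes and confidence scores.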
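The (x_c, y_c, w, h, α) parameterization and the ROI-aligned refinement step can be made concrete with a small geometry sketch: converting a box to its four corners (the quadrilateral form used by DOTA annotations) and placing a rotated grid of sampling points inside the box, roughly what a rotated RoIAlign-style stage does. This is an illustration under assumptions: α is in radians and applied as a standard 2D rotation, feature values sit at integer coordinates, and `rotated_roi_sample` takes a single bilinear sample per bin, whereas RoIAlign implementations usually average several. The angle convention and RoIAlign variant O2DETR actually uses are not stated in this review, and all function names are hypothetical.

```python
import math


def obox_to_corners(xc, yc, w, h, alpha):
    """Return the 4 corners of an oriented box (x_c, y_c, w, h, alpha).

    Assumption: alpha is in radians and rotates the box's local axes with a
    standard 2D rotation matrix. Corners are ordered (-w/2, -h/2),
    (w/2, -h/2), (w/2, h/2), (-w/2, h/2) in the box's local frame.
    """
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(xc + dx * cos_a - dy * sin_a, yc + dx * sin_a + dy * cos_a)
            for dx, dy in local]


def bilinear(feature, x, y):
    """Bilinearly sample feature[row][col] at (x, y); out-of-range taps count as 0."""
    rows, cols = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < rows and 0 <= xi < cols:
                value += (1 - abs(x - xi)) * (1 - abs(y - yi)) * feature[yi][xi]
    return value


def rotated_roi_sample(feature, box, out_size=2):
    """Sample an out_size x out_size grid of bin centers inside a rotated box.

    Hypothetical simplification: one bilinear sample per bin.
    """
    xc, yc, w, h, alpha = box
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    grid = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Bin center in the box's local frame, relative to the box center.
            dx = (j + 0.5) / out_size * w - w / 2
            dy = (i + 0.5) / out_size * h - h / 2
            # Rotate and translate into feature-map coordinates.
            x = xc + dx * cos_a - dy * sin_a
            y = yc + dx * sin_a + dy * cos_a
            row.append(bilinear(feature, x, y))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Toy 4 x 4 feature map whose value equals the column index.
    feature = [[float(col) for col in range(4)] for _ in range(4)]
    print(obox_to_corners(1.5, 1.5, 2.0, 1.0, math.pi / 6))
    print(rotated_roi_sample(feature, (1.5, 1.5, 2.0, 2.0, 0.0)))
```

Because the sampling grid is rotated with the box, the pooled features stay aligned with the object's own axes, which is the point of feeding oriented proposals into an ROI-based refinement head.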

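Experimental Results

Research Questions

RQ1: Can a Transformer-based detector be applied directly to arbitrary-oriented object detection without rotated anchors?
RQ2: Does replacing self-attention in the encoder with depthwise separable convolutions improve efficiency while maintaining or improving accuracy on dense, small oriented objects?
RQ3: How does multi-scale feature integration affect oriented object detection performance within a Transformer framework?
RQ4: Can a lightweight fine-tuning head based on ROIAlign further improve detection accuracy when O2DETR serves as the region proposal network?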

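Key Findings

Even without the refinement head, O2DETR achieves higher mAP on DOTA than several rotated detectors, improving on the Faster R-CNN and RetinaNet baselines by up to 3.85 mAP.
An encoder built on depthwise separable convolutions (DSConv) outperforms self-attention in dense, small-object scenarios (e.g., with ResNet-50: 66.10 mAP for DSConv vs. 65.33 mAP for attention).
Fine-tuning the O2DETR head on ROIAlign-based features yields substantial gains (e.g., with ResNet-50: 74.47 mAP with single-scale input and 79.66 mAP with multi-scale input).
With multi-scale features and angle-aware object queries, O2DETR delivers results competitive with state-of-the-art methods on the DOTA dataset across many categories.
A recall analysis shows that O2DETR's proposals reach higher recall than a conventional RPN across IoU thresholds, supporting its use as a strong region-proposal backbone.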
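The recall comparison above relies on a proposal-recall metric: the fraction of ground-truth boxes matched by at least one proposal at or above a given rotated IoU. The sketch below computes rotated IoU by clipping one box polygon against the other with Sutherland-Hodgman (valid because rectangles are convex) and measures areas with the shoelace formula. It is a hypothetical, dependency-free illustration; the exact evaluation protocol behind the paper's recall curves is not described in this review, and the function names and sample boxes are made up.

```python
import math


def corners(box):
    """(x_c, y_c, w, h, alpha) -> 4 corners, counter-clockwise for w, h > 0."""
    xc, yc, w, h, alpha = box
    c, s = math.cos(alpha), math.sin(alpha)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(xc + dx * c - dy * s, yc + dx * s + dy * c) for dx, dy in local]


def polygon_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def clip(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by the convex CCW polygon `clipper`."""
    def inside(p, a, b):
        # p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p -> q crosses the line through a and b.
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        points, output = output, []
        for prev, cur in zip(points[-1:] + points[:-1], points):
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
    return output


def rotated_iou(box_a, box_b):
    """IoU of two oriented boxes given as (x_c, y_c, w, h, alpha)."""
    poly_a, poly_b = corners(box_a), corners(box_b)
    inter = clip(poly_a, poly_b)
    inter_area = abs(polygon_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(polygon_area(poly_a)) + abs(polygon_area(poly_b)) - inter_area
    return inter_area / union if union > 0 else 0.0


def recall_at(proposals, ground_truths, thr):
    """Fraction of ground-truth boxes hit by at least one proposal with IoU >= thr."""
    if not ground_truths:
        return 0.0
    hits = sum(any(rotated_iou(p, g) >= thr for p in proposals) for g in ground_truths)
    return hits / len(ground_truths)


if __name__ == "__main__":
    gts = [(50.0, 50.0, 40.0, 20.0, 0.3), (120.0, 80.0, 30.0, 30.0, 0.0)]
    proposals = [(52.0, 49.0, 38.0, 22.0, 0.25), (200.0, 200.0, 10.0, 10.0, 0.0)]
    for thr in (0.5, 0.7, 0.9):
        print(thr, recall_at(proposals, gts, thr))
```

Sweeping `thr` over a range of values for two proposal sources (for example O2DETR outputs and RPN proposals) would produce the kind of recall-versus-IoU curve the finding refers to.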

This review was generated by AI and reviewed by a human editor.