QUICK REVIEW

[Paper Review] Don't let the information slip away

Taozhe Li, Guansu Wang|arXiv (Cornell University)|Feb 26, 2026

Advanced Neural Network Applications | Citations: 0

One-line summary

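This paper proposes Association DETR, which combines a background attention mechanism that exploits background context with an Association Encoder to improve DETR-based object detectors. It reaches a state-of-the-art 55.7 mAP on COCO val2017 while keeping real-time speed.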

ABSTRACT

Real-time object detection has advanced rapidly in recent years. The YOLO series of detectors is among the most well-known CNN-based object detection models and cannot be overlooked. The latest version, YOLOv26, was recently released, while YOLOv12 achieved state-of-the-art (SOTA) performance with 55.2 mAP on the COCO val2017 dataset. Meanwhile, transformer-based object detection models, also known as DEtection TRansformer (DETR), have demonstrated impressive performance. RT-DETR is an outstanding model that outperformed the YOLO series in both speed and accuracy when it was released. Its successor, RT-DETRv2, achieved 53.4 mAP on the COCO val2017 dataset. However, despite their remarkable performance, all these models let information slip away. They primarily focus on the features of foreground objects while neglecting the contextual information provided by the background. We believe that background information can significantly aid object detection tasks. For example, cars are more likely to appear on roads than in offices, while wild animals are more likely to be found in forests or remote areas than on busy streets. To address this gap, we propose an object detection model called Association DETR, which achieves state-of-the-art results compared to other object detection models on the COCO val2017 dataset.

Motivation and Objectives

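Assess how much background information can contribute to object detection, going beyond foreground-centric methods such as YOLO and DETR variants.
Design a lightweight module that captures background context and integrate it with DETR-based models.
Show that incorporating background information improves detection metrics on COCO val2017 while adding minimal inference latency.
Provide a reusable Association Encoder that can be inserted into existing DETR models to boost their performance.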

Proposed Method

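Proposes the Background Attention Module (BAM) to extract background information, built from RFCBAMConv blocks (derived from Receptive-Field Attention and CBAM).
Develops the Association Module (AM), which fuses the background cues and associates them efficiently, using ConvFFN and Window Attention to keep the cost low.
Builds a lightweight Association Encoder (AE, about 3M parameters) that can be inserted into DETR-style architectures and integrates BAM and AM with the backbone features.
Applies a Hybrid Encoder to the multi-level backbone features S1, S2, and S3 to enrich them; a query-based decoder then produces the final detections.
Pretrains BAM on the Stanford Background Dataset before integration so that it specializes in background features.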
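The review only says that AM uses Window Attention for efficiency; it gives no window size, head count, or projection layout. As a rough illustration of the general technique (as popularized by Swin Transformer), the sketch below restricts single-head self-attention to non-overlapping win×win windows, so the cost grows as H·W·win² instead of (H·W)². The window size, random projection matrices, and single-head simplification are all assumptions, not the authors' configuration.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C).
    """
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = x.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)


def window_reverse(windows, win, H, W):
    """Inverse of window_partition: (num_windows, win*win, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, win, W/win, win, C)
    return x.reshape(H, W, C)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(x, win, wq, wk, wv):
    """Single-head self-attention computed independently inside each window."""
    H, W, _ = x.shape
    tokens = window_partition(x, win)  # (nW, N, C) with N = win * win
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (nW, N, N)
    return window_reverse(softmax(scores) @ v, win, H, W)


# Toy usage: 8x8 map, 16 channels, 4x4 windows (hypothetical sizes and weights)
rng = np.random.default_rng(0)
H, W, C, win = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
y = window_attention(x, win, wq, wk, wv)
print(y.shape)  # (8, 8, 16)
```

Because each window attends only to itself, perturbing one pixel changes the output of its own window and leaves every other window untouched, which is exactly the locality that makes this cheaper than global attention.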

Figure 1: Association DETR Overview. The input image is first fed into the backbone network. The shallowest image feature is denoted as $S_{1}$, the second shallowest as $S_{2}$, and the deepest as $S_{3}$. We feed the shallowest feature $S_{1}$ into the Background Attention Module, which is des…

Experimental Results

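Research Questions

RQ1: Does integrating background information into DETR-based architectures improve detection performance?
RQ2: How much performance gain does the lightweight plug-in module (AE), which exploits background cues, deliver on its own?
RQ3: How does adding BAM and AM to backbones such as R34 and R50 affect inference speed?
RQ4: Is the Association Encoder reusable across different DETR variants, such as RT-DETR, RT-DETRv2, Deformable DETR, and DETR?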

Key Findings

Model	Backbone	Epochs	Params (M)	FPS (bs=1)	AP val	AP50 val
YOLOv10-M	-	300	15.4	210	51.1	68.1
YOLOv10-L	-	300	24.4	137	53.2	70.1
YOLOv10-X	-	300	29.5	93	54.4	71.6
YOLOv11-M	-	300	20.1	212	51.5	67.9
YOLOv11-L	-	300	25.3	161	53.4	70.1
YOLOv12-M	-	300	20.2	206	52.5	69.5
YOLOv12-L	-	300	26.4	148	53.7	70.6
RT-DETR R34	R34	72	31	161	48.9	66.8
RT-DETRv2-M R34	R34	72	31	161	49.9	67.5
Association-DETR-R34 (Ours)	R34	72	34.1	153	54.6	71.6
DETR-DC5 R50	R50	500	41	-	43.3	63.1
DETR-DC5 R101	R101	500	60	-	44.9	64.7
Deformable-DETR R50	R50	108	44	-	45.1	65.4
RT-DETR R50	R50	72	42	108	53.1	71.3
RT-DETRv2-M	R34	72	31	161	52.4	70.2
RT-DETRv2-L	R50	72	42	145	53.7	71.6
DETR-R50 (base)	R50	500	41	-	43.3	63.1
DETR-R101 (base)	R101	500	60	-	44.9	64.7
AE + DETR (R50)	R50	500	63.1	-	46.0	66.7
AE + Deformable DETR (R50)	R50	108	47.1	-	47.7	68.6
Association-DETR-R50 (Ours)	R50	72	45.1	104	55.7	74.0

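With an R34 backbone, Association DETR reaches 54.6 AP and 71.6 AP50 on val at 153 FPS, outperforming models of similar size.
With an R50 backbone, Association DETR reaches 55.7 AP and 74.0 AP50 on val at 104 FPS, outperforming YOLO and DETR variants of comparable size.
Adding the lightweight Association Encoder (AE) yields clear AP gains across the baseline models (e.g., +5.7 AP val and +4.8 AP50 over RT-DETR-R34).
BAM and AM deliver meaningful improvements at a modest cost in parameters and speed (AM about 0.7M, BAM about 2.4M, AE about 3M in total).
In the ablation study, BAM contributes a larger gain per module than AM, and the BAM+AM combination performs best among the tested configurations.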
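The headline gains can be checked directly against the table. The sketch below recomputes the AP/AP50 deltas, the relative FPS drop, and the added parameter count of each Association DETR model over the RT-DETR model with the same backbone. The numbers are copied from the table; pairing each Association DETR model with that RT-DETR baseline is this review's reading of the comparison, not a calculation published in the paper.

```python
# Values copied from the table: (AP val, AP50 val, FPS at bs=1, params in M)
PAIRS = {
    "R34": {
        "baseline": (48.9, 66.8, 161, 31.0),  # RT-DETR R34
        "ours": (54.6, 71.6, 153, 34.1),      # Association-DETR-R34
    },
    "R50": {
        "baseline": (53.1, 71.3, 108, 42.0),  # RT-DETR R50
        "ours": (55.7, 74.0, 104, 45.1),      # Association-DETR-R50
    },
}


def gains(base, new):
    """Absolute AP/AP50 gain, relative FPS drop (%), and extra parameters (M)."""
    ap, ap50, fps, params = base
    ap_n, ap50_n, fps_n, params_n = new
    return {
        "dAP": round(ap_n - ap, 1),
        "dAP50": round(ap50_n - ap50, 1),
        "fps_drop_pct": round(100 * (fps - fps_n) / fps, 1),
        "extra_params_M": round(params_n - params, 1),
    }


results = {bb: gains(p["baseline"], p["ours"]) for bb, p in PAIRS.items()}
for bb, g in results.items():
    print(bb, g)
```

For R34 this reproduces the +5.7 AP / +4.8 AP50 quoted above at roughly a 5% FPS drop; for R50 the gain is +2.6 AP / +2.7 AP50 at about a 3.7% FPS drop. In both cases the added parameters come to 3.1M, which matches the stated AM (~0.7M) plus BAM (~2.4M) budget.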

Figure 2: Background Attention Module & Single RFCBAMConv Block. On the left side is the structure of the Background Attention Module, and on the right side are the details of a single RFCBAMConv Block, which is located within the Background Attention Module.
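The review describes RFCBAMConv only as combining Receptive-Field Attention with CBAM, without giving the block's exact layout. The sketch below covers just the CBAM half, following the standard CBAM design: a channel gate built from average- and max-pooled descriptors passed through a shared MLP, followed by a spatial gate from a 7×7 convolution over the channel-wise mean and max maps. The receptive-field attention part is omitted, and the reduction ratio `r`, kernel size, and random weights are assumptions; this is not the authors' block.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """CBAM channel gate. x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    def shared_mlp(d):
        return w2 @ np.maximum(w1 @ d, 0.0)  # FC -> ReLU -> FC, same weights for both descriptors

    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor
    gate = sigmoid(shared_mlp(avg) + shared_mlp(mx))
    return x * gate[:, None, None]


def spatial_attention(x, kernel):
    """CBAM spatial gate. kernel: (2, k, k) applied to [channel-mean, channel-max]."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero "same" padding
    H, W = x.shape[1:]
    logits = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


# Toy usage: 8 channels, 6x6 map, r = 2, 7x7 spatial kernel (hypothetical values)
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
print(y.shape)  # (8, 6, 6)
```

Both gates are sigmoids in (0, 1), so the module only reweights features: it keeps the input shape and sign and never amplifies a value, which is what lets such blocks be stacked inside a backbone-side module like BAM.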


This review was generated by AI and checked by a human editor.