QUICK REVIEW

[Paper Review] Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection

Shoumeng Qiu, Xinrun Li | arXiv (Cornell University) | Mar 9, 2026

Advanced Neural Network Applications | Citations: 0

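One-line summary

Proposes a match-free training paradigm for DETR built on a Cross-Attention-based Query Selection (CAQS) module, which learns implicit query-to-target correspondences. It reports substantially faster training and improved COCO performance, most notably on large objects.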

ABSTRACT

Recent DEtection TRansformer (DETR) based frameworks have achieved remarkable success in end-to-end object detection. However, the reliance on the Hungarian algorithm for bipartite matching between queries and ground truths introduces computational overhead and complicates the training dynamics. In this paper, we propose a novel matching-free training scheme for DETR-based detectors that eliminates the need for explicit heuristic matching. At the core of our approach is a dedicated Cross-Attention-based Query Selection (CAQS) module. Instead of discrete assignment, we utilize encoded ground-truth information to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model autonomously learns the implicit correspondences between object queries and specific targets. This learned relationship further provides supervision signals for the learning of queries. Experimental results demonstrate that our proposed method bypasses the traditional matching process, significantly enhancing training efficiency, reducing the matching latency by over 50%, effectively eliminating the discrete matching bottleneck through differentiable correspondence learning, and also achieving superior performance compared to existing state-of-the-art methods.

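Motivation and objectives

Motivated by removing the costly Hungarian bipartite matching from DETR-based detectors.
Develops a differentiable, end-to-end supervision mechanism that learns query-to-target correspondences.
Designs a GT-Probe module and Sparse Correspondence Generation to provide dense-to-sparse supervision.
Shows that match-free training improves both training efficiency and detection accuracy on COCO.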

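Proposed method

Introduces the GT-Probe Module (GTPM), which encodes the ground truths and the predicted queries and uses cross-attention to produce a dense correspondence matrix A between GTs and queries.
Applies Sparse Correspondence Generation (SCG) to convert A into a sparse, normalized assignment matrix Â for stable supervision.
Builds a broadcast cost matrix C from a classification term and a geometric term, quantifying the supervision signal for every GT-query pair.
Defines the correspondence weight loss Lw as the element-wise product of A and C, which guides the learning of the GT-Probe.
Gates the Sparse Query Loss Lq by C and Â, concentrating supervision on a specific subset of queries.
Trains with the total loss Ltotal = α Lw + β Lq, balancing assignment learning against query refinement.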
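The steps listed above can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the scaled dot-product attention, the softmax over queries, the top-k sparsification, the negative-log-probability plus L1 cost, and the sum reduction are all assumptions chosen to make the data flow A → Â → C → Ltotal concrete. All function names are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dense_correspondence(gt_emb, query_emb):
    # A[i][j]: attention weight of ground truth i on query j.
    # Assumption: scaled dot-product scores, softmax over queries.
    scale = math.sqrt(len(gt_emb[0]))
    return [
        softmax([sum(g * q for g, q in zip(gt, qu)) / scale for qu in query_emb])
        for gt in gt_emb
    ]

def sparsify(A, k=1):
    # Assumed form of SCG: keep the top-k queries per ground truth
    # and renormalize each row so that it sums to 1.
    A_hat = []
    for row in A:
        keep = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        total = sum(row[j] for j in keep)
        A_hat.append([row[j] / total if j in keep else 0.0 for j in range(len(row))])
    return A_hat

def cost_matrix(gt_boxes, gt_labels, pred_boxes, pred_probs):
    # C[i][j] for every GT-query pair.
    # Assumption: classification term = -log p(label), geometric term = L1 box distance.
    C = []
    for box, label in zip(gt_boxes, gt_labels):
        row = []
        for pbox, probs in zip(pred_boxes, pred_probs):
            cls_term = -math.log(probs[label] + 1e-9)
            geo_term = sum(abs(a - b) for a, b in zip(box, pbox))
            row.append(cls_term + geo_term)
        C.append(row)
    return C

def weighted_sum(W, C):
    # Sum of the element-wise product of two equally shaped matrices.
    return sum(w * c for w_row, c_row in zip(W, C) for w, c in zip(w_row, c_row))

def total_loss(A, A_hat, C, alpha=1.0, beta=1.0):
    L_w = weighted_sum(A, C)      # correspondence weight loss
    L_q = weighted_sum(A_hat, C)  # sparse query loss
    return alpha * L_w + beta * L_q

# Toy example: 2 ground truths, 3 queries, 2 classes (all values made up).
gt_emb = [[1.0, 0.0], [0.0, 1.0]]
query_emb = [[2.0, 0.0], [0.0, 2.0], [0.1, 0.1]]
gt_boxes = [[0.2, 0.2, 0.1, 0.1], [0.7, 0.7, 0.3, 0.3]]
gt_labels = [0, 1]
pred_boxes = [[0.2, 0.2, 0.1, 0.1], [0.7, 0.7, 0.3, 0.3], [0.5, 0.5, 0.5, 0.5]]
pred_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

A = dense_correspondence(gt_emb, query_emb)
A_hat = sparsify(A, k=1)
C = cost_matrix(gt_boxes, gt_labels, pred_boxes, pred_probs)
loss = total_loss(A, A_hat, C, alpha=1.0, beta=1.0)
```

In an autograd framework, which tensors receive gradients through each term (for example, whether C is detached inside Lw) would matter a great deal; the review does not specify this, so the sketch leaves it out.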

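Experimental results

Research questions

RQ1: Can a DETR-based detector be trained without explicit Hungarian matching while maintaining or improving accuracy?
RQ2: Does differentiable, GT-driven correspondence learning improve training efficiency and scalability when the query bank is large?
RQ3: How does the proposed CAQS-based supervision affect detection performance, particularly on large objects?
RQ4: How do the sparsification and normalization strategies in SCG affect localization and overall AP?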

Key findings

Detector	Setting	Epoch	AP	AP50	AP75	APS	APM	APL
Deformable DETR	Baseline	20	25.4	43.4	26.3	11.2	28.5	37.1
Ours	20 epochs (ours)	20	26.1	43.5	27.1	10.7	29.1	41.3

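At 20 epochs the proposed method reaches 26.1 AP against 25.4 AP for the Deformable DETR baseline, a gain of +0.7 AP.
AP75 rises to 27.1 with the match-free method, +0.8 over the baseline.
APL rises from 37.1 to 41.3, a gain of +4.2 that points to improved performance on large objects.
Training latency falls from 53 ms with the Hungarian baseline to 25 ms with match-free training, a reduction of more than 50% over the full training cycle.
The APL gain is robust to changes in α; the best setting is α = 1, and normalized sparsity (normalizing the sum to 1) yields 26.1 AP.
APS drops slightly (10.7 vs. 11.2), indicating that very small targets remain a challenge.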
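The deltas quoted above can be rechecked directly from the table and the two latency figures. The snippet below is only an arithmetic check on the numbers reported in this review; the variable names are illustrative.

```python
# Values copied from the results table (20 epochs).
baseline = {"AP": 25.4, "AP50": 43.4, "AP75": 26.3, "APS": 11.2, "APM": 28.5, "APL": 37.1}
ours = {"AP": 26.1, "AP50": 43.5, "AP75": 27.1, "APS": 10.7, "APM": 29.1, "APL": 41.3}

# Per-metric change, rounded to one decimal like the table.
deltas = {name: round(ours[name] - baseline[name], 1) for name in baseline}

# Latency figures quoted in the findings, in milliseconds.
hungarian_ms, match_free_ms = 53, 25
latency_reduction = (hungarian_ms - match_free_ms) / hungarian_ms

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
print(f"latency reduction: {latency_reduction:.1%}")
```

The check also shows the two metrics the findings do not mention: AP50 is nearly flat (+0.1) and APM improves by +0.6.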


This review was generated by AI and checked by a human editor.