QUICK REVIEW

[Paper Review] detrex: Benchmarking Detection Transformers

Tianhe Ren, Shilong Liu|arXiv (Cornell University)|Jun 12, 2023

Advanced Neural Network Applications | Cited by 15

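One-Sentence Summary

detrex provides a modular, lightweight framework and a comprehensive benchmark suite for DETR-based models, improving reproducibility and enabling fair comparisons across detection, segmentation, and pose estimation tasks.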

ABSTRACT

The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation. We conduct extensive experiments under detrex and perform a comprehensive benchmark for DETR-based models. Moreover, we enhance the performance of detection transformers through the refinement of training hyper-parameters, providing strong baselines for supported algorithms. We hope that detrex could offer research communities a standardized and unified platform to evaluate and compare different DETR-based models while fostering a deeper understanding and driving advancements in DETR-based instance recognition. Our code is available at https://github.com/IDEA-Research/detrex. The project is currently being actively developed. We encourage the community to use detrex codebase for further development and contributions.

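Research Motivation and Goals

Provide a unified, modular platform for developing and evaluating DETR-based models.
Benchmark DETR-based detection, segmentation, and pose estimation algorithms on standard datasets.
Improve performance in reproducible experiments through refined training settings and hyperparameter tuning.
Enable fair comparisons of model performance, training cost, and inference speed across backbones and model variants.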
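A fair comparison of inference speed and training cost requires that every model be measured the same way. The snippet below is a minimal sketch of one common convention. It assumes that FPS means inputs processed per second after a short warm-up phase, and that GPU-h means the number of GPUs multiplied by wall-clock training hours. The review does not state detrex's exact measurement protocol, and `measure_fps` and `gpu_hours` are hypothetical helpers, not detrex functions.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                inputs: Sequence[object],
                warmup: int = 5) -> float:
    """Inputs per second for a callable that processes one input at a time.

    The first `warmup` inputs are run but not timed, so one-off start-up
    costs (lazy initialisation, caching) do not skew the throughput figure.
    """
    if len(inputs) <= warmup:
        raise ValueError("need more inputs than warm-up iterations")
    for x in inputs[:warmup]:
        infer(x)
    timed = inputs[warmup:]
    start = time.perf_counter()
    for x in timed:
        infer(x)
    # On a GPU, synchronise the device here, before reading the clock,
    # so that asynchronously queued kernels are included in the timing.
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(timed) / elapsed


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Training cost as total GPU time (assumed meaning of a GPU-h column)."""
    return num_gpus * wall_clock_hours


print(round(measure_fps(lambda xs: sorted(xs), [list(range(1000))] * 20), 1))
print(gpu_hours(8, 6.0))  # 48.0
```

For a PyTorch model on a GPU, the synchronisation point would be a call such as `torch.cuda.synchronize()`, because CUDA kernels run asynchronously with respect to the Python thread.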

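Proposed Approach

Modular design: six core components (Backbone, Encoder, Query Initialization, Decoder, Matcher, Loss), plus room for extensions.
A lightweight training engine and LazyConfig-based configuration for flexible experimentation.
A comprehensive benchmark of DETR variants on COCO val2017, covering accuracy, training cost, FLOPs, FPS, and memory.
Benchmarks of backbones and model variants, using DINO as the default detector.
Ablation and hyperparameter studies to identify sensitive settings and sources of performance gains.
Reproducible implementations of many DETR-based models (e.g., Deformable-DETR, DINO, H-DETR, DAB-DETR, DN-DETR), as well as segmentation and pose estimation methods.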
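The following toy sketch in plain Python shows how a six-component design and lazy configuration can fit together. Everything in it is hypothetical:

- The component classes operate on lists of numbers rather than tensors.
- `lazy` and `build` only imitate the idea of recording constructors in a config and instantiating them later. The real LazyConfig system differs.
- `BruteForceMatcher` enumerates every assignment, whereas a real DETR matcher runs the Hungarian algorithm on a cost matrix.

```python
import itertools
from statistics import mean
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def lazy(target: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Record a constructor and its keyword arguments without calling it."""
    return {"_target_": target, **kwargs}


def build(cfg: Any) -> Any:
    """Recursively instantiate a config made of `lazy` dicts (children first)."""
    if isinstance(cfg, dict) and "_target_" in cfg:
        kwargs = {k: build(v) for k, v in cfg.items() if k != "_target_"}
        return cfg["_target_"](**kwargs)
    return cfg


class ToyBackbone:
    """Scales raw 'pixels' into 'features'."""
    def __init__(self, scale: float) -> None:
        self.scale = scale

    def __call__(self, image: Sequence[float]) -> List[float]:
        return [px * self.scale for px in image]


class ToyEncoder:
    """Adds a global context term (the feature mean) to every feature."""
    def __call__(self, feats: List[float]) -> List[float]:
        ctx = mean(feats)
        return [f + ctx for f in feats]


class TopKQueryInit:
    """Seeds the decoder queries with the highest-valued encoder outputs."""
    def __init__(self, num_queries: int) -> None:
        self.num_queries = num_queries

    def __call__(self, memory: List[float]) -> List[float]:
        return sorted(memory, reverse=True)[: self.num_queries]


class ToyDecoder:
    """Refines each query by a fixed offset (memory is unused in this toy)."""
    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, queries: List[float], memory: List[float]) -> List[float]:
        return [q + self.offset for q in queries]


class BruteForceMatcher:
    """Optimal one-to-one matching by enumeration; only viable for tiny inputs."""
    def __call__(self, preds: List[float], targets: List[float]) -> List[Tuple[int, int]]:
        if len(targets) > len(preds):
            raise ValueError("need at least as many predictions as targets")
        best: Tuple[int, ...] = ()
        best_cost = float("inf")
        for perm in itertools.permutations(range(len(preds)), len(targets)):
            cost = sum(abs(preds[p] - t) for p, t in zip(perm, targets))
            if cost < best_cost:
                best, best_cost = perm, cost
        return [(p, j) for j, p in enumerate(best)]  # (prediction idx, target idx)


class L1Loss:
    """Mean absolute error over matched (prediction, target) pairs."""
    def __call__(self, preds, targets, pairs) -> float:
        return mean(abs(preds[p] - targets[t]) for p, t in pairs)


class DETRPipeline:
    """Wires the six components together; each is supplied by the config."""
    def __init__(self, backbone, encoder, query_init, decoder, matcher, loss) -> None:
        self.backbone, self.encoder = backbone, encoder
        self.query_init, self.decoder = query_init, decoder
        self.matcher, self.loss = matcher, loss

    def __call__(self, image, targets: Optional[List[float]] = None):
        memory = self.encoder(self.backbone(image))
        preds = self.decoder(self.query_init(memory), memory)
        if targets is None:
            return preds  # inference: return predictions
        return self.loss(preds, targets, self.matcher(preds, targets))  # training: loss


cfg = lazy(
    DETRPipeline,
    backbone=lazy(ToyBackbone, scale=2.0),
    encoder=lazy(ToyEncoder),
    query_init=lazy(TopKQueryInit, num_queries=3),
    decoder=lazy(ToyDecoder, offset=0.0),
    matcher=lazy(BruteForceMatcher),
    loss=lazy(L1Loss),
)
model = build(cfg)
print(model([1.0, 2.0, 3.0, 4.0]))                        # [13.0, 11.0, 9.0]
print(model([1.0, 2.0, 3.0, 4.0], targets=[10.0, 12.0]))  # 1.0
```

The config stays plain data until `build` runs, so you can try a variant by editing one entry without touching the pipeline code. For example, `cfg["query_init"]["num_queries"] = 2`. The modular design aims for this kind of flexibility.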

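Experimental Results

Research Questions

RQ1: How does a unified framework improve reproducibility and fair comparison across DETR-based models?
RQ2: How do training hyperparameters and the choice of backbone affect DETR-based detectors?
RQ3: How do DETR-based models perform on detection, segmentation, and pose estimation under a standardized benchmark?
RQ4: Does post-processing such as NMS still benefit DETR variants?
RQ5: What baseline improvements can careful hyperparameter tuning achieve within a unified codebase?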

Key Findings

Model	#Epochs	AP	AP50	AP75	APS	APM	APL	#Params	GFLOPs	FPS	Memory	GPU-h
Deformable-DETR-Two-Stage	50	48.2	67.0	52.2	30.7	51.4	63.0	41.2M	175.6 ± 19.1	26.3	11.0GB	208h
Anchor-DETR	50	41.9	62.9	44.6	22.0	46.0	59.7	37.0M	92.7 ± 9.2	27.8	44.7GB	168h
Conditional-DETR	50	41.6	63.0	43.9	21.4	45.2	59.8	43.4M	89.1 ± 9.7	37.8	6.4GB	164h
DAB-DETR	50	43.3	63.9	45.9	23.4	47.1	62.1	43.7M	90.4 ± 9.7	32.9	5.0GB	214h
DN-DETR	50	44.7	65.3	47.5	23.7	48.7	64.1	43.7M	90.5 ± 9.7	32.2	5.1GB	240h
DAB-Deformable-DETR	50	49.0	67.4	53.4	31.5	52.1	64.4	47.4M	231.3 ± 25.1	23.4	10.5GB	230h
DAB-Deformable-DETR-Two-Stage	50	49.7	68.0	54.3	31.9	53.2	64.7	47.5M	235.4 ± 25.5	22.1	10.5GB	220h
DINO-4scale	12	49.7	67.0	54.4	31.4	52.9	63.6	47.7M	244.5 ± 25.5	24.6	10.9GB	67h
H-DETR	12	49.1	66.9	53.7	32.2	52.3	63.8	47.9M	268.1 ± 24.7	22.4	12.0GB	80h
DETA-5scale	12	50.2	67.4	55.2	32.3	54.2	65.0	48.4M	247.1 ± 25.9	15.3	10.8GB	53h
Backbone variants (ResNet-50, Swin, ViT, ConvNeXt, InternImage, etc.)	—	—	—	—	—	—	—	—	—	—	—	—
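The cost and speed columns can be combined to check the speed, memory, and convergence claims in the findings. The short script below copies a subset of the table (epochs, AP, FPS, memory, GPU-h) and ranks the models by AP per GPU-hour. This ratio is an ad-hoc efficiency measure chosen for illustration, not a metric the paper reports. FPS and GPU-h are also only comparable if every row was measured on the same hardware, which the table does not state.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    epochs: int
    ap: float      # box AP (COCO val2017)
    fps: float     # inference speed, frames per second
    mem_gb: float  # memory, GB, as reported
    gpu_h: float   # training cost, GPU-hours


# Values copied from the table above (GFLOPs and the AP breakdown omitted).
ROWS = [
    Row("Deformable-DETR-Two-Stage", 50, 48.2, 26.3, 11.0, 208),
    Row("Anchor-DETR", 50, 41.9, 27.8, 44.7, 168),
    Row("Conditional-DETR", 50, 41.6, 37.8, 6.4, 164),
    Row("DAB-DETR", 50, 43.3, 32.9, 5.0, 214),
    Row("DN-DETR", 50, 44.7, 32.2, 5.1, 240),
    Row("DAB-Deformable-DETR", 50, 49.0, 23.4, 10.5, 230),
    Row("DAB-Deformable-DETR-Two-Stage", 50, 49.7, 22.1, 10.5, 220),
    Row("DINO-4scale", 12, 49.7, 24.6, 10.9, 67),
    Row("H-DETR", 12, 49.1, 22.4, 12.0, 80),
    Row("DETA-5scale", 12, 50.2, 15.3, 10.8, 53),
]


def ap_per_gpu_hour(row: Row) -> float:
    """Ad-hoc efficiency ratio: AP points per GPU-hour of training."""
    return row.ap / row.gpu_h


fastest = max(ROWS, key=lambda r: r.fps)
lightest = min(ROWS, key=lambda r: r.mem_gb)
most_efficient = sorted(ROWS, key=ap_per_gpu_hour, reverse=True)[:3]

print("fastest:", fastest.model)          # Conditional-DETR
print("lowest memory:", lightest.model)   # DAB-DETR
for r in most_efficient:
    print(f"{r.model}: {ap_per_gpu_hour(r):.2f} AP/GPU-h")
```

The three most efficient entries are the 12-epoch models: DETA-5scale, DINO-4scale, and H-DETR, at roughly 0.95, 0.74, and 0.61 AP per GPU-hour. This is consistent with the finding that these variants converge quickly. Conditional-DETR has the highest FPS in the table.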

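detrex supports reproducible experiments for more than 15 mainstream DETR-based algorithms, and several of its reimplementations outperform the original implementations.
NMS post-processing yields consistent gains for DETR variants, particularly on AP50 and APL, with a default threshold of 0.8.
Hyperparameter tuning brings notable gains on several models (e.g., up to +1.3 AP for Deformable-DETR-Two-Stage under the tuned setting).
Across backbones, larger pretrained backbones and newer architectures (e.g., Swin, FocalNet, InternImage) yield higher AP for DETR-based detectors.
Among the DETR variants, DINO and DETA converge quickly (12-epoch schedules), while Conditional-DETR offers fast inference and a low memory footprint.
detrex's reimplementations improve on the original implementations, e.g., Deformable-DETR (+0.4 AP) and Deformable-DETR-Two-Stage (+1.1 AP).
The segmentation and pose estimation methods (Mask2Former, MP-Former, MaskDINO, ED-Pose) match their reported results, validating detrex as a reliable benchmark.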
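DETR-style detectors are designed to work without NMS, because one-to-one matching during training discourages duplicate predictions. A gain from NMS therefore suggests that some near-duplicate boxes survive. The sketch below shows standard greedy NMS at the 0.8 threshold mentioned above. It assumes boxes in `(x1, y1, x2, y2)` format, a threshold applied to IoU, and suppression only within the same class. detrex's actual post-processing code may differ in these details, and `class_aware_nms` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes: Sequence[Box], scores: Sequence[float],
                    labels: Sequence[int], iou_thresh: float = 0.8) -> List[int]:
    """Greedy NMS: visit boxes by descending score and drop any box whose IoU
    with an already-kept box of the same class exceeds `iou_thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(labels[i] != labels[k] or iou(boxes[i], boxes[k]) <= iou_thresh
               for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (0, 0, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 1, 2, 1]
print(class_aware_nms(boxes, scores, labels))  # [0, 2, 3]
```

A threshold as high as 0.8 only removes boxes that almost coincide with a higher-scoring box of the same class. Here, box 1 overlaps box 0 with IoU 0.9 and is dropped, while box 2 survives because it has a different label. This fits the idea that NMS cleans up residual duplicates rather than replacing the matching mechanism.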

This review was generated by AI and reviewed by human editors.