QUICK REVIEW

[Paper Review] Practical Insights into Semi-Supervised Object Detection Approaches

Chaoxin Wang, Bharaneeshwar Balasubramaniyam | arXiv (Cornell University) | Jan 19, 2026

Advanced Neural Network Applications | Cited by 0

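One-Sentence Summary

The paper benchmarks three state-of-the-art SSOD methods (MixPL, Semi-DETR, and Consistent-Teacher) on MS-COCO, Pascal VOC, and Beetle under few-shot per-class supervision, analyzing accuracy, latency, and model size to guide practical deployment.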

ABSTRACT

Learning in data-scarce settings has recently gained significant attention in the research community. Semi-supervised object detection (SSOD) aims to improve detection performance by leveraging a large number of unlabeled images alongside a limited number of labeled images (a.k.a. few-shot learning). In this paper, we present a comprehensive comparison of three state-of-the-art SSOD approaches, including MixPL, Semi-DETR and Consistent-Teacher, with the goal of understanding how performance varies with the number of labeled images. We conduct experiments using the MS-COCO and Pascal VOC datasets, two popular object detection benchmarks which allow for standardized evaluation. In addition, we evaluate the SSOD approaches on a custom Beetle dataset which enables us to gain insights into their performance on specialized datasets with a smaller number of object categories. Our findings highlight the trade-offs between accuracy, model size, and latency, providing insights into which methods are best suited for low-data regimes.

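Motivation and Objectives

Provide guidance on SSOD for data-scarce industrial settings where only a few labels per class are available.
Compare three representative SSOD methods with public implementations under fixed per-class shot sizes.
Evaluate the trade-offs among detection accuracy, model size, and inference latency across datasets of differing complexity.
Offer practical recommendations on annotation strategy and model selection for real-world deployment.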

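Proposed Method

Evaluate three SSOD methods (MixPL, Semi-DETR, and Consistent-Teacher) on a ResNet-50 backbone.
Sample k labeled images per class on the MS-COCO, Pascal VOC, and Beetle datasets, with k ∈ {1, 5, 10, 20, 50, 100, 150}.
For each method and dataset, measure mAP(0.50:0.95) along with approximate inference time and model size.
Standardize training by using identical data splits and each method's official default training configuration.
Contrast the Transformer-based detectors (MixPL, Semi-DETR) with the CNN-based detector (Consistent-Teacher).
Analyze performance trends as the amount of labeled data grows, and characterize the deployment trade-offs.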
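
To make the k-shot-per-class protocol concrete, here is a minimal Python sketch of one way such a split could be drawn. It is not the authors' sampling code: the function name `sample_k_shot`, the `(image_id, class_id)` annotation format, and the rule of drawing up to k images independently for each class are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_k_shot(annotations, k, seed=0):
    """Split images into a labeled k-shot set and an unlabeled remainder.

    annotations: iterable of (image_id, class_id) pairs (hypothetical format).
    Returns (labeled_ids, unlabeled_ids) as two disjoint sets.
    """
    rng = random.Random(seed)
    images_by_class = defaultdict(set)
    all_images = set()
    for image_id, class_id in annotations:
        images_by_class[class_id].add(image_id)
        all_images.add(image_id)

    labeled = set()
    for class_id in sorted(images_by_class):
        # Sort candidates so the draw is reproducible for a given seed.
        candidates = sorted(images_by_class[class_id])
        # Assumption: take up to k images containing this class.
        labeled.update(rng.sample(candidates, min(k, len(candidates))))

    # Every image that was not picked serves as unlabeled training data.
    unlabeled = all_images - labeled
    return labeled, unlabeled


if __name__ == "__main__":
    toy = [(i, 0) for i in range(5)] + [(i, 1) for i in range(5, 10)]
    for k in (1, 2, 5):
        labeled, unlabeled = sample_k_shot(toy, k)
        print(k, len(labeled), len(unlabeled))
```

Because one image can contain several classes, a per-class draw like this may leave some classes with more than k labeled images; a stricter protocol would need to handle that overlap explicitly.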

Figure 1: Performance comparison across MixPL, Semi-DETR, and Consistent-Teacher as the number of shots $k$ varies, on the MS-COCO, Pascal VOC and Beetle datasets.
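
For reference, the mAP(0.50:0.95) metric used in the evaluation is conventionally the COCO-style average of per-class average precision over ten IoU thresholds. The symbols below ($C$ for the number of classes, $T$ for the threshold set, $\mathrm{AP}_c(t)$ for the average precision of class $c$ at IoU threshold $t$) are notation introduced here, not taken from the paper.

```latex
\mathrm{mAP}_{0.50:0.95}
  = \frac{1}{|T|} \sum_{t \in T} \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c(t),
\qquad
T = \{0.50, 0.55, \ldots, 0.95\}, \quad |T| = 10 .
```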

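Experimental Results

Research Questions

RQ1: Which SSOD method performs best as the number of labeled images per class varies from 1 to 150?
RQ2: What trade-offs exist between low-data training and overall detection performance?
RQ3: How do performance, model size, and latency trade off across the evaluated methods?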

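Key Findings

MixPL is generally the strongest method across all k-shot settings, with Semi-DETR close behind.
The Transformer-based methods (MixPL, Semi-DETR) reach higher peak accuracy than Consistent-Teacher in medium- and high-data settings.
Consistent-Teacher offers the lowest inference latency (roughly 9–15 ms per image) and the smallest model size, making it suitable for resource-constrained deployment.
All models improve as k increases; the gains are largest in the extremely low-data regime and show diminishing returns at higher shot counts.
Inference time stays relatively stable across k-shot settings, indicating that latency is driven mainly by architecture rather than by the amount of training data.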
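
A per-image latency figure such as the one quoted above can be estimated with a simple timing loop. The sketch below is an assumption about how one might measure it, not the paper's benchmarking procedure; `measure_latency_ms`, `predict`, and `warmup` are hypothetical names, and `predict` stands in for any callable that runs a detector on one input.

```python
import statistics
import time


def measure_latency_ms(predict, inputs, warmup=3):
    """Return the median per-input latency of `predict` in milliseconds."""
    # Warm-up calls absorb one-time costs (lazy initialization, caches).
    for x in inputs[:warmup]:
        predict(x)

    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        # Caveat: for GPU models, synchronize the device before reading the
        # clock, otherwise only the kernel launch time is measured.
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload in place of a real detector.
    dummy_predict = lambda x: sum(range(10_000))
    print(f"{measure_latency_ms(dummy_predict, list(range(20))):.3f} ms")
```

The median is used rather than the mean so that occasional slow calls do not dominate the estimate. Since the measurement depends only on the model's forward pass, it is consistent with the finding that latency does not change with k.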


This review was generated by AI and checked by human editors.