[Paper Review] A Simple Semi-Supervised Learning Framework for Object Detection
STAC introduces a simple two-stage semi-supervised framework that uses a teacher to generate high-confidence pseudo bounding boxes from unlabeled data and trains with strong augmentations to improve object detectors, achieving significant data-efficiency gains on MS-COCO and VOC07.
Semi-supervised learning (SSL) has a potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from $76.30$ to $79.08$; on MS-COCO, STAC demonstrates $2{\times}$ higher data efficiency by achieving 24.38 mAP using only 5\% labeled data than supervised baseline that marks 23.86\% using 10\% labeled data. The code is available at https://github.com/google-research/ssl_detection/.
Motivation & Objective
- Reduce the labeling burden of object detection, where annotating bounding boxes is costly.
- Develop a simple SSL framework that leverages pseudo labeling and augmentation consistency.
- Show STAC’s effectiveness on MS-COCO and PASCAL VOC across low-label regimes.
Proposed method
- Two-stage training inspired by Noisy-Student: train a teacher on labeled data, then generate pseudo boxes for unlabeled images.
- Run the teacher's test-time inference on unlabeled images to obtain candidate boxes, then keep only those above a high confidence threshold as pseudo labels.
- Apply strong, diverse data augmentations (global color, global geometric, box-level transforms, and Cutout) to unlabeled data and adjust pseudo boxes accordingly.
- Compute unsupervised loss with respect to pseudo labels after strong augmentation, combined with supervised loss on labeled data.
- Optimize Faster R-CNN with a simple unsupervised loss weight lambda_u and a confidence threshold tau (tau ≈ 0.9, lambda_u ≈ 2).
- Evaluate on MS-COCO and PASCAL VOC using 1–10% labeled data and full data, comparing to supervised baselines.
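The pseudo-labeling and loss-weighting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Box` type, the function names, the `>=` comparison against tau, and the sample numbers are all assumptions made for the example, and a horizontal flip stands in for the broader set of geometric augmentations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical box record: pixel corners, class label, teacher confidence."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: int
    score: float


def filter_pseudo_boxes(predictions: List[Box], tau: float = 0.9) -> List[Box]:
    """Keep only teacher predictions whose confidence is at least tau (assumed >=)."""
    return [b for b in predictions if b.score >= tau]


def hflip_box(box: Box, image_width: float) -> Box:
    """Mirror a box so it still covers the object after the image is flipped."""
    return Box(
        x_min=image_width - box.x_max,
        y_min=box.y_min,
        x_max=image_width - box.x_min,
        y_max=box.y_max,
        label=box.label,
        score=box.score,
    )


def total_loss(supervised: float, unsupervised: float, lambda_u: float = 2.0) -> float:
    """Supervised loss plus the pseudo-label loss weighted by lambda_u."""
    return supervised + lambda_u * unsupervised


# Made-up teacher output for one unlabeled image that is 640 pixels wide.
teacher_output = [
    Box(10, 20, 110, 220, label=1, score=0.97),
    Box(300, 40, 380, 120, label=3, score=0.55),  # below tau, so it is dropped
]
pseudo = filter_pseudo_boxes(teacher_output, tau=0.9)
flipped = [hflip_box(b, image_width=640) for b in pseudo]
loss = total_loss(supervised=0.8, unsupervised=0.3, lambda_u=2.0)
```

In a real detector the two loss terms would be the Faster R-CNN losses computed on a labeled batch and on a strongly augmented pseudo-labeled batch; scalars are used here only to show how lambda_u enters.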
Experimental results
Research questions
- RQ1: Can a simple SSL framework using pseudo labeling and strong augmentations improve object detection with limited labeled data?
- RQ2: How do pseudo-label quality, augmentation strength, and unlabeled data scale affect detection performance on MS-COCO and VOC07?
- RQ3: What are effective hyperparameters (tau, lambda_u) for STAC in the low-label regime?
Key findings
- STAC consistently improves over supervised baselines across 1–10% labeled data on MS-COCO.
- With 5% labeled data, STAC improves mAP from 18.47 (supervised) to 24.38; with 10% labeled data, from 23.86 to 28.64.
- On 100% of COCO, STAC reaches 39.21 mAP, above the 37.63 supervised baseline but slightly below the 39.48 obtained by the supervised model trained with strong augmentation.
- On VOC07, STAC with additional unlabeled data reaches 46.01 mAP and 79.08 AP50, compared with 42.60 mAP and 76.30 AP50 for the supervised baseline.
- STAC is roughly 2x more data-efficient in the low-label regime: with 5% labeled data it reaches 24.38 mAP, above the 23.86 of the supervised baseline trained on 10%. It also benefits from larger unlabeled data pools.
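As a quick check on the data-efficiency claim, the arithmetic behind the MS-COCO numbers quoted above can be worked through directly. The snippet only reuses the mAP values listed in the key findings; the dictionary layout and variable names are illustrative.

```python
# MS-COCO mAP values as quoted in the key findings.
reported = {
    "5%": {"supervised": 18.47, "stac": 24.38},
    "10%": {"supervised": 23.86, "stac": 28.64},
}

# Absolute mAP gain of STAC over the supervised baseline at each label fraction.
gains = {
    fraction: round(row["stac"] - row["supervised"], 2)
    for fraction, row in reported.items()
}

# The "2x" claim: STAC on 5% labels versus the supervised baseline on 10% labels.
stac_5_beats_supervised_10 = reported["5%"]["stac"] > reported["10%"]["supervised"]

for fraction, gain in gains.items():
    print(f"{fraction} labeled: +{gain:.2f} mAP")
print("STAC at 5% exceeds supervised at 10%:", stac_5_beats_supervised_10)
```

The gains work out to 5.91 mAP at 5% labeled data and 4.78 mAP at 10%, and STAC with half the labels edges past the supervised baseline, which is the sense in which the review calls it about 2x more data-efficient.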
This review was created by AI and reviewed by human editors.