[Paper Review] Libra R-CNN: Towards Balanced Learning for Object Detection
Libra R-CNN introduces IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss to address training-time imbalances, improving COCO AP over baselines.
Compared with model architectures, the training process, which is also crucial to the success of detectors, has received relatively less attention in object detection. In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists in three levels - sample level, feature level, and objective level. To mitigate the adverse effects caused thereby, we propose Libra R-CNN, a simple but effective framework towards balanced learning for object detection. It integrates three novel components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, respectively for reducing the imbalance at sample, feature, and objective level. Benefitted from the overall balanced design, Libra R-CNN significantly improves the detection performance. Without bells and whistles, it achieves 2.5 points and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet respectively on MSCOCO.
Motivation & Objective
- Identify and quantify training-time imbalances in object detectors at sample, feature, and objective levels.
- Propose a balanced learning framework (IoU-balanced sampling, balanced feature pyramid, balanced L1 loss) to mitigate these imbalances.
- Demonstrate significant AP gains on MS COCO across two-stage and single-stage detectors with standard backbones.
- Show that the proposed components synergistically improve localization and recognition accuracy.
Proposed method
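- IoU-balanced sampling: draws negative samples evenly across IoU bins, so that hard negatives (those overlapping more with the ground truth) are not swamped by the far more numerous easy ones, at negligible extra cost.
- Balanced feature pyramid: rescales the multi-level features to a common resolution, integrates them, refines the result, and adds it back to every level, so that each resolution receives equal information from all the others.
- Balanced L1 loss: boosts the regression gradients of inliers (accurate samples) and caps the gradients of outliers, so that a few large-error samples do not dominate the joint classification and localization objective.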
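The behavior of the balanced L1 loss can be illustrated with a minimal sketch. It assumes the piecewise form with an inlier/outlier threshold of 1 and a gradient of `alpha * ln(b|x| + 1)` for inliers, capped at `gamma` for outliers, with `b` chosen so the gradient is continuous at the threshold. The function names and the default values `alpha=0.5`, `gamma=1.5` are assumptions for illustration and should be checked against the paper; this is not the authors' implementation.

```python
import math

def balanced_l1(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss for a single regression error x (assumed form)."""
    # b follows from gradient continuity at |x| = 1: alpha * ln(b + 1) = gamma
    b = math.exp(gamma / alpha) - 1.0
    d = abs(x)
    if d < 1.0:
        # Inlier branch: its derivative is alpha * ln(b * d + 1)
        return alpha / b * (b * d + 1.0) * math.log(b * d + 1.0) - alpha * d
    # Outlier branch: linear with slope gamma; the constant keeps the loss continuous
    return gamma * d + gamma / b - alpha

def balanced_l1_grad(x, alpha=0.5, gamma=1.5):
    """Gradient magnitude of the balanced L1 loss at error x."""
    b = math.exp(gamma / alpha) - 1.0
    d = abs(x)
    return alpha * math.log(b * d + 1.0) if d < 1.0 else gamma

def smooth_l1_grad(x):
    """Gradient magnitude of the standard smooth L1 loss, for comparison."""
    d = abs(x)
    return d if d < 1.0 else 1.0

if __name__ == "__main__":
    for err in (0.1, 0.5, 2.0):
        print(err, balanced_l1_grad(err), smooth_l1_grad(err))
```

Under these assumptions, small errors receive a larger gradient than they would under smooth L1, while large errors are held at a fixed ceiling of `gamma`.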
Experimental results
Research questions
- RQ1: What training-time imbalances limit current object detectors across sample, feature, and objective levels?
- RQ2: Can a deliberately balanced training framework improve both localization and recognition without complex architecture changes?
- RQ3: Do IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss provide complementary gains when combined?
- RQ4: How do these components affect performance on standard benchmarks like MS COCO across backbone choices?
Key findings
- Libra R-CNN improves COCO AP by 2.5 points over FPN Faster R-CNN with ResNet-50 and by 2.0 points over RetinaNet.
- IoU-balanced sampling alone improves AP by up to about 0.9 points over the baseline on COCO val-2017.
- Balanced feature pyramid yields consistent gains across small/medium/large objects and complements PAFPN.
- Balanced L1 loss enhances localization, particularly AP75, by balancing gradients among inliers and outliers.
- With stronger backbones (e.g., ResNeXt-101-FPN), Libra R-CNN reaches 43.0 AP, outperforming several state-of-the-art detectors.
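The IoU-balanced sampling result listed above rests on a simple idea that can be sketched in a few lines: split the negative IoU range into bins and draw roughly the same number of negatives from each. The function name, the choice of 3 bins, the 0.5 upper IoU bound for negatives, and the top-up rule for underfilled bins are all assumptions made for illustration, not the authors' implementation.

```python
import random

def iou_balanced_sample(neg_ious, num_samples, num_bins=3, max_iou=0.5, seed=None):
    """Return indices of negatives, drawn about equally from each IoU bin.

    neg_ious holds the max IoU of each negative candidate with any ground
    truth box, assumed to lie in [0, max_iou).
    """
    rng = random.Random(seed)
    width = max_iou / num_bins
    bins = [[] for _ in range(num_bins)]
    for idx, iou in enumerate(neg_ious):
        # Clamp so a value at the upper edge still lands in the last bin
        k = min(int(iou / width), num_bins - 1)
        bins[k].append(idx)

    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        # A bin with too few candidates contributes everything it has
        chosen.extend(rng.sample(members, min(per_bin, len(members))))

    # Top up from the remaining candidates if some bins were underfilled
    missing = num_samples - len(chosen)
    if missing > 0:
        taken = set(chosen)
        rest = [i for i in range(len(neg_ious)) if i not in taken]
        chosen.extend(rng.sample(rest, min(missing, len(rest))))
    return chosen

if __name__ == "__main__":
    # 90 easy negatives (low IoU) and 10 harder ones (higher IoU)
    ious = [0.01] * 90 + [0.2] * 5 + [0.4] * 5
    picked = iou_balanced_sample(ious, num_samples=9, seed=0)
    print(sorted(ious[i] for i in picked))
```

With purely random sampling, the 10 harder negatives in this toy pool would rarely be picked; binning by IoU guarantees them a fixed share of the sample.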
This review was created by AI and reviewed by human editors.