[Paper Review] Acquisition of Localization Confidence for Accurate Object Detection
IoU-Net predicts a localization confidence (the IoU with the ground truth) for each detected box. This enables IoU-guided NMS and an optimization-based bounding box refinement, improving localization accuracy while remaining compatible with existing detectors.
Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression (NMS) to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. As a result, properly localized bounding boxes can degenerate during iterative regression or even be suppressed during NMS. In this paper we propose IoU-Net, which learns to predict the IoU between each detected bounding box and the matched ground truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
Motivation & Objective
- Identify the lack of localization confidence in standard CNN-based detectors and explain its impact on NMS and bounding box refinement.
- Introduce IoU-Net to predict IoU between detected boxes and ground-truth.
- Develop IoU-guided NMS and an optimization-based bounding box refinement using predicted IoU.
- Demonstrate compatibility and improvements across state-of-the-art detectors on MS-COCO.
Proposed method
- Train an IoU predictor (IoU-Net) that estimates IoU(box_det, box_gt) using RoI features from an FPN backbone.
- Replace RoI Pooling with Precise RoI Pooling to enable differentiable, continuous pooling for IoU gradient calculation.
- Use IoU predictions to perform IoU-guided NMS: rank boxes by predicted localization confidence instead of classification score, and give each kept box the highest classification score among the overlapping boxes it suppresses.
- Formulate an optimization-based bounding box refinement where the IoU predictor provides the objective, using gradient ascent with Precise RoI Pooling.
- Jointly train IoU-Net with existing detectors in an end-to-end fashion to improve overall AP.
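The IoU-guided NMS step listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `pairwise_iou` and `iou_guided_nms`, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are assumptions made for this example, and `loc_conf` stands in for the IoU values that IoU-Net would predict.

```python
import numpy as np

def pairwise_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def iou_guided_nms(boxes, cls_scores, loc_conf, nms_thresh=0.5):
    """Return (box_index, class_score) pairs for the boxes that survive.

    Boxes are ranked by localization confidence (hypothetical IoU-Net output).
    Each kept box takes the highest classification score among itself and
    the overlapping boxes it suppresses.
    """
    boxes = np.asarray(boxes, dtype=float)
    cls_scores = np.asarray(cls_scores, dtype=float)
    loc_conf = np.asarray(loc_conf, dtype=float)
    remaining = np.arange(len(boxes))
    kept = []
    while remaining.size > 0:
        # Pick the remaining box with the highest predicted localization confidence.
        best = remaining[np.argmax(loc_conf[remaining])]
        remaining = remaining[remaining != best]
        score = cls_scores[best]
        if remaining.size > 0:
            overlaps = pairwise_iou(boxes[best], boxes[remaining])
            suppressed = overlaps > nms_thresh
            if suppressed.any():
                # Carry over the best classification score in the cluster.
                score = max(score, cls_scores[remaining[suppressed]].max())
            remaining = remaining[~suppressed]
        kept.append((int(best), float(score)))
    return kept

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
cls_scores = [0.9, 0.6, 0.8]
loc_conf = [0.5, 0.8, 0.7]
print(iou_guided_nms(boxes, cls_scores, loc_conf))  # [(1, 0.9), (2, 0.8)]
```

In the toy input, traditional NMS would keep box 0 because it has the highest classification score. The sketch instead keeps box 1, which has the higher localization confidence, and assigns it the cluster's best classification score of 0.9.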
Experimental results
Research questions
- RQ1: Can learned localization confidence (IoU) improve bounding-box selection and reduce suppression of well-localized boxes in NMS?
- RQ2: Does IoU-guided NMS outperform traditional NMS and Soft-NMS across detectors?
- RQ3: Can optimization-based bounding box refinement driven by IoU predictions provide monotonic localization gains?
- RQ4: Is IoU-Net compatible with and beneficial to existing detectors like FPN, Cascade R-CNN, and Mask R-CNN?
- RQ5: Does joint training of IoU-Net with detectors yield measurable AP improvements?
Key findings
- IoU-guided NMS improves localization, especially at high IoU thresholds (e.g., AP at IoU 0.9 and higher) compared to traditional NMS and Soft-NMS.
- Optimization-based bounding box refinement guided by IoU predictions yields additional AP gains beyond regression-based methods, including improvements at high IoU levels.
- Jointly training IoU-Net with detectors provides modest AP gains (e.g., ≈0.4–0.6 percentage points in reported setups) and maintains compatible inference pipelines.
- Precise RoI Pooling makes gradient-based refinement possible because its pooled features are continuous and differentiable with respect to the box coordinates.
- IoU-Net adds little inference overhead while delivering measurable improvements across several backbones (e.g., ResNet-50/101 with FPN) and detectors (FPN, Cascade R-CNN, Mask R-CNN).
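The optimization-based refinement mentioned in the findings above can be illustrated with a toy gradient-ascent loop. This is only a sketch under several assumptions. The scorer `score_fn` is a hypothetical stand-in for the IoU predictor; here it is the exact IoU with a known target box, which would not be available at inference time. Gradients are estimated by finite differences instead of being backpropagated through Precise RoI Pooling. Scaling each gradient component by the box width or height, the step size, and the stopping rule are choices made for this example.

```python
import numpy as np

def true_iou(a, b):
    """Exact IoU of two (x1, y1, x2, y2) boxes; stand-in for a learned IoU predictor."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def numerical_grad(score_fn, box, eps=1e-3):
    """Central-difference gradient of the score w.r.t. the four box coordinates."""
    grad = np.zeros(4)
    for k in range(4):
        step = np.zeros(4)
        step[k] = eps
        grad[k] = (score_fn(box + step) - score_fn(box - step)) / (2 * eps)
    return grad

def refine_box(score_fn, box, step_size=0.5, max_iters=50, tol=1e-4):
    """Gradient ascent on the box coordinates, accepting only improving steps."""
    box = np.asarray(box, dtype=float)
    score = score_fn(box)
    for _ in range(max_iters):
        grad = numerical_grad(score_fn, box)
        w, h = box[2] - box[0], box[3] - box[1]
        # Assumption: scale each component by the box size along its axis.
        candidate = box + step_size * np.array([w, h, w, h]) * grad
        new_score = score_fn(candidate)
        if new_score - score < tol:
            break  # stop when the score no longer improves meaningfully
        box, score = candidate, new_score
    return box, score

target = np.array([10.0, 10.0, 50.0, 50.0])  # hypothetical ground-truth box
score_fn = lambda b: true_iou(b, target)
start = np.array([14.0, 6.0, 56.0, 48.0])
refined, refined_score = refine_box(score_fn, start)
print(score_fn(start), refined_score)
```

Because a step is accepted only when it raises the score, the score in this sketch never decreases from one iteration to the next, which mirrors the monotonic gains asked about in RQ3.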
This review was created by AI and reviewed by human editors.