[Paper Review] An Implementation of Faster RCNN with Study for Region Sampling
This paper implements Faster R-CNN in TensorFlow, analyzes the simplifications it makes to the original pipeline, and studies region sampling strategies, showing that sampling biased toward small regions can match or exceed NMS-based sampling given sufficient convergence.
We adapted the joint-training scheme of the Faster RCNN framework from Caffe to TensorFlow as a baseline implementation for object detection. Our code is made publicly available. This report documents the simplifications made to the original pipeline, with justifications from ablation analysis on both PASCAL VOC 2007 and COCO 2014. We further investigated the role of non-maximum suppression (NMS) in selecting regions-of-interest (RoIs) for region classification, and found that sampling biased toward small regions helps performance and can achieve mAP on par with NMS-based sampling when training has converged sufficiently.
Motivation & Objective
- Adapt Faster RCNN joint training from Caffe to TensorFlow as a baseline implementation.
- Simplify the original pipeline and evaluate impact via ablation on VOC 2007 and COCO 2014.
- Investigate the role of region sampling and NMS in RoI selection for improved detection.
Proposed method
- Adopt crop_and_resize pooling instead of RoI pooling to produce 14x14 crops, max-pooled to 7x7 for fc6 input.
- Train with N=1 image and R=256 regions per forward-backward pass, which avoids accumulating gradients across multiple batches.
- Preserve default RPN training (R=256 regions) while training the region classifier with biased region sampling.
- Remove small-proposal exclusion (<16 pixels) during training due to observed performance loss for small objects.
- Compare multiple region-sampling schemes (NMS, ALL, PRE, POW, TOP) to study their effects on training/testing performance and recall.
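The crop_and_resize step in the first bullet can be illustrated with a minimal pure-Python sketch. It is not the paper's code: the helper names (`bilinear`, `crop_and_resize`, `max_pool_2x2`, `roi_features`) are hypothetical, the feature map is a single-channel list of rows, and boxes are assumed to be given in feature-map coordinates as (y1, x1, y2, x2). The sketch samples a 14x14 grid with bilinear interpolation and then max-pools it to 7x7.

```python
def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at point (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point to the feature map (an assumption of this sketch).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def crop_and_resize(feature, box, size=14):
    """Sample a size x size grid of points spanning box = (y1, x1, y2, x2)."""
    y1, x1, y2, x2 = box
    step_y = (y2 - y1) / (size - 1)
    step_x = (x2 - x1) / (size - 1)
    return [
        [bilinear(feature, y1 + i * step_y, x1 + j * step_x) for j in range(size)]
        for i in range(size)
    ]


def max_pool_2x2(grid):
    """Non-overlapping 2x2 max pooling, halving each spatial dimension."""
    n = len(grid) // 2
    return [
        [
            max(grid[2 * i][2 * j], grid[2 * i][2 * j + 1],
                grid[2 * i + 1][2 * j], grid[2 * i + 1][2 * j + 1])
            for j in range(n)
        ]
        for i in range(n)
    ]


def roi_features(feature, box):
    """14x14 bilinear crop followed by max pooling to 7x7."""
    return max_pool_2x2(crop_and_resize(feature, box, size=14))
```

Unlike RoI pooling, which takes the max over quantized bins, this crop is a fixed-size bilinear resampling of the region, so every region yields the same 7x7 grid regardless of its size.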
Experimental results
Research questions
- RQ1: Does switching from RoI pooling to crop_and_resize affect performance in Faster R-CNN?
- RQ2: Is it advantageous to sample regions biased toward small proposals during training, instead of relying on NMS for de-duplication?
- RQ3: How do different region sampling schemes (NMS, ALL, PRE, POW, TOP) impact mAP and recall on VOC 2007 and COCO 2014?
- RQ4: What is the effect of increasing the number of sampled regions R on convergence and performance?
- RQ5: Can testing with TOP (directly selecting top-K proposals) compensate for the absence of NMS during training?
Key findings
- Crop_and_resize pooling provides a slight performance advantage over RoI pooling in the TensorFlow Faster R-CNN implementation.
- Sampling R=256 regions from a single image (N=1) is effective and avoids slow gradient accumulation across multiple batches; RPN still uses 256 proposals.
- Removing the small-proposal filter (<16 px) during training improves performance, especially for small objects.
- Biased sampling toward small regions (NMS-based, PRE, POW, TOP schemes) generally yields similar or better mAP/recall than ALL sampling, with TOP often matching or exceeding NMS when K is large.
- For VOC 2007, biased sampling schemes achieve around 71% mAP, with gains noted in AP for small objects; for COCO 2014, biased sampling improves AP and AR for small objects in some configurations.
- Increasing R beyond 256 can lead to diminishing returns or overfitting, with 256 offering a good trade-off.
- With longer training (e.g., 790k iterations on COCO), the gap between NMS and biased sampling narrows, suggesting convergence time influences the relative performance of sampling strategies.
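To make the contrast between NMS-based selection and TOP concrete, here is a minimal pure-Python sketch of the two. It assumes boxes are (x1, y1, x2, y2) tuples with one score each; the function names and the 0.7 IoU threshold are illustrative assumptions rather than values taken from the review, and the PRE and POW schemes are not sketched because their definitions are not given here.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_top(boxes, scores, k):
    """TOP: keep the k highest-scoring proposals, with no de-duplication."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def select_nms(boxes, scores, k, iou_threshold=0.7):
    """Greedy NMS: visit proposals by descending score and keep one only if
    it does not overlap an already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == k:
                break
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(select_top(boxes, scores, k=2))  # indices of the two best scores
    print(select_nms(boxes, scores, k=2))  # near-duplicate box is skipped
```

With a small K, TOP can spend its budget on near-duplicates that NMS would suppress; as K grows, TOP's selection increasingly covers the boxes NMS would keep, which is consistent with the finding above that TOP often matches or exceeds NMS when K is large.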
This review was created by AI and reviewed by human editors.