[Paper Review] RON: Reverse Connection with Objectness Prior Networks for Object Detection

Tao Kong, Fuchun Sun | arXiv (Cornell University) | Jul 6, 2017
Advanced Neural Network Applications | References: 25 | Cited by: 66
One-Sentence Summary

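RON combines reverse connections across different CNN scales with an objectness prior to build a fast, end-to-end, fully convolutional detector that competes with both region-based and region-free methods, achieving strong results on VOC and COCO while running at about 15 FPS.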

ABSTRACT

We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the searching space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, thus RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and low resolution 384X384 input size, the network gets 81.3% mAP on PASCAL VOC 2007, 80.7% mAP on PASCAL VOC 2012 datasets. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3X faster than the Faster R-CNN counterpart.
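The reverse connection named in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes each level is projected with a 1x1 convolution, that the deeper map is upsampled by a factor of two (nearest-neighbor here, standing in for a learned deconvolution), and that the two are fused by element-wise sum. The helper names `conv1x1`, `upsample2x`, `add_maps`, and `reverse_fuse` are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def conv1x1(x, w):
    """Pointwise (1x1) convolution.

    x: feature map as nested lists indexed [channel][row][col].
    w: weights indexed [out_channel][in_channel].
    """
    rows, cols = len(x[0]), len(x[0][0])
    return [[[sum(w_out[c] * x[c][i][j] for c in range(len(x)))
              for j in range(cols)]
             for i in range(rows)]
            for w_out in w]


def upsample2x(x):
    """Nearest-neighbor 2x upsampling (assumed stand-in for a learned deconvolution)."""
    return [[[ch[i // 2][j // 2] for j in range(2 * len(ch[0]))]
             for i in range(2 * len(ch))]
            for ch in x]


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[p + q for p, q in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]


def reverse_fuse(backbone_maps, proj_weights):
    """Fuse maps from the deepest level back toward the shallowest.

    backbone_maps: feature maps ordered shallow to deep; each level is assumed
                   to have half the spatial size of the previous one.
    proj_weights:  one 1x1 weight matrix per level, all with the same number
                   of output channels so the maps can be summed.
    """
    fused = [None] * len(backbone_maps)
    # The deepest level has nothing above it, so it is only projected.
    fused[-1] = conv1x1(backbone_maps[-1], proj_weights[-1])
    for level in range(len(backbone_maps) - 2, -1, -1):
        lateral = conv1x1(backbone_maps[level], proj_weights[level])
        fused[level] = add_maps(lateral, upsample2x(fused[level + 1]))
    return fused


# Toy input: a 1-channel 4x4 shallow map and a 2-channel 2x2 deep map.
shallow = [[[1.0] * 4 for _ in range(4)]]
deep = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
weights = [[[1.0]], [[1.0, 1.0]]]  # project both levels to 1 channel
fused = reverse_fuse([shallow, deep], weights)
for row in fused[0][0]:
    print(row)
```

Each fused map keeps the spatial size of its own level, so a detection head attached to the shallow map sees fine resolution together with information propagated down from the deeper level.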

Research Motivation and Goals

  • Bridge region-based and region-free detection paradigms to leverage their strengths.
  • Enable multi-scale object localization by associating objects with corresponding CNN scales through reverse connections.
  • Reduce the negative-sample search space with an objectness prior that guides detection.
  • Train and deploy a unified, end-to-end framework that jointly optimizes objectness, localization, and classification.
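
The joint optimization in the last goal can be written schematically as a weighted sum of three task losses. The weights $\lambda_{obj}$, $\lambda_{loc}$, $\lambda_{cls}$ below are placeholder symbols introduced for illustration; this summary does not state the exact weights, the normalization, or the form of each term, so the expression should be read as a generic multi-task template rather than the paper's precise objective.

```latex
L = \lambda_{obj}\, L_{obj} + \lambda_{loc}\, L_{loc} + \lambda_{cls}\, L_{cls}
```

Here $L_{obj}$ is the objectness loss, $L_{loc}$ the bounding-box localization loss, and $L_{cls}$ the per-class classification loss. Because all three terms share one backbone and are summed, a single backward pass updates the reverse connections, the objectness branch, and the detector together.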

Proposed Method

  • Introduce a reverse connection that fuses features from higher-level semantic maps into lower-level layers to detect objects at multiple CNN scales.
  • Generate reference (default) boxes on multiple feature maps, with varied scales and aspect ratios, to cover a range of object sizes.
  • Add an objectness prior as a lightweight branch to guide the search for objects, reducing negative samples during training and inference.
  • Use an inception-based detection module to classify and regress bounding boxes on multi-scale feature maps.
  • Combine objectness prior with detection through a multi-task loss that jointly optimizes objectness, localization, and per-class classification.
  • At inference, compute class-conditional scores by multiplying objectness with class-conditioned predictions and apply NMS to obtain final detections.
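
The inference step in the last bullet can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `iou` and `detect`, the thresholds `score_thresh` and `iou_thresh`, and the toy inputs are all assumptions made for the example. It multiplies each box's objectness by its class-conditional probability, drops low-scoring boxes, and then runs greedy per-class NMS.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def detect(boxes, objectness, class_probs, score_thresh=0.3, iou_thresh=0.5):
    """Return (class_id, score, box) tuples after scoring and per-class NMS.

    boxes:       list of (x1, y1, x2, y2) tuples.
    objectness:  one objectness probability per box.
    class_probs: per box, a list of class-conditional probabilities.
    Both thresholds are hypothetical values chosen for this example.
    """
    detections = []
    for cls in range(len(class_probs[0])):
        # Final score = objectness x class-conditional probability.
        scored = [(obj * probs[cls], box)
                  for box, obj, probs in zip(boxes, objectness, class_probs)]
        scored = sorted((s for s in scored if s[0] >= score_thresh),
                        key=lambda s: s[0], reverse=True)
        kept = []
        for score, box in scored:
            # Greedy NMS: keep a box only if it does not overlap a kept one.
            if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
                kept.append((score, box))
        detections.extend((cls, score, box) for score, box in kept)
    return detections


# Toy input: two overlapping confident boxes and one low-objectness box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
objectness = [0.9, 0.8, 0.1]
class_probs = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]]
dets = detect(boxes, objectness, class_probs)
print(dets)
```

In this toy input the third box is removed by its low objectness alone, before NMS runs, which mirrors how the prior is meant to shrink the set of candidates the detector has to consider.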

Experimental Results

Research Questions

  • RQ1: How can multi-scale object localization be improved by distributing detection across multiple CNN scales with learnable reverse connections?
  • RQ2: Can an explicit objectness prior reduce the search space and improve training efficiency without generating separate region proposals?
  • RQ3: Does joint, end-to-end optimization of objectness, localization, and classification yield competitive performance against region-based and region-free detectors?

Key Findings

  • RON achieves 81.3% mAP on PASCAL VOC 2007 with MS COCO pretraining and VOC2012 fine-tuning (VOC2007 results in Table 4).
  • RON achieves 80.7% mAP on PASCAL VOC 2012 with the same pretraining setup (Table 4).
  • On MS COCO test-dev2015, RON reaches 27.4% AP, outperforming Faster R-CNN and SSD under standard COCO evaluation (Table 3).
  • With 1.5 GB GPU memory at test time, RON runs at 15 FPS, about 3x faster than Faster R-CNN.
  • Using multiple feature maps and reverse connections improves small-object detection (e.g., boat and bottle) compared to baselines.
  • Pretraining on MS COCO before fine-tuning substantially boosts VOC results, with RON384++ achieving the top performance among VGG-16-based models on VOC2012.


This review was generated by AI and reviewed by human editors.