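[Paper Interpretation] Cascade R-CNN: High Quality Object Detection and Instance Segmentation
Cascade R-CNN introduces a multi-stage detector in which each stage is trained at a higher IoU threshold, progressively improving bounding-box quality and matching inference to higher-quality hypotheses; extended to instance segmentation, it becomes Cascade Mask R-CNN.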
In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its \textit{quality}. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN. To facilitate future research, two implementations are made available at \url{https://github.com/zhaoweicai/cascade-rcnn} (Caffe) and \url{https://github.com/zhaoweicai/Detectron-Cascade-RCNN} (Detectron).
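To make the role of the IoU threshold concrete, the following is a minimal sketch of how proposals might be labeled as positives or negatives at a given threshold u. The function names `iou` and `label_proposals`, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions for illustration and are not taken from the paper's code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals: Sequence[Box], gt_boxes: Sequence[Box], u: float) -> List[int]:
    """Label a proposal 1 (positive) if its best IoU with any ground truth is >= u, else 0."""
    return [
        1 if max((iou(p, g) for g in gt_boxes), default=0.0) >= u else 0
        for p in proposals
    ]


# Hypothetical example: one ground-truth box and three proposals of decreasing quality.
gt = [(0.0, 0.0, 10.0, 10.0)]
proposals = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (3.0, 3.0, 13.0, 13.0)]
labels_at_05 = label_proposals(proposals, gt, u=0.5)
labels_at_07 = label_proposals(proposals, gt, u=0.7)
```

In this toy case the second proposal counts as a positive at u = 0.5 but not at u = 0.7, which illustrates how raising the threshold shrinks the positive set.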
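Research Motivation and Objectives
- Motivate the need for high-quality object detection, where quality is defined by a higher IoU threshold (u).
- Propose a cascade architecture that matches detector quality to progressively higher-quality hypotheses.
- Address overfitting and the inference-time quality mismatch by resampling the training data at different IoU levels.
- Show that the cascade improves localization across datasets and reduces close false positives.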
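Proposed Method
- Introduce Cascade R-CNN, a multi-stage extension of Faster R-CNN that uses a cascade of bounding-box regressors and classifiers trained at increasing IoU thresholds.
- Use cascaded regression as a resampling mechanism that produces higher-IoU hypotheses at each stage while keeping the number of training samples roughly constant.
- Apply the same cascade at inference to refine hypotheses progressively and keep detector strength aligned with hypothesis quality.
- Normalize the bounding-box regression targets by their mean and variance for stable multi-task learning.
- Extend the cascade to instance segmentation by integrating a segmentation branch, yielding Cascade Mask R-CNN.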
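The resampling idea and the target normalization can be sketched with a toy cascade. This is an illustration under stated assumptions, not the authors' implementation: the thresholds (0.5, 0.6, 0.7), the `MEANS`/`STDS` statistics, and `toy_regressor` are hypothetical, and `toy_regressor` peeks at the ground truth only to simulate a trained head that corrects part of the localization error.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)

# Assumed normalization statistics for (dx, dy, dw, dh); illustrative values only.
MEANS = (0.0, 0.0, 0.0, 0.0)
STDS = (0.1, 0.1, 0.2, 0.2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(box: Box, gt: Box) -> Tuple[float, ...]:
    """Mean/std-normalized regression target that maps `box` onto `gt`."""
    w, h = box[2] - box[0], box[3] - box[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    raw = (
        ((gt[0] + 0.5 * gw) - (box[0] + 0.5 * w)) / w,
        ((gt[1] + 0.5 * gh) - (box[1] + 0.5 * h)) / h,
        math.log(gw / w),
        math.log(gh / h),
    )
    return tuple((r - m) / s for r, m, s in zip(raw, MEANS, STDS))


def decode(box: Box, delta: Sequence[float]) -> Box:
    """Undo the normalization and apply the delta to `box`."""
    dx, dy, dw, dh = (d * s + m for d, m, s in zip(delta, MEANS, STDS))
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + 0.5 * w + dx * w, box[1] + 0.5 * h + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return (cx - 0.5 * nw, cy - 0.5 * nh, cx + 0.5 * nw, cy + 0.5 * nh)


def toy_regressor(box: Box, gt: Box, strength: float = 0.5) -> Tuple[float, ...]:
    """Hypothetical stand-in for a trained stage head: predicts a fraction of the ideal delta."""
    return tuple(strength * d for d in encode(box, gt))


def run_cascade(proposals: Sequence[Box], gt: Box,
                thresholds: Sequence[float] = (0.5, 0.6, 0.7)):
    """Per stage: count positives at threshold u, then refine every box for the next stage."""
    boxes: List[Box] = list(proposals)
    stats = []
    for u in thresholds:
        positives = sum(1 for b in boxes if iou(b, gt) >= u)
        boxes = [decode(b, toy_regressor(b, gt)) for b in boxes]
        mean_iou = sum(iou(b, gt) for b in boxes) / len(boxes)
        stats.append((u, positives, mean_iou))
    return boxes, stats


gt_box = (0.0, 0.0, 10.0, 10.0)
initial = [(2.0, 2.0, 12.0, 12.0), (1.0, 0.0, 9.0, 12.0), (3.0, 1.0, 14.0, 11.0)]
refined, stage_stats = run_cascade(initial, gt_box)
```

Because each stage receives the boxes refined by the previous one, the number of positives in this toy run does not collapse as u rises, which is the effect the resampling bullet describes. Running the same loop at test time corresponds to applying the cascade at inference.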
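Experimental Results
Research Questions
- RQ1: Can a cascade of detectors trained at increasing IoU thresholds overcome the paradox of high-quality detection while maintaining enough positive samples at each stage?
- RQ2: Do cascaded bounding-box regression and classification improve high-IoU detection without overfitting?
- RQ3: Does the cascade help across diverse datasets, and is it compatible with existing detection and segmentation enhancements?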
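Key Findings
- A simple Cascade R-CNN implementation achieves state-of-the-art performance on the COCO dataset without bells or whistles.
- The cascade improves accuracy by roughly 2 to 4 points across a variety of baselines at moderate computational cost (per the section notes), with larger gains under stricter localization metrics.
- Cascaded bounding-box regression progressively raises IoU quality, while cascaded detection maintains a stable positive set at every stage, which mitigates overfitting at high IoU thresholds.
- At inference, applying the cascade yields progressively higher-quality hypotheses that better match the higher-quality detectors.
- Extending the cascade to instance segmentation (Cascade Mask R-CNN) brings nontrivial improvements over Mask R-CNN on multiple datasets.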
This interpretation was generated by AI and reviewed by a human editor.