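[Paper Summary] DSOD: Learning Deeply Supervised Object Detectors from Scratch
DSOD trains object detectors from scratch in a proposal-free, densely connected framework inspired by DenseNets and SSD, achieving state-of-the-art results with smaller models at real-time speed.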
We present Deeply Supervised Object Detector (DSOD), a framework that can learn object detectors from scratch. State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to differences in both the loss functions and the category distributions between classification and detection tasks. Model fine-tuning for the detection task could alleviate this bias to some extent but not fundamentally. Besides, transferring pre-trained models from classification to detection between discrepant domains is even more difficult (e.g. RGB to depth images). A better solution to tackle these two critical problems is to train object detectors from scratch, which motivates our proposed DSOD. Previous efforts in this direction mostly failed due to much more complicated loss functions and limited training data in object detection. In DSOD, we contribute a set of design principles for training object detectors from scratch. One of the key findings is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector. Combined with several other principles, we develop DSOD following the single-shot detection (SSD) framework. Experiments on PASCAL VOC 2007, 2012 and MS COCO datasets demonstrate that DSOD can achieve better results than the state-of-the-art solutions with much more compact models. For instance, DSOD outperforms SSD on all three benchmarks with real-time detection speed, while requiring only 1/2 of the parameters of SSD and 1/10 of those of Faster RCNN. Our code and models are available at: https://github.com/szq0214/DSOD .
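Motivation and Goals
- Motivate training object detectors from scratch, avoiding the learning bias introduced by pre-trained classification models.
- Propose design principles for resource-efficient, high-accuracy detectors.
- Develop DSOD, a framework built on the proposal-free, single-stage detection paradigm with deep supervision.
- Show that DSOD achieves state-of-the-art results on VOC 2007, VOC 2012 and MS COCO with smaller models.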
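Proposed Method
- Adopts a proposal-free, single-shot detection framework based on SSD for speed.
- Introduces deep supervision through dense layer-wise connections, which provide implicit auxiliary supervision.
- Introduces a stem block to reduce information loss from the raw input.
- Uses a dense prediction structure that fuses multi-scale feature maps at each prediction scale.
- Includes a Transition layer without pooling, so that more dense blocks can be added without reducing feature-map resolution.
- Trains all networks from scratch on standard detection benchmarks.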
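The interplay between dense blocks and the two kinds of Transition layer can be illustrated with simple shape bookkeeping. The following is a minimal sketch, not the authors' implementation: the names `FeatureMap`, `dense_block`, `transition` and `trace`, the growth rate of 32, the layer counts and the starting shape are all hypothetical values chosen for illustration, and 2x2 stride-2 pooling with ceil rounding is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeatureMap:
    channels: int
    size: int  # side length of a square feature map (assumption)


def dense_block(x: FeatureMap, num_layers: int, growth_rate: int) -> FeatureMap:
    """Each layer receives the concatenation of all earlier outputs and
    contributes `growth_rate` new channels; spatial size is unchanged."""
    channels = x.channels
    for _ in range(num_layers):
        channels += growth_rate  # concatenate this layer's output
    return FeatureMap(channels, x.size)


def transition(x: FeatureMap, out_channels: int, pool: bool) -> FeatureMap:
    """1x1 convolution to `out_channels`, optionally followed by an assumed
    2x2 stride-2 pooling that halves the side length (ceil rounding)."""
    size = (x.size + 1) // 2 if pool else x.size
    return FeatureMap(out_channels, size)


def trace(x: FeatureMap, stages: List[Tuple[int, bool]],
          growth_rate: int = 32) -> List[FeatureMap]:
    """Record the shape after every (dense block, transition) stage.
    `stages` holds hypothetical (num_layers, pool) pairs."""
    shapes = [x]
    for num_layers, pool in stages:
        x = dense_block(x, num_layers, growth_rate)
        x = transition(x, x.channels, pool)  # channel count kept, for simplicity
        shapes.append(x)
    return shapes


if __name__ == "__main__":
    # Hypothetical stem output: 64 channels at 64x64.
    stages = [(4, True), (4, False), (4, False)]
    for fm in trace(FeatureMap(64, 64), stages):
        print(fm)
```

In this toy trace only the first stage pools. The two later stages use a Transition layer without pooling, so the channel count keeps growing while the side length stays fixed, which is the effect the method list describes.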
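Experimental Results
Research Questions
- RQ1: Can object detectors be trained effectively from scratch, without a pre-trained classification model?
- RQ2: Which network design principles enable high accuracy and efficiency in detectors trained from scratch?
- RQ3: How does a dense, multi-scale prediction structure affect the accuracy and parameter efficiency of detectors trained from scratch?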
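Main Findings
- Without ImageNet pre-training, DSOD achieves competitive and sometimes better mAP on VOC 2007, VOC 2012 and MS COCO.
- DSOD300 with the plain connection structure, trained on 07+12, reaches 77.3% mAP on the VOC 2007 test set; the dense prediction structure raises this to 77.7%.
- With COCO data added (07+12+COCO), DSOD300 with dense prediction reaches 81.7% mAP on the VOC 2007 test set.
- DSOD runs at real-time detection speed (e.g. 20.6 fps on a Titan X at 300x300 input with the plain structure) and uses far fewer parameters than the SSD and Faster R-CNN baselines.
- The stem block and the Transition layer without pooling significantly improve accuracy, while the dense prediction structure reduces parameters and can also improve accuracy.
- DSOD trained from scratch can match or exceed models fine-tuned from pre-trained classifiers, highlighting the value of detection architectures designed to work without pre-training.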
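To put the reported figures in more concrete terms, the short sketch below converts the quoted frame rate into per-image latency and computes the mAP differences between the configurations listed above. The helper names `latency_ms` and `map_gain` are hypothetical; the input numbers are taken only from this summary.

```python
def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds for a given frame rate."""
    return 1000.0 / fps


def map_gain(baseline: float, improved: float) -> float:
    """Difference in mAP (percentage points), rounded to one decimal."""
    return round(improved - baseline, 1)


if __name__ == "__main__":
    # Figures quoted in the findings above.
    print(f"latency at 20.6 fps: {latency_ms(20.6):.1f} ms")
    print(f"dense vs plain prediction (07+12): {map_gain(77.3, 77.7)} points")
    print(f"adding COCO data (dense prediction): {map_gain(77.7, 81.7)} points")
```

At 20.6 fps each 300x300 image takes roughly 48.5 ms. By these numbers, adding COCO training data accounts for a much larger gain than the switch from plain to dense prediction.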
This summary was generated by AI and reviewed by a human editor.