QUICK REVIEW

[Paper Review] DetNet: A Backbone network for Object Detection

Zeming Li, Chao Peng|arXiv (Cornell University)|Apr 17, 2018

Advanced Neural Network Applications | References: 4 | Cited by: 244

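One-sentence summary

DetNet introduces a backbone network built specifically for object detection. It keeps high spatial resolution in deeper layers by using dilated bottleneck blocks, and reaches state-of-the-art results on COCO with fewer FLOPs than the compared backbones.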

ABSTRACT

Recent CNN based object detectors, no matter one-stage methods like YOLO, SSD, and RetinaNet, or two-stage detectors like Faster R-CNN, R-FCN and FPN, are usually trying to directly finetune from ImageNet pre-trained models designed for image classification. There has been little work discussing on the backbone feature extractor specifically designed for the object detection. More importantly, there are several differences between the tasks of image classification and object detection. 1. Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle the objects with various scales. 2. Object detection not only needs to recognize the category of the object instances but also spatially locate the position. Large downsampling factor brings large valid receptive field, which is good for image classification but compromises the object location ability. Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection. Moreover, DetNet includes the extra stages against traditional backbone network for image classification, while maintains high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet (4.8G FLOPs) backbone. The code will be released for the reproduction.
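
To make the abstract's point about downsampling concrete, here is a minimal Python sketch. It shows how many feature-map cells an object spans at two total strides. The 800-pixel input side, the 48-pixel object, and the helper names are illustrative assumptions, not values taken from the paper.

```python
import math


def feature_map_size(input_size: int, stride: int) -> int:
    """Side length of the feature map for a square input at a given total stride."""
    return math.ceil(input_size / stride)


def cells_covered(object_size: int, stride: int) -> float:
    """How many feature-map cells an object of the given pixel size spans per side."""
    return object_size / stride


INPUT_SIZE = 800   # assumed input side in pixels
OBJECT_SIZE = 48   # assumed object side in pixels

for stride in (16, 32):
    grid = feature_map_size(INPUT_SIZE, stride)
    cells = cells_covered(OBJECT_SIZE, stride)
    print(f"stride {stride}: {grid}x{grid} grid, object spans {cells} cells per side")
```

Halving the stride doubles the number of cells along each side of the object, which is the intuition behind keeping deeper stages at a smaller downsampling factor.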

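Motivation and objectives

Identify the limitations of ImageNet classification backbones when they are used for detection.
Design a backbone that preserves spatial resolution without sacrificing receptive field.
Validate DetNet on COCO object detection and instance segmentation when it is combined with detection frameworks such as FPN.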

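Proposed method

Add extra stages to the backbone (e.g., P6) while fixing the downsampling factor, so that the stride stays at 16x after stage 4.
Open each new stage with a dilated bottleneck block that has a 1x1 projection, so the extra stages do not cost further spatial resolution.
Keep the same number of stages as the detector (e.g., FPN), so that ImageNet pre-training remains compatible.
Compare DetNet-59 (built on ResNet-50) with ResNet backbones on COCO within the FPN framework.
Compare DetNet variants pre-trained on ImageNet with variants trained from scratch, to isolate the effect of the backbone.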
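
The stage design above can be sketched with a few lines of arithmetic. The sketch below uses the standard convolution output-size formula to show that a dilated 3x3 convolution with matching padding keeps the feature-map size, and then compares two stride schedules. The dilation rate of 2, the 800-pixel input, and the P2 to P6 stride tables are assumptions for illustration: the ResNet schedule follows the usual FPN layout, and the DetNet schedule simply holds stride 16 after stage 4 as described above.

```python
def conv_out_size(size, kernel=3, stride=1, dilation=1, padding=None):
    """Output length along one axis of a convolution (standard formula)."""
    if padding is None:
        # "Same"-style padding for odd kernels: grows with the dilation rate.
        padding = dilation * (kernel - 1) // 2
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def effective_kernel(kernel=3, dilation=1):
    """Extent of the input window that a dilated kernel touches."""
    return dilation * (kernel - 1) + 1


# Assumed total strides per pyramid level (illustrative, not copied from the paper).
RESNET_FPN_STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32, "P6": 64}
DETNET_FPN_STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 16, "P6": 16}


def stage_sizes(input_size, strides):
    """Feature-map side length per level, using ceiling division."""
    return {name: -(-input_size // s) for name, s in strides.items()}


# A strided 3x3 conv halves a 50-cell map; a dilated one keeps it at 50
# while covering a 5-cell window instead of 3.
print(conv_out_size(50, stride=2), conv_out_size(50, dilation=2), effective_kernel(3, 2))
print(stage_sizes(800, RESNET_FPN_STRIDES))
print(stage_sizes(800, DETNET_FPN_STRIDES))
```

Under these assumptions the deepest DetNet levels keep a 50x50 map, whereas a stride-32 or stride-64 level shrinks it to 25x25 or smaller. Dilation is what lets the receptive field keep growing without that shrinkage.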

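Experimental results

Research questions

RQ1: Can a detection-oriented backbone that preserves high-resolution features in deep layers improve localization accuracy and the detection of small and large objects on COCO?
RQ2: Can DetNet-59 outperform the ResNet-50/FPN baseline at lower or comparable FLOPs while also achieving better detection and instance segmentation results?

Key findings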

Model	Backbone	mAP	AP50	AP75	APs	APm	APl
SSD513	ResNet-101	31.2	50.4	33.3	10.2	34.5	49.8
DSSD513	ResNet-101	33.2	53.3	35.2	13.0	35.4	51.1
Faster R-CNN +++	ResNet-101	34.9	55.7	37.4	15.6	38.7	50.9
Faster R-CNN G-RMI	Inception-ResNet-v2	34.7	55.5	36.7	13.5	38.1	52.0
RetinaNet	ResNet-101	39.1	59.1	42.3	21.8	42.7	50.2
FPN	ResNet-101	37.3	59.6	40.3	19.8	40.2	48.8
FPN	DetNet-59	40.3	62.1	43.8	23.6	42.6	50.0
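
The two FPN rows of the table can be compared metric by metric. The short sketch below copies their values and computes the gain of DetNet-59 over ResNet-101; the variable and function names are illustrative.

```python
METRICS = ("mAP", "AP50", "AP75", "APs", "APm", "APl")

# Values copied from the two FPN rows of the table above.
FPN_RESNET101 = (37.3, 59.6, 40.3, 19.8, 40.2, 48.8)
FPN_DETNET59 = (40.3, 62.1, 43.8, 23.6, 42.6, 50.0)


def gains(baseline, candidate):
    """Per-metric difference (candidate minus baseline), rounded to one decimal."""
    return {m: round(c - b, 1) for m, b, c in zip(METRICS, baseline, candidate)}


detnet_gains = gains(FPN_RESNET101, FPN_DETNET59)
for metric, delta in detnet_gains.items():
    print(f"{metric}: +{delta}")
```

In this comparison the largest gain falls on small objects (APs) and the smallest on large objects (APl).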

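With FPN, DetNet-59 beats ResNet-50 on mAP and the other AP metrics (for example, mAP rises from 37.9 to 40.2 in the ablation study).
DetNet-59 with FPN outperforms ResNet-101-based backbones on COCO detection despite using fewer FLOPs (4.8G vs 7.6G).
Trained from scratch, DetNet-59 still beats ResNet-50 trained from scratch with FPN on COCO (36.3 vs 34.5 mAP).
DetNet-59 brings clear gains on large objects in APl and in AR at IoU = 0.85, which indicates better localization.
Mask R-CNN with a DetNet-59 backbone reaches state-of-the-art instance segmentation results on COCO test-dev, compared with several ResNet-101 baselines.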
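
As a quick check on the figures quoted in the findings, the sketch below computes the relative FLOPs saving of DetNet-59 against ResNet-101 and the mAP gap to ResNet-50, with and without ImageNet pre-training. All numbers are the ones quoted above; the helper names are illustrative.

```python
def relative_saving(candidate: float, baseline: float) -> float:
    """Fraction of the baseline cost that the candidate avoids."""
    return 1.0 - candidate / baseline


def gap(candidate: float, baseline: float) -> float:
    """Difference in mAP points, rounded to one decimal."""
    return round(candidate - baseline, 1)


# FLOPs quoted in the findings: DetNet-59 (4.8G) vs ResNet-101 (7.6G).
flops_saving = relative_saving(4.8, 7.6)

# mAP quoted in the findings: DetNet-59 vs ResNet-50, both with FPN.
pretrained_gap = gap(40.2, 37.9)
scratch_gap = gap(36.3, 34.5)

print(f"FLOPs saving vs ResNet-101: {flops_saving:.1%}")
print(f"mAP gap, ImageNet pre-trained: {pretrained_gap}")
print(f"mAP gap, trained from scratch: {scratch_gap}")
```

The gap remains positive when both backbones are trained from scratch, which is consistent with the review's reading that the gain comes from the backbone design and not only from pre-training.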


This review was generated by AI and checked by a human editor.