QUICK REVIEW

[Paper Review] Precise Single-stage Detector

Aisha Chandio, Gong Gui|arXiv (Cornell University)|Oct 9, 2022

Advanced Image and Video Retrieval Techniques | Cited by 27

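One-sentence summary

This paper proposes PSSD, a modified SSD that enriches features through extra layers, a receptive-field expansion module, and a bidirectional FPN, and combines them with an IOU-guided loss to raise accuracy while keeping real-time speed.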

ABSTRACT

There are still two problems in SSD causing some inaccurate results: (1) In the process of feature extraction, with the layer-by-layer acquisition of semantic information, local information is gradually lost, resulting in less representative feature maps; (2) During the Non-Maximum Suppression (NMS) algorithm, due to inconsistency in classification and regression tasks, the classification confidence and predicted detection position cannot accurately indicate the position of the prediction boxes. Methods: In order to address these issues, we propose a new architecture, a modified version of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). Firstly, we improve the features by adding extra layers to SSD. Secondly, we construct a simple and effective feature enhancement module to expand the receptive field step by step for each layer and enhance its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground truth boxes, and the threshold IOU guides classification training and attenuates the scores, which are used by the NMS algorithm. Main Results: Benefiting from the above optimization, the proposed model PSSD achieves exciting performance in real-time. Specifically, with the hardware of Titan Xp and the input size of 320 pix, PSSD achieves 33.8 mAP at 45 FPS speed on MS COCO benchmark and 81.28 mAP at 66 FPS speed on Pascal VOC 2007, outperforming state-of-the-art object detection models. Besides, the proposed model performs significantly well with larger input size. Under 512 pix, PSSD can obtain 37.2 mAP with 27 FPS on MS COCO and 82.82 mAP with 40 FPS on Pascal VOC 2007. The experiment results prove that the proposed model has a better trade-off between speed and accuracy.

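Motivation and Goals

Address the weaknesses of SSD-style single-stage detectors in preserving local detail and in aligning classification with box regression.
Enrich multi-scale feature representations without making major changes to the backbone network.
Introduce an IOU-guided loss and prediction mechanism to improve NMS filtering and localization accuracy.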

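Proposed Method

Add extra layers to SSD to extend the base feature maps used by the predictors.
Introduce a feature enhancement module (FEM), consisting of a receptive-field expansion module (RFM) and a bidirectional FPN, to enrich both local and semantic information at every scale.
Redesign the backbone to give a more uniform receptive-field distribution without adding substantial parameter overhead.
Propose an IOU-guided prediction structure with a dedicated IOU branch, trained with an R_IOU loss and a CEJI loss, to better align classification with localization and to down-weight boxes with high scores but low IOU during NMS.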
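The last point, down-weighting high-score but low-IOU boxes before NMS, can be illustrated with a minimal sketch. It assumes the attenuated score is simply the classification score multiplied by the predicted IOU; that multiplicative rule, the function names, and the toy boxes are all illustrative assumptions and are not taken from the paper, whose R_IOU and CEJI losses are not reproduced here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_guided_nms(
    boxes: Sequence[Box],
    cls_scores: Sequence[float],
    pred_ious: Sequence[float],
    nms_thresh: float = 0.5,
) -> List[int]:
    """Greedy NMS ranked by cls_score * predicted IOU.

    The multiplicative attenuation is an assumption made for this sketch.
    Returns the indices of the kept boxes in ranking order.
    """
    ranked = [s * q for s, q in zip(cls_scores, pred_ious)]
    order = sorted(range(len(boxes)), key=lambda i: ranked[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= nms_thresh for k in keep):
            keep.append(i)
    return keep


# Toy data: box 0 has the highest class score but a poor predicted IOU.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
cls_scores = [0.95, 0.90, 0.80]
pred_ious = [0.40, 0.85, 0.90]
kept = iou_guided_nms(boxes, cls_scores, pred_ious)
print(kept)
```

In this toy example, ranking by classification score alone would keep box 0 and suppress the better-localized box 1; with the attenuated score, box 1 is ranked first and box 0 is the one suppressed.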

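Experimental Results

Research Questions

RQ1: How can an SSD-style single-stage detector achieve a better speed-accuracy trade-off without relying on a deeper backbone?
RQ2: Can an IOU-guided approach improve the consistency between classification scores and localization accuracy in single-stage detectors?
RQ3: Can a bidirectional feature pyramid and receptive-field expansion improve detection of both small and large objects within a single-stage framework?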

Key Findings

Method	Backbone	Input size	FPS	AP	AP50	AP75	AP_small	AP_medium	AP_large
PSSD320	VGG16	320×320	45	33.8	52.2	35.8	14.8	38.5	50.3
PSSD512	VGG16	512×512	27	37.2	55.9	40.3	18.7	41.6	51.4

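On MS COCO 2017 test-dev, PSSD320 (VGG16 backbone, 320×320 input) reaches 33.8 mAP at 45 FPS.
On MS COCO 2017 test-dev, PSSD512 (VGG16 backbone, 512×512 input) reaches 37.2 mAP at 27 FPS.
On Pascal VOC 2007, PSSD320 reaches 81.28 mAP at 66 FPS and PSSD512 reaches 82.82 mAP at 40 FPS.
The ablation study shows that the bidirectional FPN, the RFM, and IOU-guided prediction together raise AP from 25.8 (SSD baseline) to 33.8 (PSSD320).
IOU-guided prediction with the new loss terms (the R_IOU loss and the CEJI loss) yields a measurable gain over the baseline and reduces the number of high-score, low-IOU predictions.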
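To make the speed-accuracy trade-off in these findings concrete, the short sketch below works through the arithmetic. Every number is copied from the MS COCO rows and the ablation figure above; the dictionary layout and variable names are illustrative only.

```python
# Figures copied from the MS COCO results above; nothing here is newly measured.
results = {
    "PSSD320": {"fps": 45, "ap": 33.8},
    "PSSD512": {"fps": 27, "ap": 37.2},
}
ssd_baseline_ap = 25.8  # SSD baseline AP quoted in the ablation finding

# Absolute AP gain of PSSD320 over the SSD baseline.
gain_over_baseline = round(results["PSSD320"]["ap"] - ssd_baseline_ap, 1)

# What the larger input buys, and what it costs in frame rate.
ap_gain_512 = round(results["PSSD512"]["ap"] - results["PSSD320"]["ap"], 1)
fps_ratio_512 = results["PSSD512"]["fps"] / results["PSSD320"]["fps"]

print(f"PSSD320 vs SSD baseline: +{gain_over_baseline} AP")
print(f"PSSD512 vs PSSD320: +{ap_gain_512} AP at {fps_ratio_512:.0%} of the frame rate")
```

Read this way, the 512×512 input adds 3.4 AP on MS COCO while running at 60% of the 320×320 frame rate.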


This review was generated by AI and reviewed by a human editor.