QUICK REVIEW

[Paper Digest] Light-Weight RetinaNet for Object Detection

Yixing Li, Fengbo Ren|arXiv (Cornell University)|May 24, 2019

Advanced Neural Network Applications|References: 18|Cited by: 27

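One-Sentence Summary

This paper proposes a light-weight RetinaNet that cuts FLOPs only in the most computationally intensive layers, in particular the top feature pyramid network (FPN) branch, and leaves the rest of the network unchanged. Compared with input image scaling, the method gives a better mAP-FLOPs trade-off: 0.1% higher mAP at a 1.15x FLOPs reduction and 0.3% higher mAP at a 1.8x FLOPs reduction. Its mAP degrades roughly linearly as FLOPs fall, whereas input scaling degrades exponentially.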

ABSTRACT

Object detection has gained great progress driven by the development of deep learning. Compared with a widely studied task -- classification, generally speaking, object detection even need one or two orders of magnitude more FLOPs (floating point operations) in processing the inference task. To enable a practical application, it is essential to explore effective runtime and accuracy trade-off scheme. Recently, a growing number of studies are intended for object detection on resource constraint devices, such as YOLOv1, YOLOv2, SSD, MobileNetv2-SSDLite, whose accuracy on COCO test-dev detection results are yield to mAP around 22-25% (mAP-20-tier). On the contrary, very few studies discuss the computation and accuracy trade-off scheme for mAP-30-tier detection networks. In this paper, we illustrate the insights of why RetinaNet gives effective computation and accuracy trade-off for object detection and how to build a light-weight RetinaNet. We propose to only reduce FLOPs in computational intensive layers and keep other layer the same. Compared with most common way -- input image scaling for FLOPs-accuracy trade-off, the proposed solution shows a constantly better FLOPs-mAP trade-off line. Quantitatively, the proposed method result in 0.1% mAP improvement at 1.15x FLOPs reduction and 0.3% mAP improvement at 1.8x FLOPs reduction.

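Research Motivation and Objectives

To address the high computational cost of high-accuracy object detection networks such as RetinaNet, which can require one or two orders of magnitude more FLOPs than classification networks.
To explore more effective FLOPs-accuracy trade-offs for mAP-30-tier detection networks, a tier that few studies address and that is typically deployed on high-end hardware.
To identify and optimize only the most computationally intensive layers of RetinaNet, rather than reducing cost globally through input scaling or backbone changes.
To lower inference cost while keeping accuracy high by selectively replacing the heavy layers of the detection head with light-weight architectures.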

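Proposed Method

The method targets the heaviest component of RetinaNet, the top FPN branch (P3), which accounts for 48% of total FLOPs, and replaces it with light-weight block variants.
The light-weight blocks are designed to cut FLOPs while preserving feature representation capacity; D-block-v3 is chosen as the best variant because it retains accuracy better at the same level of FLOPs reduction.
The original backbone (ResNet-50) and the feature pyramid structure are kept; only the regression and classification branches of the detection head are modified.
Applying the change only to the most FLOPs-intensive component avoids the accuracy loss caused by global architecture changes.
The training schedule is lengthened in proportion to the FLOPs reduction so that the model still converges, following practice from network compression research.
The approach can extend to other FPN-based detection networks whose FLOPs are unevenly distributed across blocks, not only RetinaNet.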
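
The reasoning behind targeting a single FPN level can be illustrated with a rough cost estimate. The sketch below is not taken from the paper: it assumes the commonly described RetinaNet head layout (two towers of four 3x3, 256-channel convolutions shared across levels P3 to P7), a hypothetical 512x512 input, and it ignores the final prediction layers. The function names are made up for illustration. The `overall_reduction` helper applies an Amdahl-style formula to the 48% share quoted above to show how a local saving translates into a whole-network saving.

```python
# Illustrative estimate only; layer sizes are assumptions, not figures from the paper.

def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates of a k x k, stride-1, 'same'-padded convolution."""
    return h * w * c_in * c_out * k * k

def head_tower_macs(image_hw=(512, 512), strides=(8, 16, 32, 64, 128),
                    channels=256, tower_layers=4):
    """MACs of the classification + regression towers on each pyramid level."""
    h, w = image_hw
    macs = {}
    for level, stride in enumerate(strides, start=3):
        fh, fw = -(-h // stride), -(-w // stride)  # ceil division
        one_tower = tower_layers * conv_macs(fh, fw, channels, channels)
        macs[f"P{level}"] = 2 * one_tower  # two towers per level
    return macs

def overall_reduction(fraction, local_factor):
    """Whole-network FLOPs reduction when a part holding `fraction` of the
    total cost is made `local_factor` times cheaper (Amdahl-style)."""
    return 1.0 / ((1.0 - fraction) + fraction / local_factor)

macs = head_tower_macs()
total = sum(macs.values())
shares = {name: m / total for name, m in macs.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of head tower MACs")

# Using the 48% share quoted in the text as the input fraction:
for factor in (2, 4, 8):
    print(f"P3 {factor}x cheaper -> overall {overall_reduction(0.48, factor):.2f}x")
```

Because each pyramid level has a quarter of the spatial positions of the one before it, P3 holds roughly three quarters of the head tower cost under these assumptions. That share is measured against the head towers only, so it is a different quantity from the 48% of total network FLOPs reported above.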

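Experimental Results

Research Questions

RQ1: Does reducing FLOPs only in the most computationally intensive layers give a better mAP-FLOPs trade-off than conventional input image scaling?
RQ2: Why does RetinaNet offer an effective FLOPs-accuracy trade-off, and how can that property be used to build a lighter version?
RQ3: When FLOPs are reduced through layer-specific optimization, is the mAP degradation closer to linear than the exponential degradation caused by input scaling?
RQ4: Can light-weight block designs be transferred to detection without a significant loss in accuracy?
RQ5: Does optimizing only the heaviest detection head layers, while keeping the original backbone and feature pyramid structure, preserve performance better than global reduction strategies?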

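Key Findings

At a 1.15x FLOPs reduction, the proposed method achieves 0.1% higher mAP than input image scaling.
At a 1.8x FLOPs reduction, the advantage over input image scaling grows to 0.3% mAP.
The mAP of the proposed method degrades roughly linearly as FLOPs fall, while input image scaling degrades exponentially, so the gap keeps widening at lower FLOPs budgets.
At the same level of FLOPs reduction, the D-block-v3 light-weight block outperforms the MobileNet-based D-block-v1, indicating that MobileNet blocks are not the best replacement for the detection head.
In the FLOPs-mAP trade-off plot, the red curve (proposed method) consistently lies closer to the top-left corner than the blue curve (input scaling), confirming the better trade-off.
The approach can extend to other FPN-based detection networks with unevenly distributed FLOPs, offering a scalable strategy for efficient deployment.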
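
For context on the MobileNet-based variant mentioned above, the following sketch compares the cost of a standard 3x3 convolution with a generic MobileNet-style depthwise separable convolution. It is an assumption-laden illustration: the D-block definitions themselves are not given in this digest, so the code does not reproduce D-block-v1 or D-block-v3, and the 64x64 feature map and 256 channels are hypothetical values.

```python
# Generic depthwise separable convolution cost; not the paper's D-block definitions.

def standard_conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates of a standard k x k convolution."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

h = w = 64   # hypothetical feature map size
c = 256      # hypothetical channel width

ratio = depthwise_separable_macs(h, w, c, c) / standard_conv_macs(h, w, c, c)
print(f"separable / standard cost ratio: {ratio:.4f}")
print(f"closed form 1/c_out + 1/k^2:     {1 / c + 1 / 9:.4f}")
```

The ratio reduces to 1/c_out + 1/k², which shows how much cheaper such a block is in FLOPs terms. A lower FLOPs count alone says nothing about accuracy, which is consistent with the finding that the MobileNet-based block was not the best choice for the detection head.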

This digest was generated by AI and reviewed by human editors.