QUICK REVIEW

[Paper Review] ThunderNet: Towards Real-time Generic Object Detection

Zheng Qin, Zeming Li|arXiv (Cornell University)|Mar 28, 2019

Advanced Neural Network Applications|References: 31|Cited by: 43

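One-Sentence Summary

ThunderNet is a lightweight two-stage detector for real-time generic object detection on mobile devices. It pairs a custom lightweight backbone (SNet) with an efficient detection head that includes a Context Enhancement Module and a Spatial Attention Module, reaching real-time speed on ARM with competitive accuracy at low FLOPs.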

ABSTRACT

Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Our code and models are available at https://github.com/qinzheng93/ThunderNet.

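Research Motivation and Goals

Investigate whether two-stage detectors can achieve real-time performance on mobile devices.
Design a lightweight backbone built specifically for object detection rather than transferred from image classification.
Develop efficient detection-head components that balance accuracy and computational cost.
Balance input resolution, backbone capacity, and detection-head design to get the best real-time performance.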

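Proposed Method

Propose SNet, a lightweight backbone obtained by modifying ShuffleNetV2: its 3×3 depthwise convolutions are replaced with 5×5 depthwise convolutions to enlarge the receptive field.
Compress the RPN and the RoI (R-CNN) head to cut computation while preserving accuracy (e.g., a 5×5 depthwise convolution followed by a 1×1 convolution in the RPN, and a smaller fully connected layer in the R-CNN subnet).
Introduce the Context Enhancement Module (CEM), which fuses multi-scale local and global context (C4, C5, and Cglb) through 1×1 projections plus upsampling/broadcasting.
Introduce the Spatial Attention Module (SAM), which reweights the CEM features with a foreground signal derived from the RPN through a 1×1 transform.
Explore the trade-off among input resolution, backbone, and detection head to maximize both speed and accuracy on mobile hardware.
Train end to end with synchronous SGD, multi-scale training, and cross-GPU batch normalization; use Soft-NMS in post-processing.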
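The CEM and SAM descriptions above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the channel counts, the 20×20 and 10×10 map sizes, nearest-neighbour 2× upsampling, element-wise summation of the three branches, and the omission of batch normalization in the SAM transform are all assumptions made here; the helpers (`conv1x1`, `upsample2x`, `cem`, `sam`, `init`) are hypothetical names, and the weights are random.

```python
import numpy as np

# Hypothetical channel counts; a 320×320 input is assumed (C4 at stride 16, C5 at stride 32).
C4_CH, C5_CH, CEM_CH, RPN_CH = 232, 464, 245, 256
rng = np.random.default_rng(0)


def conv1x1(x, w, b):
    """1×1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def upsample2x(x):
    """Nearest-neighbour 2× upsampling (assumed mode) of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cem(c4, c5, p):
    """Context Enhancement Module sketch: fuse C4, C5 and the global context Cglb."""
    f4 = conv1x1(c4, p["w4"], p["b4"])                # local context from C4
    f5 = upsample2x(conv1x1(c5, p["w5"], p["b5"]))    # C5 projected, upsampled to C4 size
    c_glb = c5.mean(axis=(1, 2), keepdims=True)       # Cglb: global average pool, (C5_CH, 1, 1)
    f_glb = conv1x1(c_glb, p["wg"], p["bg"])          # (CEM_CH, 1, 1)
    return f4 + f5 + f_glb                            # f_glb is broadcast over H×W


def sam(f_cem, f_rpn, w, b):
    """Spatial Attention Module sketch: reweight CEM features with an RPN-derived map."""
    attn = sigmoid(conv1x1(f_rpn, w, b))              # 1×1 transform, values in (0, 1)
    return f_cem * attn


def init(c_out, c_in):
    """Small random 1×1-conv weights and zero bias (illustration only)."""
    return rng.standard_normal((c_out, c_in)) * 0.01, np.zeros(c_out)


w4, b4 = init(CEM_CH, C4_CH)
w5, b5 = init(CEM_CH, C5_CH)
wg, bg = init(CEM_CH, C5_CH)
params = {"w4": w4, "b4": b4, "w5": w5, "b5": b5, "wg": wg, "bg": bg}

c4 = rng.standard_normal((C4_CH, 20, 20))
c5 = rng.standard_normal((C5_CH, 10, 10))
f_rpn = rng.standard_normal((RPN_CH, 20, 20))  # stand-in for the RPN's intermediate features

f_cem = cem(c4, c5, params)
w_sam, b_sam = init(CEM_CH, RPN_CH)
f_sam = sam(f_cem, f_rpn, w_sam, b_sam)
print(f_cem.shape, f_sam.shape)  # (245, 20, 20) (245, 20, 20)
```

Because Cglb is a single value per channel, broadcasting adds the same global context to every spatial position. The SAM output keeps the CEM shape and only rescales each element by a factor in (0, 1), so regions the RPN considers background are attenuated rather than removed.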
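To illustrate why the RPN is compressed, the arithmetic below compares the multiply-accumulate (MAC) count of an assumed baseline, a conventional 3×3 convolution with 256 output channels, against a 5×5 depthwise convolution followed by a 1×1 convolution. The feature-map size (20×20), input channels (245) and RPN width (256) are assumptions chosen for illustration, and `conv_macs` is a hypothetical helper. Lightweight-detection papers often report MACs as "FLOPs", but conventions vary.

```python
def conv_macs(h, w, k, c_in, c_out=None, depthwise=False):
    """MACs of a stride-1, 'same'-padded k×k convolution on an h×w map."""
    if depthwise:
        return h * w * k * k * c_in              # one k×k filter per input channel
    return h * w * k * k * c_in * c_out          # every output channel sees every input channel


H = W = 20     # assumed: 320×320 input at stride 16
C_IN = 245     # assumed: channels of the feature map fed to the RPN
C_RPN = 256    # assumed: RPN hidden width

standard = conv_macs(H, W, 3, C_IN, C_RPN)
compressed = conv_macs(H, W, 5, C_IN, depthwise=True) + conv_macs(H, W, 1, C_IN, C_RPN)

print(f"3x3 conv:     {standard / 1e6:.1f} M MACs")    # 225.8 M MACs
print(f"5x5 dw + 1x1: {compressed / 1e6:.1f} M MACs")  # 27.5 M MACs
print(f"reduction:    {standard / compressed:.1f}x")   # 8.2x
```

Under these assumptions the compressed design needs roughly one eighth of the MACs. For scale, the results table in this review lists 262 MFLOPs for the entire SNet49 model, so an uncompressed RPN convolution of this size would be a large share of the budget.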

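Experimental Results

Research Questions

RQ1: Can a two-stage detector outperform lightweight one-stage detectors in both speed and accuracy on mobile hardware?
RQ2: Which backbone and detection-head design choices give the best accuracy-efficiency trade-off for real-time mobile detection?
RQ3: How do the context enhancement and spatial attention mechanisms affect feature representation and localization?
RQ4: On ARM platforms, what is the best balance among input resolution, backbone capacity, and detection-head complexity?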

Key Findings

Results on COCO test-dev:
Model	Backbone	Input	MFLOPs	AP	AP50	AP75
ThunderNet (ours)	SNet49	320×320	262	19.2	33.7	19.7
ThunderNet (ours)	SNet146	320×320	473	23.7	40.3	24.6
ThunderNet (ours)	SNet535	320×320	1300	28.1	46.2	29.6

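ThunderNet with SNet49 reaches MobileNet-SSD-level accuracy with about 22% of its FLOPs.
ThunderNet with SNet146 outperforms previous lightweight detectors with about 40% of their FLOPs.
ThunderNet with SNet535 approaches the accuracy of large detectors at a small fraction of their FLOPs.
On COCO test-dev, ThunderNet with SNet146 reaches 23.7 AP, 40.3 AP50, and 24.6 AP75; with SNet535 it reaches 28.1 AP, 46.2 AP50, and 29.6 AP75.
On ARM, ThunderNet runs at 24.1 fps (SNet49) and 13.8 fps (SNet146); all variants exceed 200 fps on GPU.
At similar FLOPs, a large-backbone/small-head design outperforms a small-backbone/large-head design, which highlights the importance of matching backbone and head capacity.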


This review was generated by AI and checked by human editors.