QUICK REVIEW

[Paper Review] MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

Yunyang Xiong, Hanxiao Liu|arXiv (Cornell University)|Apr 30, 2020

Advanced Neural Network Applications | References: 44 | Cited by: 41

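One-Sentence Summary

MobileDets introduce a platform-aware NAS search space that includes both regular convolutions and inverted bottlenecks, and achieve state-of-the-art latency-accuracy trade-offs for mobile object detection on CPUs, EdgeTPUs, DSPs, and edge GPUs.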

ABSTRACT

Inverted bottleneck layers, which are built upon depthwise convolutions, have been the predominant building blocks in state-of-the-art object detection models on mobile devices. In this work, we investigate the optimality of this design pattern over a broad range of mobile accelerators by revisiting the usefulness of regular convolutions. We discover that regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators, provided that they are placed strategically in the network via neural architecture search. By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators. On the COCO object detection task, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on Google EdgeTPU, 3.4 mAP on Qualcomm Hexagon DSP and 2.7 mAP on Nvidia Jetson GPU without increasing latency. Moreover, MobileDets are comparable with the state-of-the-art MnasFPN on mobile CPUs even without using the feature pyramid, and achieve better mAP scores on both EdgeTPUs and DSPs with up to 2x speedup. Code and models are available in the TensorFlow Object Detection API: https://github.com/tensorflow/models/tree/master/research/object_detection.

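Motivation and Objectives

Re-examine the building blocks of mobile detectors on modern accelerators, looking beyond inverted bottlenecks (IBNs).
Propose an expanded search space (MobileDet) that adds regular convolutions and Tucker-based blocks to improve the latency-accuracy trade-off.
Show that searching architectures directly on the object detection task yields better results on mobile hardware than backbone-only NAS.
Demonstrate that MobileDets achieve state-of-the-art or competitive mAP at low latency across several hardware platforms.
Release ready-to-use code and models, integrated into the TensorFlow Object Detection API, to support wider adoption.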

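Proposed Method

Introduce the MobileDet search space, which augments IBNs with blocks built on regular convolutions (fused inverted bottlenecks and Tucker convolution blocks).
Describe two flexible building blocks: (i) the fused inverted bottleneck, which replaces the 1x1 expansion convolution and the following KxK depthwise convolution with a single regular KxK convolution, and (ii) the Tucker convolution, which achieves compression through a 1x1, KxK, 1x1 sequence of convolutions.
Embed these blocks in a latency-aware neural architecture search (NAS) framework (TuNAS) whose reward combines mAP with a platform-aware latency term.
Train a cost model c(·) that predicts hardware latency from per-layer decisions, enabling fast NAS without on-device benchmarking of every candidate.
Search on COCO with a detection-specific setup (SSDLite head), and evaluate the final architecture for each target hardware platform by retraining it from scratch.
Report latency benchmarks using the TF-Lite, EdgeTPU, DSP, and GPU backends.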
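The cost model and reward described in the method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary says only that c(·) predicts latency from layer decisions and that the reward combines mAP with latency. The linear model over one-hot layer choices, the absolute-value reward form `mAP + tau * |cost / target - 1|`, the value of `tau`, the three-choice search space, and all latency numbers are assumptions made for illustration.

```python
import random

# Hypothetical search space: every layer picks one of three block types.
CHOICES = ["ibn", "fused", "tucker"]
NUM_LAYERS = 4


def featurize(arch):
    """One-hot encode each layer's decision and append a bias term."""
    feats = []
    for choice in arch:
        feats.extend(1.0 if choice == c else 0.0 for c in CHOICES)
    feats.append(1.0)
    return feats


def predict_latency(weights, arch):
    """Assumed linear cost model c(M): dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, featurize(arch)))


def fit_cost_model(samples, epochs=100, lr=0.05):
    """Fit the linear model with plain per-sample gradient descent."""
    weights = [0.0] * (NUM_LAYERS * len(CHOICES) + 1)
    for _ in range(epochs):
        for arch, latency in samples:
            x = featurize(arch)
            err = sum(w * xi for w, xi in zip(weights, x)) - latency
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


def reward(map_score, cost, target_cost, tau=-0.1):
    """Assumed reward form: mAP + tau * |cost / target - 1|, with tau < 0."""
    return map_score + tau * abs(cost / target_cost - 1.0)


# Synthetic "measurements" standing in for on-device benchmarks (made-up numbers).
TRUE_MS = {"ibn": 1.5, "fused": 1.0, "tucker": 0.8}
OVERHEAD_MS = 0.5
rng = random.Random(0)
samples = []
for _ in range(100):
    arch = [rng.choice(CHOICES) for _ in range(NUM_LAYERS)]
    samples.append((arch, sum(TRUE_MS[c] for c in arch) + OVERHEAD_MS))

weights = fit_cost_model(samples)
candidate = ["fused", "fused", "tucker", "ibn"]
predicted_ms = predict_latency(weights, candidate)
candidate_reward = reward(map_score=0.25, cost=predicted_ms, target_cost=4.0)
```

Once fitted, the model scores any candidate with a single dot product, so the search can penalize architectures that miss the latency target without running each one on the device.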

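Experimental Results

Research Questions

RQ1: Can strategically placing regular convolutions via NAS improve the latency-accuracy trade-off of mobile object detection across multiple hardware platforms?
RQ2: Does extending the search space beyond IBNs (to include fused regular convolutions and Tucker blocks) bring measurable gains on CPUs, EdgeTPUs, DSPs, and GPUs?
RQ3: Do architectures discovered for one hardware platform transfer to others, and to what extent?
RQ4: How does detection-oriented NAS (as opposed to backbone-only NAS) perform on COCO across multiple edge devices?
RQ5: Does the proposed MobileDet space generalize to unseen hardware (such as the NVIDIA Jetson GPU) while retaining its gains?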

Key Findings

Compared with baselines searched in an IBN-only space, MobileDets consistently improve the latency-accuracy trade-off on CPUs, EdgeTPUs, DSPs, and edge GPUs.
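On COCO, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile CPU latency; they also outperform MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on the EdgeTPU, 3.4 mAP on the DSP, and 2.7 mAP on the edge GPU, without increasing latency.
MobileDets are comparable with MnasFPN on mobile CPUs even without a feature pyramid, and achieve higher mAP on EdgeTPUs and DSPs with up to 2x speedup.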
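Including regular convolutions in the search space brings significant gains on non-CPU accelerators (EdgeTPU, DSP), where depthwise convolutions are less well optimized.
Architectures discovered for the EdgeTPU and DSP transfer well to unseen hardware (such as the NVIDIA Jetson Xavier GPU), demonstrating the generality of the MobileDet space.
The search space with fused and Tucker blocks (IBN+Fused+Tucker) provides additional gains on non-CPU hardware over IBN-only or smaller spaces.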
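To make the comparison between IBN, fused, and Tucker blocks more concrete, the sketch below counts multiply-accumulate operations (MACs) for one layer of each type. It is a rough illustration only: it assumes stride 1 and "same" padding, the layer shape and ratios are made-up values, and the Tucker parameters `in_ratio` and `out_ratio` are illustrative names for the compression ratios of the 1x1, KxK, 1x1 sequence.

```python
def ibn_macs(h, w, c_in, c_out, k, expansion):
    """Inverted bottleneck: 1x1 expand, KxK depthwise, 1x1 project."""
    mid = c_in * expansion
    return h * w * (c_in * mid + k * k * mid + mid * c_out)


def fused_ibn_macs(h, w, c_in, c_out, k, expansion):
    """Fused inverted bottleneck: regular KxK conv (expanding), 1x1 project."""
    mid = c_in * expansion
    return h * w * (k * k * c_in * mid + mid * c_out)


def tucker_macs(h, w, c_in, c_out, k, in_ratio, out_ratio):
    """Tucker block: 1x1 compress, regular KxK conv, 1x1 expand."""
    mid_in = int(c_in * in_ratio)
    mid_out = int(c_out * out_ratio)
    return h * w * (c_in * mid_in + k * k * mid_in * mid_out + mid_out * c_out)


# Hypothetical layer: 20x20 feature map, 64 channels in and out, 3x3 kernel.
shape = dict(h=20, w=20, c_in=64, c_out=64, k=3)
macs = {
    "ibn": ibn_macs(**shape, expansion=4),
    "fused": fused_ibn_macs(**shape, expansion=4),
    "tucker": tucker_macs(**shape, in_ratio=0.25, out_ratio=0.75),
}
for name, count in macs.items():
    print(f"{name:>6}: {count / 1e6:.1f} M MACs")
```

In this example the fused block needs several times more MACs than the IBN, yet the findings above report that regular convolutions help on EdgeTPUs and DSPs. MAC counts alone are therefore a poor proxy for latency on these accelerators, which is consistent with the method's use of a learned, platform-specific cost model.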

This review was generated by AI and checked by human editors.