QUICK REVIEW

[Paper Review] Speed/accuracy trade-offs for modern convolutional object detectors

Jonathan Huang, Vivek Rathod|arXiv (Cornell University)|Nov 30, 2016

Advanced Neural Network Applications | References: 42 | Cited by: 157

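One-sentence summary

This paper evaluates three meta-architectures (Faster R-CNN, R-FCN, and SSD) in a unified, comparable way across multiple feature extractors, image resolutions, and proposal counts, mapping out the speed/accuracy/memory trade-offs and identifying the best configurations.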

ABSTRACT

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

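Motivation and objectives

Give a concise survey of modern convolutional detection systems and show how similar they are at the level of high-level design.
Build a unified TensorFlow implementation of Faster R-CNN, R-FCN, and SSD so that speed and accuracy can be compared fairly.
Characterize how the feature extractor, the number of region proposals, and the input size affect speed, memory, and accuracy.
Identify sweet spots on the speed/accuracy frontier and demonstrate near state-of-the-art single-model performance.
Explain which configurations yield real-time or high-accuracy detectors suitable for practical applications.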

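Proposed method

Implement unified, single-forward-pass detectors in TensorFlow for the three meta-architectures (Faster R-CNN, R-FCN, SSD).
Evaluate them in combination with six feature extractors (VGG-16, ResNet-101, Inception v2, Inception v3, Inception-ResNet v2, MobileNet).
Vary the input image size (high: 600, low: 300) and, for Faster R-CNN and R-FCN, the number of region proposals (10–300).
Use Argmax matching, the standard ground-truth box encoding, and a Smooth L1 localization loss.
Train end to end with asynchronous SGD, freeze the batchnorm parameters, and evaluate on COCO using the COCO metric (mAP across IOU thresholds).
Post-process with non-maximum suppression, and report timing and memory on a GPU with a batch size of one image.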
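
To make two of the steps above concrete, here is a minimal pure-Python sketch of the Smooth L1 loss and of greedy non-maximum suppression. It is an illustration under stated assumptions, not the paper's TensorFlow implementation: the `(ymin, xmin, ymax, xmax)` box layout, the function names, and the default IoU threshold of 0.5 are all choices made for this example.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (ymin, xmin, ymax, xmax).
Box = Sequence[float]


def smooth_l1(residual: float) -> float:
    """Smooth L1 loss for one regression residual.

    Quadratic for |residual| < 1, linear otherwise.
    """
    abs_r = abs(residual)
    if abs_r < 1.0:
        return 0.5 * abs_r * abs_r
    return abs_r - 0.5


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS.

    Boxes are visited from highest to lowest score; a box is kept only
    if it does not overlap an already kept box by more than
    iou_threshold (an assumed default, not a value from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(example_boxes, example_scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed and only the first and third boxes remain. A production detector would normally run a vectorized NMS on the GPU rather than a Python loop.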

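Experimental results

Research questions

RQ1: How do speed, memory, and accuracy trade off across the three meta-architectures, Faster R-CNN, R-FCN, and SSD?
RQ2: Within each meta-architecture, how do different feature extractors affect detection performance and efficiency?
RQ3: How do input resolution and the number of proposals affect speed and mAP?
RQ4: Are there identifiable sweet spots on the speed/accuracy frontier that balance real-time performance against accuracy?
RQ5: Can a single-model detector approach state-of-the-art accuracy without ensembling or multi-crop methods?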

Key findings

Model	Minival mAP	Test-dev mAP
(Fastest) SSD w/MobileNet (Low Resolution)	19.3	18.8
(Fastest) SSD w/Inception V2 (Low Resolution)	22	21.6
(Sweet Spot) Faster R-CNN w/Resnet 101, 100 Proposals	32	31.9
(Sweet Spot) R-FCN w/Resnet 101, 300 Proposals	30.4	30.3
(Most Accurate) Faster R-CNN w/Inception Resnet V2, 300 Proposals	35.7	35.6
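
As a small worked example, the table above can be loaded into a dictionary to check how closely minival mAP tracks test-dev mAP. The numbers are copied from the table; the variable and function names are only illustrative.

```python
# mAP values copied from the table above, as (minival, test-dev).
RESULTS = {
    "SSD w/MobileNet (Low Resolution)": (19.3, 18.8),
    "SSD w/Inception V2 (Low Resolution)": (22.0, 21.6),
    "Faster R-CNN w/Resnet 101, 100 Proposals": (32.0, 31.9),
    "R-FCN w/Resnet 101, 300 Proposals": (30.4, 30.3),
    "Faster R-CNN w/Inception Resnet V2, 300 Proposals": (35.7, 35.6),
}


def minival_testdev_gaps(results):
    """Return minival minus test-dev mAP per model, rounded to 0.1."""
    return {
        name: round(minival - test_dev, 1)
        for name, (minival, test_dev) in results.items()
    }


gaps = minival_testdev_gaps(RESULTS)

# Models ordered from highest to lowest test-dev mAP.
ranking = sorted(RESULTS, key=lambda name: RESULTS[name][1], reverse=True)

for name in ranking:
    print(f"{name}: test-dev {RESULTS[name][1]}, gap {gaps[name]}")
```

For these five models the gap never exceeds 0.5 mAP, and the ordering by test-dev mAP matches the ordering by minival mAP.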

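Faster R-CNN tends to be slower but more accurate, unless the number of proposals is limited to reduce runtime.
R-FCN and SSD are usually faster at inference in most configurations while remaining competitive in accuracy.
The study identifies sweet spots such as Faster R-CNN with ResNet-101 and 50–100 proposals, or R-FCN with ResNet-101 and 300 proposals, as strong speed/accuracy candidates.
The most accurate single-model configuration reported is Faster R-CNN with Inception-ResNet-v2 and 300 proposals, although it is also the slowest option.
SSD configurations with MobileNet or Inception V2 at low resolution are the fastest among the evaluated settings, but they also have the lowest mAP in the table above (19.3 and 22 on minival).
Increasing the input resolution raises mAP but substantially increases runtime, which underlines the trade-off between accuracy and speed.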
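
The notion of a speed/accuracy frontier used in these findings can be sketched as a Pareto filter: a configuration stays on the frontier only if no other configuration is both at least as fast and at least as accurate. The code below is a hedged illustration. The `Measurement` type, the configuration names, and especially the latency and mAP numbers are hypothetical placeholders for this example, not results from the paper.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    map_score: float   # higher is better


def pareto_frontier(points: List[Measurement]) -> List[Measurement]:
    """Keep the configurations that no faster-or-equal one beats on mAP."""
    frontier: List[Measurement] = []
    best_map = float("-inf")
    # Visit from fastest to slowest; on equal latency, higher mAP first.
    for p in sorted(points, key=lambda m: (m.latency_ms, -m.map_score)):
        if p.map_score > best_map:
            frontier.append(p)
            best_map = p.map_score
    return frontier


# Hypothetical placeholder measurements, NOT numbers from the paper.
measurements = [
    Measurement("config_a", 30.0, 20.0),
    Measurement("config_b", 80.0, 30.0),
    Measurement("config_c", 90.0, 28.0),   # slower and less accurate than config_b
    Measurement("config_d", 400.0, 35.0),
]

frontier_names = [m.name for m in pareto_frontier(measurements)]
print(frontier_names)
```

With these placeholder values, `config_c` is dropped because `config_b` is both faster and more accurate. A sweet spot in the sense used above would then be a frontier point where a small further gain in mAP starts to cost a large increase in latency.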

This review was generated by AI and reviewed by a human editor.