QUICK REVIEW

[Paper Review] Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

Mohammad Javad Shafiee, Brendan Chywl|arXiv (Cornell University)|Sep 18, 2017

Advanced Neural Network Applications | References: 13 | Cited by: 70

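One-sentence summary

Fast YOLO accelerates YOLOv2 for object detection in video on embedded devices by combining an evolutionarily optimized network architecture with motion-adaptive inference. It achieves an average speedup of about 3.3x (about 18 FPS on a Jetson TX1) and about 38% fewer deep inferences, with about 2.8x fewer parameters and an IOU drop of about 2%.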

ABSTRACT

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18 FPS on an Nvidia Jetson TX1 embedded system.

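Motivation and objectives

Reduce the compute and memory requirements of YOLOv2 on embedded devices while maintaining detection performance.
Automatically optimize the network architecture so that it has about 2.8x fewer parameters with minimal IOU loss.
Introduce motion-adaptive inference to reduce the number of deep inferences and the power consumption during video processing.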

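Proposed method

Use the evolutionary deep intelligence framework to synthesize an optimized architecture (O-YOLOv2) with about 2.8x fewer parameters and an IOU drop of about 2%.
Build the image stack (I_t, I_ref) and apply a 1x1 convolution to produce a motion probability map.
Apply the motion-adaptive inference module to decide whether deep inference should be run on a given frame.
If deep inference is needed, run O-YOLOv2 to update the class probability map, and update I_ref and the reference map; otherwise, reuse the reference map.
Evaluate the optimized model on Pascal VOC 2007 to compare its parameter count and IOU against YOLOv2; evaluate video runtime on an Nvidia Jetson TX1 to measure FPS and deep-inference frequency.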
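The motion-adaptive steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 1x1 convolution weights (+1, -1), the absolute-value squashing, both thresholds, and the function names `motion_probability_map`, `needs_deep_inference`, and `run_fast_yolo_sketch` are all assumptions introduced here, and `detector` is a stand-in for O-YOLOv2.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Frame = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


def motion_probability_map(frame: Frame, ref: Frame,
                           w_cur: float = 1.0, w_ref: float = -1.0) -> Frame:
    """Apply a 1x1 convolution to the two-channel stack (I_t, I_ref).

    A 1x1 convolution mixes the channels at each pixel independently.
    The weights (+1, -1) and the abs/clamp step are assumed for illustration.
    """
    return [
        [min(1.0, abs(w_cur * c + w_ref * r)) for c, r in zip(row_c, row_r)]
        for row_c, row_r in zip(frame, ref)
    ]


def needs_deep_inference(motion_map: Frame, pixel_thresh: float = 0.1,
                         area_thresh: float = 0.05) -> bool:
    """Hypothetical decision rule: enough pixels show enough motion."""
    total = sum(len(row) for row in motion_map)
    if total == 0:
        return False
    moving = sum(1 for row in motion_map for p in row if p > pixel_thresh)
    return moving / total > area_thresh


def run_fast_yolo_sketch(frames: Sequence[Frame],
                         detector: Callable[[Frame], Any]) -> Tuple[List[Any], int]:
    """Run the detector only on frames that differ enough from I_ref."""
    ref: Optional[Frame] = None   # I_ref, the last frame that was deeply inferred
    ref_result: Any = None        # reference map reused on skipped frames
    deep_calls = 0
    outputs: List[Any] = []
    for frame in frames:
        if ref is None or needs_deep_inference(motion_probability_map(frame, ref)):
            ref_result = detector(frame)  # deep inference (stand-in for O-YOLOv2)
            ref = frame                   # update I_ref
            deep_calls += 1
        outputs.append(ref_result)        # otherwise reuse the reference map
    return outputs, deep_calls


# Toy usage: two static frames, then a scene change, then a static frame again.
still = [[0.0] * 4 for _ in range(4)]
moved = [[1.0] * 4 for _ in range(4)]
outputs, deep_calls = run_fast_yolo_sketch(
    [still, still, moved, moved], detector=lambda f: f[0][0]
)
print(deep_calls, "deep inferences for", len(outputs), "frames")
```

In this toy run the detector is called only on the first frame and on the frame where the scene changes; the other two frames reuse the stored result, which is the mechanism that lets the framework skip deep inferences on low-motion video.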

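Experimental results

Research questions

RQ1: Can evolutionary synthesis produce a compact yet effective YOLOv2-based network (O-YOLOv2) suitable for embedded devices?
RQ2: Does motion-adaptive inference reduce the number of deep inferences and the power consumption while maintaining detection performance on video streams?
RQ3: What speedup and resource footprint does Fast YOLO achieve relative to YOLOv2 when deployed on an embedded platform?
RQ4: How does O-YOLOv2 compare with YOLOv2 in parameter count and IOU on a standard benchmark?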

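Key findings

Network architecture	Number of parameters	IOU
YOLOv2	48.2M	67.2%
O-YOLOv2	17.1M	65.10%

O-YOLOv2 has about 2.8x fewer parameters than YOLOv2, with an IOU drop of about 2 percentage points (67.2% vs. 65.10%).
Fast YOLO reduces the number of deep inferences by 38.13% on average and achieves a speedup of about 3.3x over YOLOv2 on the Jetson TX1 (≈18 FPS).
Fast YOLO reduces the average runtime per frame from 184 ms (YOLOv2) to 56 ms.
On Pascal VOC 2007, O-YOLOv2 maintains competitive detection performance with significantly fewer parameters.
The framework combines the optimized architecture with motion-aware inference to reduce power consumption and enable real-time embedded video detection.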
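The headline ratios can be cross-checked against the raw figures reported above. The short sketch below only restates numbers already given in this summary; the variable names are chosen here for illustration.

```python
# Figures taken from the table and findings above.
yolov2_params_m = 48.2    # YOLOv2 parameters, in millions
oyolov2_params_m = 17.1   # O-YOLOv2 parameters, in millions
yolov2_iou = 67.2         # YOLOv2 IOU, in percent
oyolov2_iou = 65.10       # O-YOLOv2 IOU, in percent
yolov2_ms = 184.0         # YOLOv2 average runtime per frame, in ms
fast_yolo_ms = 56.0       # Fast YOLO average runtime per frame, in ms

param_reduction = yolov2_params_m / oyolov2_params_m  # parameter ratio
iou_drop = yolov2_iou - oyolov2_iou                   # percentage points
speedup = yolov2_ms / fast_yolo_ms                    # runtime ratio
fps = 1000.0 / fast_yolo_ms                           # frames per second

print(f"Parameter reduction: {param_reduction:.1f}x")
print(f"IOU drop: {iou_drop:.1f} percentage points")
print(f"Speedup: {speedup:.1f}x")
print(f"Fast YOLO throughput: {fps:.1f} FPS")
```

The derived values agree with the reported ~2.8x parameter reduction, ~2% IOU drop, ~3.3x speedup, and ~18 FPS, so the per-frame runtimes and the headline claims are mutually consistent.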


This review was generated by AI and reviewed by human editors.