QUICK REVIEW

[Paper Review] Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

Mohammad Javad Shafiee, Brendan Chwyl | arXiv (Cornell University) | Sep 18, 2017

Advanced Neural Network Applications | References: 13 | Citations: 70

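ONE-LINE SUMMARY

Fast YOLO accelerates YOLOv2 for object detection in video on embedded devices by combining evolutionary network optimization with motion-adaptive inference. It achieves about a 3.3x speedup (≈18 FPS on a Jetson TX1), performs about 38% fewer deep inferences, and uses about 2.8x fewer parameters, at the cost of an IOU drop of about 2%.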

ABSTRACT

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18 FPS on an Nvidia Jetson TX1 embedded system.

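MOTIVATION AND OBJECTIVES

Reduce the computational and memory load of YOLOv2 while maintaining detection performance on embedded devices.
Automatically shrink the network architecture by ~2.8x with minimal IOU loss.
Introduce motion-adaptive inference to reduce the number of deep inferences and the power consumption in video processing.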

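PROPOSED METHOD

Use the evolutionary deep intelligence framework to synthesize an optimized architecture (O-YOLOv2) with ~2.8x fewer parameters and a ~2% IOU drop.
Build an image stack (I_t, I_ref) and apply a 1x1 convolution to it to produce a motion probability map.
Apply a motion-adaptive inference module that decides whether deep inference is needed on the frame.
If deep inference is needed, run O-YOLOv2 to update the class probability map, then update I_ref and the reference map; otherwise, reuse the reference map.
Evaluate the optimized model on Pascal VOC 2007, comparing its parameter count and IOU with YOLOv2. Measure video run time on an Nvidia Jetson TX1 to assess FPS and the frequency of deep inference.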
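
The motion-adaptive steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sigmoid squashing, the two thresholds, and the hand-set 1x1 convolution weights are all hypothetical, and `detector` is a stand-in for O-YOLOv2. With the placeholder weights used in the demo, the map only responds to pixels that get brighter than the reference frame.

```python
import numpy as np

def motion_probability_map(frame, ref_frame, weights, bias=0.0):
    """Stack (I_t, I_ref) along the channel axis and apply a 1x1 convolution.

    A 1x1 convolution is a per-pixel dot product over channels, so it is
    written here as a matrix product. The sigmoid is an assumption.
    """
    stack = np.concatenate([frame, ref_frame], axis=-1)  # H x W x 2C
    logits = stack @ weights + bias                      # H x W
    return 1.0 / (1.0 + np.exp(-logits))

def needs_deep_inference(prob_map, pixel_thresh=0.5, area_thresh=0.01):
    """Hypothetical decision rule: enough pixels look like they moved."""
    moving_fraction = float((prob_map > pixel_thresh).mean())
    return moving_fraction > area_thresh

def run_video(frames, detector, weights, bias=0.0):
    """Run `detector` only on frames flagged by the motion module.

    Returns the per-frame results and the number of deep inferences.
    """
    ref_frame, ref_result = None, None
    deep_count = 0
    results = []
    for frame in frames:
        if ref_frame is None:
            run_deep = True  # the first frame has no reference yet
        else:
            prob_map = motion_probability_map(frame, ref_frame, weights, bias)
            run_deep = needs_deep_inference(prob_map)
        if run_deep:
            ref_result = detector(frame)  # deep inference (O-YOLOv2 stand-in)
            ref_frame = frame             # update I_ref
            deep_count += 1
        results.append(ref_result)        # otherwise reuse the reference result
    return results, deep_count

if __name__ == "__main__":
    still = np.zeros((4, 4, 1))
    moved = still.copy()
    moved[0:2, 0:2, 0] = 1.0
    frames = [still, still.copy(), moved, moved.copy()]
    # Placeholder weights: +8 on the current frame, -8 on the reference frame.
    weights = np.array([8.0, -8.0])
    results, deep_count = run_video(frames, lambda f: float(f.sum()), weights, bias=-4.0)
    print(deep_count, "deep inferences over", len(frames), "frames")
```

In this sketch the reference frame is replaced only when deep inference runs, so a static scene keeps returning the cached result.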

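EXPERIMENTAL RESULTS

Research questions

RQ1: Can evolutionary synthesis produce a compact, effective YOLOv2-based network (O-YOLOv2) suited to embedded devices?
RQ2: Does motion-adaptive inference reduce the number of deep inferences and the power consumption on video streams while maintaining detection performance?
RQ3: When Fast YOLO is deployed on an embedded platform, how do its speed and resource usage compare with YOLOv2?
RQ4: How does O-YOLOv2 compare with YOLOv2 on a standard benchmark in terms of parameter count and IOU?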

Key findings

Network architecture	Number of parameters	IOU
YOLOv2	48.2M	67.2%
O-YOLOv2	17.1M	65.1%

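O-YOLOv2 has about 2.8x fewer parameters than YOLOv2 (17.1M vs. 48.2M), with an IOU drop of only about 2 percentage points (67.2% vs. 65.1%).
Fast YOLO reduces the number of deep inferences by an average of 38.13% and achieves about a 3.3x speedup over YOLOv2 on a Jetson TX1 (≈18 FPS).
Fast YOLO reduces the average run time per frame from 184 ms (YOLOv2) to 56 ms.
On Pascal VOC 2007, O-YOLOv2 maintains competitive detection performance with far fewer parameters.
The framework combines the optimized architecture with motion-adaptive inference, reducing power consumption and enabling real-time object detection in video on embedded devices.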
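
As a quick consistency check, the headline ratios can be recomputed from the reported figures. The short Python sketch below uses only the numbers quoted in this review; the helper name `fold_change` and the variable names are illustrative.

```python
def fold_change(before, after):
    """Return how many times smaller (or faster) `after` is than `before`."""
    return before / after

# Figures as reported in this review.
YOLOV2_PARAMS_M, O_YOLOV2_PARAMS_M = 48.2, 17.1   # parameters, in millions
YOLOV2_IOU, O_YOLOV2_IOU = 67.2, 65.1             # IOU, in percent
YOLOV2_MS, FAST_YOLO_MS = 184.0, 56.0             # average run time per frame

param_reduction = fold_change(YOLOV2_PARAMS_M, O_YOLOV2_PARAMS_M)
speedup = fold_change(YOLOV2_MS, FAST_YOLO_MS)
fps = 1000.0 / FAST_YOLO_MS
iou_drop_points = YOLOV2_IOU - O_YOLOV2_IOU

print(f"parameter reduction: {param_reduction:.2f}x")
print(f"speedup: {speedup:.2f}x")
print(f"frame rate: {fps:.1f} FPS")
print(f"IOU drop: {iou_drop_points:.1f} percentage points")
```

The recomputed values (roughly 2.82x, 3.29x, 17.9 FPS, and 2.1 points) agree with the rounded figures of ~2.8x, ~3.3x, ~18 FPS, and ~2% stated above.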


This review was generated by AI and checked by a human editor.