QUICK REVIEW

[Paper Review] Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

Mohammad Javad Shafiee, Brendan Chywl | arXiv (Cornell University) | 2017-09-18

Advanced Neural Network Applications | References: 13 | Citations: 70

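One-Line Summary

Fast YOLO speeds up YOLOv2 for embedded video by combining evolutionary network optimization with motion-adaptive inference. It achieves a ~3.3x speedup (≈18 FPS on a Jetson TX1) and ~38% fewer deep inferences, with ~2.8x fewer parameters at the cost of a ~2% IOU drop.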

ABSTRACT

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art in DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it still remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18 FPS on an Nvidia Jetson TX1 embedded system.

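Motivation and Objectives

Reduce the compute and memory requirements of YOLOv2 on embedded devices while maintaining detection performance.
Automatically optimize the network architecture to be ~2.8x smaller while minimizing IOU loss.
Introduce motion-adaptive inference to reduce the number of deep inferences, and with it power consumption, in video processing.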

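Proposed Method

Use evolutionary deep intelligence to synthesize an optimized architecture (O-YOLOv2) with ~2.8x fewer parameters and a ~2% IOU drop.
Form an image stack (I_t, I_ref) and apply a 1x1 convolution to produce a motion probability map.
Apply the motion-adaptive inference module to decide whether to run deep inference on the frame.
If deep inference is needed, run O-YOLOv2 to update the class probability map, then update I_ref and the reference map; otherwise, reuse the reference map.
Evaluate the optimized model on Pascal VOC 2007, comparing parameter count and IOU against YOLOv2; measure video runtime on an Nvidia Jetson TX1 to assess FPS and deep inference frequency.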
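The motion-adaptive steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the fixed 1x1 convolution weights (+1, -1), the clipping step, and the mean-over-threshold decision rule are all assumptions made for the example, and `detector` is a stand-in for O-YOLOv2.

```python
from typing import Any, Callable, Iterable, List, Tuple

Frame = List[List[float]]  # grayscale frame, pixel values in [0, 1]


def motion_probability_map(frame: Frame, ref: Frame,
                           w_t: float = 1.0, w_ref: float = -1.0) -> Frame:
    """Apply a 1x1 convolution to the two-channel stack (I_t, I_ref).

    A 1x1 convolution is a per-pixel weighted sum of the channels.
    The weights (+1, -1) are hand-picked placeholders, and taking the
    absolute value clipped to [0, 1] is an assumed squashing step.
    """
    return [[min(1.0, abs(w_t * a + w_ref * b)) for a, b in zip(row_t, row_ref)]
            for row_t, row_ref in zip(frame, ref)]


def needs_deep_inference(motion_map: Frame, threshold: float = 0.05) -> bool:
    """Assumed decision rule: mean motion probability exceeds a threshold."""
    total = sum(sum(row) for row in motion_map)
    count = sum(len(row) for row in motion_map)
    return total / count > threshold


def run_fast_yolo(frames: Iterable[Frame],
                  detector: Callable[[Frame], Any],
                  threshold: float = 0.05) -> Tuple[List[Any], int]:
    """Return per-frame results and the number of deep inferences run."""
    outputs: List[Any] = []
    ref_frame = None   # I_ref
    ref_result = None  # reference map reused when motion is low
    deep_calls = 0
    for frame in frames:
        if ref_frame is None or needs_deep_inference(
                motion_probability_map(frame, ref_frame), threshold):
            ref_result = detector(frame)  # deep inference on this frame
            ref_frame = frame             # update I_ref
            deep_calls += 1
        outputs.append(ref_result)        # otherwise reuse the reference
    return outputs, deep_calls
```

With a four-frame clip in which only the third frame differs from its predecessor, this loop would call `detector` twice and reuse the stored result for the other two frames, which is the mechanism behind the reduced deep inference count.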

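Experimental Results

Research Questions

RQ1: Can evolutionary synthesis produce a compact yet effective YOLOv2-based network (O-YOLOv2) suitable for embedded devices?
RQ2: Does motion-adaptive inference reduce the number of deep inferences and power consumption while maintaining detection performance on video streams?
RQ3: What speedup and resource usage does Fast YOLO achieve relative to YOLOv2 when deployed on an embedded platform?
RQ4: How does O-YOLOv2 compare with YOLOv2 in parameter count and IOU on a standard benchmark?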

Key Results

Network architecture	Parameters	IOU
YOLOv2	48.2M	67.2%
O-YOLOv2	17.1M	65.1%

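O-YOLOv2 has ~2.8x fewer parameters than YOLOv2 with an IOU drop of only ~2 percentage points (67.2% vs. 65.1%).
Fast YOLO reduces the number of deep inferences by 38.13% on average and achieves a ~3.3x speedup over YOLOv2 on the Jetson TX1 (≈18 FPS).
Fast YOLO improves average per-frame runtime from 184 ms (YOLOv2) to 56 ms.
On Pascal VOC 2007, O-YOLOv2 maintains competitive detection performance with far fewer parameters.
The framework combines an optimized architecture with motion-aware inference to reduce power consumption and enable real-time embedded video detection.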
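As a quick consistency check, the headline ratios can be recomputed from the figures reported in this review. The constant and function names below are illustrative only; the numbers are taken from the table and the runtime figures above.

```python
# Figures quoted in this review.
YOLOV2_PARAMS_M = 48.2    # parameters, millions
O_YOLOV2_PARAMS_M = 17.1  # parameters, millions
YOLOV2_IOU = 67.2         # percent
O_YOLOV2_IOU = 65.1       # percent
YOLOV2_MS = 184.0         # average runtime per frame, milliseconds
FAST_YOLO_MS = 56.0       # average runtime per frame, milliseconds


def summarize() -> dict:
    """Derive the ratios implied by the reported figures."""
    return {
        "param_reduction": YOLOV2_PARAMS_M / O_YOLOV2_PARAMS_M,
        "iou_drop_points": YOLOV2_IOU - O_YOLOV2_IOU,
        "speedup": YOLOV2_MS / FAST_YOLO_MS,
        "fast_yolo_fps": 1000.0 / FAST_YOLO_MS,
        "yolov2_fps": 1000.0 / YOLOV2_MS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

The derived values (roughly 2.8x fewer parameters, a 2.1-point IOU drop, a 3.3x speedup, and about 18 FPS) agree with the rounded figures quoted in the results.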


This review was generated by AI and reviewed by a human editor.