QUICK REVIEW

[Paper Review] Real-Time Detection and Analysis of Vehicles and Pedestrians using Deep Learning

Md Nahid Sadik, Tahmim Hossain|arXiv (Cornell University)|Apr 11, 2024

Advanced Neural Network Applications | Cited by 6

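One-Line Summary

This paper compares YOLOv8 and RT-DETR models for real-time vehicle and pedestrian detection on urban-style data, concluding that YOLOv8m is the strongest baseline, that YOLOv8l takes the top mAP after data augmentation, and that the RT-DETR variants are also competitive.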

ABSTRACT

Computer vision, particularly vehicle and pedestrian identification is critical to the evolution of autonomous driving, artificial intelligence, and video surveillance. Current traffic monitoring systems confront major difficulty in recognizing small objects and pedestrians effectively in real-time, posing a serious risk to public safety and contributing to traffic inefficiency. Recognizing these difficulties, our project focuses on the creation and validation of an advanced deep-learning framework capable of processing complex visual input for precise, real-time recognition of cars and people in a variety of environmental situations. On a dataset representing complicated urban settings, we trained and evaluated different versions of the YOLOv8 and RT-DETR models. The YOLOv8 Large version proved to be the most effective, especially in pedestrian recognition, with great precision and robustness. The results, which include Mean Average Precision and recall rates, demonstrate the model's ability to dramatically improve traffic monitoring and safety. This study makes an important addition to real-time, reliable detection in computer vision, establishing new benchmarks for traffic management systems.

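Motivation and Objectives

Develop a fast, accurate deep-learning framework for real-time vehicle and pedestrian detection across diverse urban environments.
Evaluate single-stage detectors (the YOLOv8 family) and a transformer-based detector (RT-DETR) on traffic data.
Assess performance on both the baseline and augmented datasets, with a focus on pedestrian results.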

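Proposed Method

Sample frames from public traffic videos to build a dataset of 1,142 images with 3,388 annotations across 6 classes.
Apply data augmentation, including hue, exposure, noise, shear, saturation, and blur, to expand the dataset to 3,082 images.
Compare YOLOv8 (s, m, l, x) and RT-DETR (L, x) models on both the baseline and augmented datasets.
Training: batch size 8, 640x640 images, up to 100 epochs on an NVIDIA Tesla P1000; evaluation uses mAP, precision, and recall.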
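
The training setup above could be reproduced roughly as follows. This is a minimal sketch, assuming the Ultralytics Python package; the review does not say which implementation the authors used. The `train_all` helper and the dataset YAML path passed to it are hypothetical, and the checkpoint file names are Ultralytics' published pretrained weights, not something stated in the paper.

```python
# Sketch only: assumes the Ultralytics package, which this review does not confirm.
TRAIN_ARGS = {"epochs": 100, "imgsz": 640, "batch": 8}  # values quoted in the review

# Assumed Ultralytics checkpoint names for the six compared models.
CHECKPOINTS = {
    "YOLOv8s": "yolov8s.pt",
    "YOLOv8m": "yolov8m.pt",
    "YOLOv8l": "yolov8l.pt",
    "YOLOv8x": "yolov8x.pt",
    "RT-DETR-L": "rtdetr-l.pt",
    "RT-DETR-x": "rtdetr-x.pt",
}


def train_all(data_yaml: str) -> dict:
    """Train and validate every model on the dataset described by data_yaml."""
    # Imported lazily so the configuration above can be inspected without the package.
    from ultralytics import RTDETR, YOLO

    results = {}
    for name, weights in CHECKPOINTS.items():
        model_cls = RTDETR if name.startswith("RT-DETR") else YOLO
        model = model_cls(weights)
        model.train(data=data_yaml, **TRAIN_ARGS)
        results[name] = model.val()  # validation metrics object for this model
    return results
```

Calling `train_all("traffic.yaml")` (a hypothetical dataset file) would run the six trainings in sequence; running it once per dataset variant would cover both the baseline and augmented conditions.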

Figure 1: Augmented Images from our dataset with Ground Truth bounding boxes.
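
The photometric part of the augmentation described in the method (hue, saturation, exposure, noise) can be illustrated per pixel with the Python standard library. This is a simplified sketch, not the authors' pipeline: the function name and parameters are hypothetical, and shear and blur are left out because they need neighbouring pixels and, for shear, updated bounding boxes.

```python
import colorsys
import random


def jitter_pixel(rgb, hue_shift=0.0, sat_scale=1.0, exposure_scale=1.0,
                 noise_std=0.0, rng=None):
    """Apply hue/saturation/exposure jitter plus Gaussian noise to one 8-bit RGB pixel."""
    rng = rng or random
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0                   # hue wraps around the colour wheel
    s = min(1.0, max(0.0, s * sat_scale))       # saturation is clamped to [0, 1]
    v = min(1.0, max(0.0, v * exposure_scale))  # exposure modelled as a brightness scale
    out = colorsys.hsv_to_rgb(h, s, v)
    noisy = (c * 255.0 + rng.gauss(0.0, noise_std) for c in out)
    return tuple(int(round(min(255.0, max(0.0, c)))) for c in noisy)


def jitter_image(pixels, **kwargs):
    """Apply the same jitter settings to every pixel of a row-major RGB image."""
    return [[jitter_pixel(p, **kwargs) for p in row] for row in pixels]


if __name__ == "__main__":
    image = [[(200, 200, 200), (255, 0, 0)]]
    print(jitter_image(image, exposure_scale=0.5))
```

Because these transforms change only pixel values, ground-truth boxes can be reused unchanged for the augmented copies; geometric transforms such as shear would not have that property.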

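Experimental Results

Research Questions

RQ1: Which model family (YOLOv8 or RT-DETR) offers the best trade-off between accuracy and speed for real-time vehicle and pedestrian detection?
RQ2: How does data augmentation affect detection performance across models and across classes (vehicles versus pedestrians)?
RQ3: What is the per-class detection capability, and how does pedestrian detection compare with vehicle detection under the baseline and augmented conditions?

Key Findings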

Model	mAP	Precision	Recall
YOLOv8s	0.798	0.814	0.764
YOLOv8m	0.898	0.861	0.867
YOLOv8l	0.898	0.877	0.835
YOLOv8x	0.818	0.895	0.735
RT-DETR-L	0.867	0.871	0.832
RT-DETR-x	0.878	0.886	0.850
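
One way to read the table is to fold precision and recall into a single number. The sketch below computes the F1 score (the harmonic mean of precision and recall) for each row. F1 is not reported in this review; it is added here only as an illustrative derived metric, and the helper names are hypothetical.

```python
# Rows copied from the table above: (model, mAP, precision, recall).
ROWS = [
    ("YOLOv8s", 0.798, 0.814, 0.764),
    ("YOLOv8m", 0.898, 0.861, 0.867),
    ("YOLOv8l", 0.898, 0.877, 0.835),
    ("YOLOv8x", 0.818, 0.895, 0.735),
    ("RT-DETR-L", 0.867, 0.871, 0.832),
    ("RT-DETR-x", 0.878, 0.886, 0.850),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def rank_by_f1(rows):
    """Return (model, F1) pairs sorted from highest to lowest F1."""
    scored = [(name, f1_score(p, r)) for name, _map, p, r in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, f1 in rank_by_f1(ROWS):
        print(f"{name}: F1 = {f1:.3f}")
```

On these rows, RT-DETR-x has the highest F1 (about 0.868) even though YOLOv8m and YOLOv8l have the highest mAP, so the choice of metric affects which model looks best.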

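YOLOv8m achieves the highest baseline mAP of 0.898, with precision 0.861 and recall 0.867.
RT-DETR-x is competitive on the baseline, with mAP 0.878, high precision (0.886), and recall 0.850.
On the augmented data, YOLOv8l achieves the top mAP of 0.909, with precision 0.884 and recall 0.861.
Pedestrian detection on the augmented data: YOLOv8l achieves mAP 0.822, precision 0.909, and recall 0.687; results vary by model.
Inference speed: the YOLOv8 models run at 40 FPS and RT-DETR at 25 FPS, indicating that both are suitable for real-time traffic monitoring.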
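
The frame rates above translate directly into per-frame time budgets. A small sketch of that arithmetic, assuming the FPS figures describe end-to-end per-frame throughput (the review does not say what the timing includes); the 30 FPS camera rate is a hypothetical example input, not a figure from the paper.

```python
def frame_latency_ms(fps: float) -> float:
    """Average processing time per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up(model_fps: float, camera_fps: float) -> bool:
    """True if the model processes frames at least as fast as the camera produces them."""
    return model_fps >= camera_fps


if __name__ == "__main__":
    reported = {"YOLOv8": 40.0, "RT-DETR": 25.0}  # FPS values quoted in the review
    camera_fps = 30.0  # hypothetical camera frame rate
    for name, fps in reported.items():
        print(f"{name}: {frame_latency_ms(fps):.1f} ms/frame, "
              f"keeps up with {camera_fps:.0f} FPS input: {keeps_up(fps, camera_fps)}")
```

At 40 FPS a model spends 25 ms per frame and at 25 FPS it spends 40 ms, so whether a detector counts as real-time depends on the input frame rate and on whether frames may be skipped.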

Figure 2: Row 1: YOLOv8l output images after running the model. Row 2: RT-DETR-L output images after running the model.

This review was generated by AI and checked by a human editor.