QUICK REVIEW

[Paper Review] YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles

Aduen Benjumea, Izzeddin Teeti|arXiv (Cornell University)|Dec 22, 2021

Advanced Neural Network Applications | Cited by 117

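One-line summary

YOLO-Z is a family of detectors obtained by modifying the YOLOv5 base detector; it improves small object detection while keeping the added inference time low, and is validated on a cone-dense autonomous racing dataset.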

ABSTRACT

As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away, image resolution and computational resources limitations make detecting smaller objects (that is, objects that occupy a small pixel area in the input image) a genuinely challenging task for machines and a wide-open research field. This study explores how the popular YOLOv5 object detector can be modified to improve its performance in detecting smaller objects, with a particular application in autonomous racing. To achieve this, we investigate how replacing certain structural elements of the model (as well as their connections and other parameters) can affect performance and inference time. In doing so, we propose a series of models at different scales, which we name `YOLO-Z', and which display an improvement of up to 6.9% in mAP when detecting smaller objects at 50% IOU, at the cost of just a 3ms increase in inference time compared to the original YOLOv5. Our objective is to inform future research on the potential of adjusting a popular detector such as YOLOv5 to address specific tasks and provide insights on how specific changes can impact small object detection. Such findings, applied to the broader context of autonomous vehicles, could increase the amount of contextual information available to such systems.

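Motivation and objectives

Improve the small object detection performance of YOLOv5 in autonomous vehicle scenarios.
Investigate how structural changes to the backbone, neck, and connections affect accuracy and speed on small objects.
Identify the changes that give the best trade-off between accuracy and real-time inference.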

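Proposed method

Replace or modify the backbone with DenseNet or ResNet while retaining the core structure of YOLOv5.
Replace the neck with a simplified FPN or BiFPN to improve the propagation of small-object information.
Reroute connections so that the neck and head use higher-resolution feature maps (inclusive/exclusive mappings).
Tune the anchors for each scale through data-driven automatic generation (3 or 5 anchors per scale).
Experiment with input-scale-related adjustments (depth/width modifications) and learning-rate variations to observe their effect on small object detection.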
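
The data-driven anchor generation mentioned above can be illustrated with a small sketch. The review does not describe the exact procedure, so this is an assumption: a plain k-means over ground-truth box sizes using 1 - IoU as the distance, with the resulting anchors sorted by area and split evenly across detection scales. The function names, the number of scales, and the box sizes are all hypothetical.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50):
    """Cluster (width, height) pairs into k anchors; higher IoU means closer."""
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    # Deterministic initialisation: k boxes spread evenly over the area-sorted list.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            clusters[best].append(box)
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centers.append((w, h))
            else:
                new_centers.append(center)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # assignments no longer change the centers
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


def anchors_per_scale(boxes, num_scales, per_scale):
    """Generate num_scales * per_scale anchors and group them by detection scale."""
    anchors = kmeans_anchors(boxes, num_scales * per_scale)
    return [anchors[i:i + per_scale] for i in range(0, len(anchors), per_scale)]


# Hypothetical box sizes in pixels; these are not taken from the paper's dataset.
boxes = [(w, round(1.5 * w)) for w in range(6, 96, 2)]
three_each = anchors_per_scale(boxes, num_scales=3, per_scale=3)
five_each = anchors_per_scale(boxes, num_scales=3, per_scale=5)
```

Moving from 3 to 5 anchors per scale only changes the number of clusters here; the smallest anchors still end up in the first group, which would correspond to the highest-resolution detection scale.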

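Experimental results

Research questions

RQ1: How can the structure of YOLOv5 be changed to improve small object detection without sacrificing real-time performance?
RQ2: Which configuration of backbone, neck, and feature-map routing yields the best gains on small objects in the autonomous vehicle context?
RQ3: How much do the number of anchors and higher-resolution feature maps affect small-object mAP at 50% IoU?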
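
The "50% IoU" criterion in RQ3 can be made concrete with a short sketch. It is illustrative only and not taken from the paper: the box coordinates are hypothetical, and the full mAP computation (confidence ranking, precision-recall integration) is left out. It shows why the same localisation error costs a small box far more than a large one.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as a hit when its IoU with the ground truth reaches the threshold."""
    return box_iou(pred, truth) >= threshold


# Hypothetical boxes: the same 3-pixel offset applied to a small and a large object.
small_truth, small_pred = (0, 0, 10, 10), (3, 3, 13, 13)
large_truth, large_pred = (0, 0, 100, 100), (3, 3, 103, 103)

small_iou = box_iou(small_pred, small_truth)  # 49 / 151, below the 0.5 threshold
large_iou = box_iou(large_pred, large_truth)  # 9409 / 10591, well above it
```

With these numbers the 10 x 10 box fails the 50% IoU test while the 100 x 100 box passes it comfortably, even though both predictions are off by the same three pixels.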

Key findings

Scale	mAP@0.5 (YOLOv5)	mAP@0.5 (YOLO-Z)	Relative change in mAP@0.5	mAP@0.5 small (YOLOv5)	mAP@0.5 small (YOLO-Z)	Relative change in mAP@0.5 small	Inference (YOLOv5, ms)	Inference (YOLO-Z, ms)	Difference in inference (ms)
S	0.926	0.955	3.13%	0.869	0.925	6.44%	8	8.9	0.9
M	0.932	0.9605	3.06%	0.8795	0.9425	7.16%	11.6	14.3	2.7
L	0.935	0.964	3.10%	0.886	0.9545	7.73%	16.6	19.6	3.0
X	0.9385	0.9605	2.34%	0.8975	0.9465	5.46%	26.9	30.6	3.7

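Across scales, the YOLO-Z models improve mAP at 50% IoU by an average of 2.7 points in absolute terms, and by 5.9 points on small objects, while adding about 2.6 ms of inference time.
The DenseNet backbone shows consistent gains on small objects relative to the baseline, with added latency of only about 3 ms, whereas ResNet tends to be both less accurate and slower.
Using an exclusively added higher-resolution feature map (XS_ex), combined with additional small maps, improves small object detection; on a dataset dense with small objects, the size of the effect varies by scale.
Increasing the number of anchors (5 per scale) tends to benefit the larger scales, whereas at smaller scales fewer anchors, such as 3, can be preferable.
The FPN neck usually outperforms BiFPN at the smaller scales, and the X scale benefits little from neck changes.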
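
The averages quoted in the first finding can be checked directly against the table. The sketch below copies the table values and recomputes the mean absolute gain (in percentage points) and the mean added latency; the variable names are illustrative. It also shows that the percentage columns in the table are relative changes, not absolute point differences.

```python
# Per scale: (mAP YOLOv5, mAP YOLO-Z, small mAP YOLOv5, small mAP YOLO-Z,
#             inference ms YOLOv5, inference ms YOLO-Z), copied from the table.
rows = {
    "S": (0.926, 0.955, 0.869, 0.925, 8.0, 8.9),
    "M": (0.932, 0.9605, 0.8795, 0.9425, 11.6, 14.3),
    "L": (0.935, 0.964, 0.886, 0.9545, 16.6, 19.6),
    "X": (0.9385, 0.9605, 0.8975, 0.9465, 26.9, 30.6),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Absolute gains in percentage points, averaged over the four scales.
mean_gain_all = mean((r[1] - r[0]) * 100 for r in rows.values())
mean_gain_small = mean((r[3] - r[2]) * 100 for r in rows.values())
# Added inference time in milliseconds, averaged over the four scales.
mean_added_ms = mean(r[5] - r[4] for r in rows.values())

# Relative change for scale S, matching the 3.13% shown in the table.
s = rows["S"]
relative_change_s = (s[1] - s[0]) / s[0] * 100

print(f"mean gain, all objects: {mean_gain_all:.1f} points")
print(f"mean gain, small objects: {mean_gain_small:.1f} points")
print(f"mean added inference time: {mean_added_ms:.1f} ms")
print(f"relative change at scale S: {relative_change_s:.2f}%")
```

The recomputed means are about 2.71 points, 5.91 points, and 2.58 ms, which round to the 2.7, 5.9, and 2.6 stated in the finding.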


This review was generated by AI and checked by a human editor.