QUICK REVIEW

[Paper Review] YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions

Nidhal Jegham, Chan Young Koh|arXiv (Cornell University)|Oct 31, 2024

Advanced Neural Network Applications | Cited by 46

One-line summary

Benchmarks Ultralytics YOLO models from v3 through YOLO11 on three datasets, reporting accuracy, speed, GFLOPs, and model size in detail to guide model selection.

ABSTRACT

This study presents a comprehensive benchmark analysis of various YOLO (You Only Look Once) algorithms. It represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12, on various object detection challenges. The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class, ensuring a comprehensive assessment across datasets with distinct challenges. To ensure a robust evaluation, we employ a comprehensive set of metrics, including Precision, Recall, Mean Average Precision (mAP), Processing Time, GFLOPs count, and Model Size. Our analysis highlights the distinctive strengths and limitations of each YOLO version. For example: YOLOv9 demonstrates substantial accuracy but struggles with detecting small objects and efficiency whereas YOLOv10 exhibits relatively lower accuracy due to architectural choices that affect its performance in overlapping object detection but excels in speed and efficiency. Additionally, the YOLO11 family consistently shows superior performance maintaining a remarkable balance of accuracy and efficiency. However, YOLOv12 delivered underwhelming results, with its complex architecture introducing computational overhead without significant performance gains. These results provide critical insights for both industry and academia, facilitating the selection of the most suitable YOLO algorithm for diverse applications and guiding future enhancements.

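Research motivation and objectives

Evaluate the performance of YOLO variants from v3 to YOLO11 on diverse datasets (traffic signs, African wildlife, ships).
Evaluate multiple metrics beyond mAP, including Precision, Recall, preprocessing, inference, and postprocessing times, GFLOPs, and model size.
Analyze the architectural evolution to explain the differences between versions.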
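For readers unfamiliar with the metrics listed above, the following block sketches the definitions usually meant by these names. It assumes the COCO-style convention (IoU thresholds from 0.50 to 0.95 in steps of 0.05) that Ultralytics tooling reports; the review itself does not spell out the formulas, so the symbols $TP$, $FP$, $FN$, $C$ (number of classes), $t$ (IoU threshold), and $p_{c,t}(r)$ (precision as a function of recall for class $c$) are introduced here for illustration only.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\
\text{AP}_c(t) &= \int_0^1 p_{c,t}(r)\,dr, \\
\text{mAP}(t) &= \frac{1}{C}\sum_{c=1}^{C}\text{AP}_c(t), \\
\text{mAP50} &= \text{mAP}(0.50), \\
\text{mAP50-95} &= \frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\dots,\,0.95\}} \text{mAP}(t).
\end{align}
```

Because mAP50-95 averages over stricter IoU thresholds, it is never higher than mAP50 for the same model, which matches the pattern in the results table.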

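Proposed method

Benchmark 23 models from 5 YOLO versions using consistent hyperparameters.
Evaluate on three datasets that differ in object size and aspect ratio.
Measure preprocessing time, inference time, postprocessing time, mAP50, mAP50-95, GFLOPs, and model size.
Where applicable, compare the Ultralytics implementations with their original YOLO counterparts.
Discuss architectural changes (C2PSA, C3k2, anchor-free detection, NMS-free training) and their effect on performance.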
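To make the role of the IoU threshold behind mAP50 concrete, here is a minimal sketch of how Precision and Recall can be computed for one image at a single threshold. It is not the paper's or Ultralytics' evaluation code: the helper names `iou` and `precision_recall`, the `(x1, y1, x2, y2)` box format, and the greedy one-to-one matching are assumptions chosen for illustration, and class labels are ignored.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy matching; `preds` is assumed sorted by descending confidence."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v > best_iou:
                best_j, best_iou = j, v
        # A prediction counts as a true positive only above the threshold.
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp
    fn = len(truths) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (10, 10, 12, 12)]
    truths = [(0, 0, 2, 2), (5, 5, 7, 7)]
    print(precision_recall(preds, truths, iou_thr=0.5))
```

Raising `iou_thr` toward 0.95 makes matching stricter, which is why mAP50-95 penalizes loosely localized boxes more than mAP50 does.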

Figure 1: Evolution of YOLO Algorithms throughout the years.

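Experimental results

Research questions

RQ1: How do YOLO11 and its predecessors compare in accuracy, speed, and efficiency across diverse datasets?
RQ2: Which architectural changes between YOLO versions produce the observed performance differences?
RQ3: What trade-offs exist between accuracy (mAP) and efficiency (GFLOPs, model size) in each version?

Key findings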

Version	Precision	Recall	mAP50	mAP50-95	Preprocess (ms)	Inference (ms)	Postprocess (ms)	Total (ms)	Size (MB)	GFLOPs
YOLOv3u	0.75	0.849	0.874	0.781	0.7	8.5	0.4	9.6	207.86	282.4
YOLOv3u tiny	0.845	0.667	0.772	0.682	1.4	0.7	0.3	2.4	24.44	19
YOLOv5un	0.805	0.679	0.749	0.665	0.6	6.6	0.4	7.6	5.65	7.1
YOLOv5us	0.85	0.777	0.827	0.744	0.5	7.8	0.4	8.7	18.58	23.9
YOLOv5um	0.849	0.701	0.83	0.744	1.1	9.5	0.4	11	50.54	64.1
YOLOv5ul	0.831	0.836	0.886	0.799	0.6	9.7	0.4	10.7	106.85	134.9
YOLOv5ux	0.863	0.795	0.867	0.777	1.1	9.8	0.4	11.3	195.2	246.3
YOLOv8n	0.749	0.688	0.777	0.689	0.6	6.8	0.4	7.8	6.55	8.1
YOLOv8s	0.766	0.788	0.806	0.718	0.6	7.8	0.4	8.8	22.59	28.6
YOLOv8m	0.838	0.805	0.845	0.763	1.6	9.1	0.4	11.1	52.12	78.9
YOLOv8l	0.771	0.789	0.853	0.767	0.6	9.2	0.4	10.2	87.77	165
YOLOv8x	0.902	0.744	0.874	0.78	0.6	9.4	0.4	10.4	136.9	257.7
YOLOv9t	0.792	0.748	0.812	0.731	0.5	10	0.4	10.9	4.93	7.7
YOLOv9s	0.763	0.81	0.828	0.75	0.6	11.1	0.4	12.1	15.33	26.8
YOLOv9m	0.864	0.796	0.864	0.784	1	12.1	0.4	13.5	40.98	76.7
YOLOv9c	0.827	0.807	0.852	0.769	1.3	11.6	0.4	13.3	51.8	102.6
YOLOv9e	0.819	0.824	0.854	0.764	0.8	16.1	0.4	17.3	117.5	189.4
YOLOv10n	0.722	0.602	0.722	0.64	1	0.8	0.2	2	5.59	8.3
YOLOv10s	0.823	0.742	0.834	0.744	1.2	1.1	0.2	2.5	15.9	24.7
YOLOv10m	0.834	0.843	0.88	0.781	1.2	2.4	0.2	3.8	32.1	63.8
YOLOv10b	0.836	0.764	0.859	0.765	1	3.1	0.2	4.3	39.7	98.4
YOLOv10l	0.873	0.807	0.866	0.771	1.1	3.8	0.2	5.1	50	126.8
YOLOv10x	0.773	0.854	0.88	0.787	1	6.3	0.2	7.5	61.4	170.4
YOLO11n	0.768	0.695	0.757	0.668	1.2	0.6	0.4	2.2	5.35	6.4
YOLO11s	0.819	0.758	0.838	0.742	1.2	1	0.4	2.6	18.4	21.4
YOLO11m	0.898	0.826	0.893	0.795	1.2	2.4	0.4	4	38.8	67.9
YOLO11l	0.862	0.839	0.889	0.794	1.2	3	0.4	4.6	49	86.8
YOLO11x	0.819	0.816	0.885	0.784	0.9	6.1	0.4	7.4	109	194.8
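The total-time column in the table is the sum of the three stage times, and a rough throughput figure follows from it. The sketch below reproduces that arithmetic for three rows copied from the table. It assumes the stage times are per-image milliseconds, consistent with the 2.4 ms inference time quoted for YOLO11m in the findings. The function names are illustrative, and the throughput ignores batching and data-loading effects.

```python
from typing import Dict, Tuple

# (preprocess, inference, postprocess) per image, copied from the table;
# the unit is assumed to be milliseconds.
timings_ms: Dict[str, Tuple[float, float, float]] = {
    "YOLOv9e": (0.8, 16.1, 0.4),
    "YOLOv10n": (1.0, 0.8, 0.2),
    "YOLO11m": (1.2, 2.4, 0.4),
}


def total_ms(stages: Tuple[float, float, float]) -> float:
    """Sum of the three pipeline stages, rounded like the table."""
    return round(sum(stages), 1)


def images_per_second(stages: Tuple[float, float, float]) -> float:
    """Rough single-image throughput implied by the total latency."""
    return 1000.0 / sum(stages)


if __name__ == "__main__":
    for name, stages in timings_ms.items():
        print(f"{name}: total {total_ms(stages)} ms, "
              f"about {images_per_second(stages):.0f} images/s")
```

Looking at the sum rather than inference time alone matters for models such as YOLOv10n, where preprocessing takes longer than the forward pass.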

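The YOLO11 family performs strongly across the datasets in accuracy, speed, efficiency, and model size.
YOLO11m has an average inference time of 2.4 ms, with an mAP50-95 of 0.795 on traffic signs, 0.81 on African wildlife, and 0.325 on ships, and an average size of 38.8 MB.
YOLOv9 is highly accurate but struggles with small-object detection and efficiency, whereas YOLOv10 excels in inference speed and efficiency but is somewhat less accurate, because its architectural choices hurt its detection of overlapping objects.
The Ultralytics-supported variants (YOLOv3u, YOLOv5un, YOLOv5us, YOLOv5ul, YOLOv8x, YOLOv9m/e, YOLOv10l/x, and the YOLO11 variants) show different trade-offs, and a direct comparison with the original releases may be unfair because of differences in optimization.
The study provides a fair benchmark by using the same hyperparameters throughout and, for stability, focusing on the Ultralytics-supported models.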
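One way to read the accuracy-versus-speed trade-off in these findings is to look for models that no other model beats on both axes at once (a Pareto front). The sketch below does this for a subset of rows from the results table, using the total-time and mAP50-95 columns as listed. The function `pareto_front` and the choice of these two axes are illustrative assumptions rather than an analysis performed in the paper, and the result holds only for the rows included here.

```python
from typing import Dict, List, Tuple

# (total time per image, mAP50-95), copied from a subset of the table rows.
models: Dict[str, Tuple[float, float]] = {
    "YOLOv5ul": (10.7, 0.799),
    "YOLOv8x": (10.4, 0.780),
    "YOLOv9m": (13.5, 0.784),
    "YOLOv10n": (2.0, 0.640),
    "YOLOv10s": (2.5, 0.744),
    "YOLOv10m": (3.8, 0.781),
    "YOLO11n": (2.2, 0.668),
    "YOLO11s": (2.6, 0.742),
    "YOLO11m": (4.0, 0.795),
    "YOLO11l": (4.6, 0.794),
}


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of models not dominated on (lower time, higher mAP)."""
    front = []
    for name, (t, m) in points.items():
        dominated = any(
            (t2 <= t and m2 >= m) and (t2 < t or m2 > m)
            for other, (t2, m2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Order the front from fastest to slowest for readability.
    return sorted(front, key=lambda n: points[n][0])


if __name__ == "__main__":
    print(pareto_front(models))
```

On this subset, YOLOv8x and YOLOv9m drop out because YOLO11m is both faster and more accurate, which is one concrete reading of the claim that YOLO11 balances accuracy and efficiency.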

Figure 2: YOLOv3 architecture showcasing the residual blocks and the upsampling layers to enhance object detection efficiency through different scales [9].

This review was generated by AI and checked by a human editor.