QUICK REVIEW

[Paper Review] A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS

Juan Terven, Diana Cordova-Esparza|arXiv (Cornell University)|Apr 2, 2023

Advanced Neural Network Applications | Cited by 185

One-Line Summary

This paper surveys the evolution of the YOLO object detector from YOLOv1 to YOLOv8 and YOLO-NAS, detailing the architecture, training techniques, and performance trends of each version.

ABSTRACT

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.

Motivation and Objectives

Summarize the evolution of the YOLO family from YOLOv1 to YOLOv8 and YOLO-NAS.
Detail architectural changes, training techniques, and performance trends across versions.
Discuss postprocessing, metrics, and the speed-accuracy tradeoffs in real-time detection.
Highlight future directions and research opportunities for real-time object detection.

Proposed Method

Discuss the foundational concepts and metrics used to evaluate YOLO (IoU, AP, mAP) and the non-maximum suppression (NMS) postprocessing step.
Describe architectural changes across YOLO versions (backbone, neck, head, multi-scale predictions).
Summarize training tricks, loss functions, and data augmentation strategies introduced for each version.
Compare YOLO variants (YOLOv1 to YOLOv4, YOLO9000, YOLOv3 with SPP, and later versions such as YOLOv5) and related methods (YOLO with Transformers, YOLO-NAS).
Provide a synthesized view of speed versus accuracy tradeoffs and practical considerations for real-time detection.
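
The IoU metric and NMS postprocessing listed above can be illustrated with a short sketch. This is not code from the paper; it assumes boxes in corner format (x1, y1, x2, y2), applies greedy NMS class-agnostically, and uses an assumed IoU threshold of 0.5. Detectors typically run NMS per class.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corners in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Keep only candidates whose overlap with the chosen box is below the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first (IoU ~0.68) and is suppressed
```

Here the first two boxes are near-duplicates, so only the higher-scoring one survives, while the distant third box is kept.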

Experimental Results

Research Questions

RQ1: How did each YOLO version improve speed and accuracy relative to its predecessor?
RQ2: What architectural and training changes drove improvements in detection performance across YOLOv1–YOLOv8 and YOLO-NAS?
RQ3: What are the common post-processing and evaluation metrics used to compare YOLO variants across datasets (VOC/COCO)?
RQ4: What future directions and research opportunities are identified for advancing real-time object detection with YOLO?
RQ5: How do newer YOLO variants handle multi-scale detection and anchor/prior box strategies?
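
For the evaluation metrics in RQ3, a minimal sketch (not the paper's code) shows how AP can be computed for one class at one IoU threshold. It uses all-point interpolation and assumes the matching step has already run. In that step, each detection, ranked by descending confidence, is flagged True if it matched an unmatched ground-truth box. mAP is then the mean over classes. VOC 2007 used 11-point interpolation, and COCO averages AP over IoU thresholds 0.50:0.05:0.95 with 101 recall points, so this is a simplified variant.

```python
from typing import List


def average_precision(tp_flags: List[bool], num_gt: int) -> float:
    """All-point interpolated AP for a single class.

    tp_flags: one flag per detection, sorted by descending confidence;
              True if the detection matched a ground-truth box (assumed precomputed).
    num_gt:   number of ground-truth boxes for this class.
    """
    if num_gt == 0:
        return 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


print(average_precision([True, False, True], num_gt=2))  # 5/6
```

In the example, the false positive at rank 2 lowers raw precision, but the envelope lifts it to the best precision reachable at higher recall. This is what makes all-point AP insensitive to small zig-zags in the curve.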

Key Findings

YOLO evolution shows progressive improvements in both accuracy (AP/mAP) and speed through architectural refinements and training tricks.
YOLOv2 introduced anchors, higher-resolution training, and a fully convolutional design, boosting AP on VOC 2007 to 78.6%.
YOLOv3 adopted a Darknet-53 backbone with residual connections and made predictions at three scales, improving small-object detection and achieving state-of-the-art COCO performance at the time.
YOLOv4 paired a CSPDarknet53 backbone and an SPP/PANet neck with bag-of-freebies and bag-of-specials techniques and advanced training methods, reaching strong real-time performance.
Beyond YOLOv8, the review covers YOLO-NAS and YOLO with Transformers and outlines future directions for real-time detection across diverse applications.
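
The anchor boxes introduced in YOLOv2 change what the network regresses. Instead of predicting box sizes directly, it predicts offsets (t_x, t_y, t_w, t_h) relative to a grid cell and a prior box: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h). YOLOv3 reuses this parameterization at each of its prediction scales. The sketch below is illustrative, not the paper's code. It assumes anchor sizes given in grid-cell units and normalizes outputs to [0, 1] image coordinates, and the helper name `decode_yolov2_box` is hypothetical.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov2_box(
    t: Tuple[float, float, float, float],  # raw network outputs (t_x, t_y, t_w, t_h)
    cell: Tuple[int, int],                 # (c_x, c_y): column and row of the grid cell
    anchor: Tuple[float, float],           # (p_w, p_h): prior size, assumed in grid-cell units
    grid_size: Tuple[int, int],            # (W, H): grid cells across and down, e.g. 13 x 13
) -> Tuple[float, float, float, float]:
    """Hypothetical helper: return (b_x, b_y, b_w, b_h) normalized to [0, 1]."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    gw, gh = grid_size
    # The sigmoid bounds the center offset to (0, 1), so the center stays inside its cell.
    bx = (sigmoid(tx) + cx) / gw
    by = (sigmoid(ty) + cy) / gh
    # The exponential scales the prior's width and height multiplicatively.
    bw = pw * math.exp(tw) / gw
    bh = ph * math.exp(th) / gh
    return bx, by, bw, bh


# Zero offsets place the box at the center of cell (6, 6) with the anchor's own size.
print(decode_yolov2_box((0.0, 0.0, 0.0, 0.0), (6, 6), (2.0, 3.0), (13, 13)))
```

Bounding the center offset with a sigmoid is what keeps early training stable. An unconstrained offset would let a prediction drift to any location in the image.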

Figure 2: Bibliometric network visualization of the main YOLO applications, created with VOSviewer [VOSviewer_Visualizing_Scientific_Landscapes_2023].

This review was generated by AI and checked by a human editor.