QUICK REVIEW

[Paper Review] A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS

Juan Terven, Diana Cordova-Esparza | arXiv (Cornell University) | Apr 2, 2023
Advanced Neural Network Applications | Cited by 185
One-line summary

This paper surveys the evolution of the YOLO object detectors from YOLOv1 through YOLOv8 and YOLO-NAS, detailing the architecture, training tricks, and performance trends of each version.

ABSTRACT

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.

Motivation and objectives

  • Summarize the evolution of the YOLO family from YOLOv1 to YOLOv8 and YOLO-NAS.
  • Detail architectural changes, training techniques, and performance trends across versions.
  • Discuss postprocessing, metrics, and the speed-accuracy tradeoffs in real-time detection.
  • Highlight future directions and research opportunities for real-time object detection.

Proposed approach

  • Discuss foundational concepts and metrics used in YOLO evaluation (AP, mAP, IoU, NMS).
  • Describe architectural changes across YOLO versions (backbone, neck, head, multi-scale predictions).
  • Summarize training tricks, loss functions, and data augmentation strategies introduced for each version.
  • Compare YOLO variants (YOLOv1–v4, YOLO9000, YOLOv3 with SPP, and the YOLOv3/4/5-era evolutions) and related methods (YOLO with Transformers, YOLO-NAS).
  • Provide a synthesized view of speed versus accuracy tradeoffs and practical considerations for real-time detection.
Figure 1: A timeline of YOLO versions.
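
The IoU and NMS concepts listed above can be illustrated with a minimal sketch. It assumes boxes in corner format `(x1, y1, x2, y2)` and a greedy suppression rule with a single IoU threshold; the function names `iou` and `nms` and the default threshold of 0.5 are illustrative choices, not taken from the paper or from any specific YOLO release.

```python
from typing import List, Sequence

# Assumption for this sketch: boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In practice NMS is usually applied per class after a confidence filter; this sketch omits both steps to keep the core idea visible.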

Experimental results

Research questions

  • RQ1: How did each YOLO version improve speed and accuracy relative to its predecessor?
  • RQ2: What architectural and training changes drove improvements in detection performance across YOLOv1–YOLOv8 and YOLO-NAS?
  • RQ3: What are the common post-processing and evaluation metrics used to compare YOLO variants across datasets (VOC/COCO)?
  • RQ4: What future directions and research opportunities are identified for advancing real-time object detection with YOLO?
  • RQ5: How do newer YOLO variants handle multi-scale detection and anchor/prior box strategies?
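
As background for RQ3, the AP metric for a single class can be sketched as follows. This is a minimal illustration assuming 11-point interpolation in the style of PASCAL VOC 2007, with detections already sorted by descending confidence and already matched to ground truth at some IoU threshold. The function name `average_precision` and its parameters are hypothetical, and official evaluation toolkits differ in details.

```python
from typing import List


def average_precision(is_tp: List[bool], num_gt: int, num_points: int = 11) -> float:
    """Interpolated AP for one class.

    is_tp[i] is True when the i-th detection (sorted by descending confidence)
    was matched to a ground-truth box; num_gt is the number of ground-truth boxes.
    """
    if num_gt == 0:
        return 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        precisions.append(tp / rank)   # precision among the top `rank` detections
        recalls.append(tp / num_gt)    # recall reached at that rank
    total = 0.0
    for j in range(num_points):
        r = j / (num_points - 1)       # recall levels 0.0, 0.1, ..., 1.0 for 11 points
        # Interpolated precision: best precision at any recall >= r.
        candidates = [p for p, rc in zip(precisions, recalls) if rc >= r]
        total += max(candidates) if candidates else 0.0
    return total / num_points
```

mAP is then the mean of this value over all classes; COCO-style evaluation additionally averages over several IoU thresholds.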

Key findings

  • YOLO evolution shows progressive improvements in both accuracy (AP/mAP) and speed through architectural refinements and training tricks.
  • YOLOv2 introduced anchors, higher-resolution training, and a fully convolutional design, boosting AP on VOC 2007 to 78.6%.
  • YOLOv3 added Darknet-53, multi-scale predictions, and residual connections, improving small object detection and achieving state-of-the-art COCO performance at the time.
  • YOLOv4 combined bag-of-freebies and bag-of-specials strategies, a CSPDarknet53 backbone with an SPP and PANet neck, and advanced training techniques to reach strong real-time performance.
  • The review discusses YOLO-NAS and YOLO with transformers as future directions, emphasizing ongoing improvements in real-time detection for diverse applications.
Figure 2: Bibliometric network visualization of the main YOLO applications, created with [VOSviewer_Visualizing_Scientific_Landscapes_2023].
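
The anchor mechanism mentioned for YOLOv2 in the key findings can be illustrated with a short sketch of box decoding. It follows the parameterization from the YOLOv2 paper, where the network predicts offsets `(tx, ty, tw, th)` relative to a grid cell and an anchor (prior) box. The function name `decode_box` and the choice of grid-cell units are assumptions made for illustration.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(
    t: Tuple[float, float, float, float],
    cell: Tuple[int, int],
    anchor: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Decode raw predictions into a box (center x, center y, width, height).

    t      : raw network outputs (tx, ty, tw, th)
    cell   : top-left corner (cx, cy) of the responsible grid cell
    anchor : prior width and height (pw, ph)
    All values are in grid-cell units in this sketch.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx      # the sigmoid keeps the center inside its cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # width and height scale the anchor dimensions
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```

With all-zero offsets, the decoded box sits at the center of its cell and has exactly the anchor's size, which is why well-chosen anchors give the network a useful starting point.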


This review was generated by AI and checked by a human editor.