[Paper Review] A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
This paper surveys the evolution of the YOLO object detectors from YOLOv1 through YOLOv8 and YOLO-NAS, detailing the architecture, training techniques, and performance trends of each version.
YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.
Research Motivation and Objectives
- Summarize the evolution of the YOLO family from YOLOv1 to YOLOv8 and YOLO-NAS.
- Detail architectural changes, training techniques, and performance trends across versions.
- Discuss postprocessing, metrics, and the speed-accuracy tradeoffs in real-time detection.
- Highlight future directions and research opportunities for real-time object detection.
Proposed Method
- Discuss foundational concepts and metrics used in YOLO evaluation (AP, mAP, IoU, NMS).
- Describe architectural changes across YOLO versions (backbone, neck, head, multi-scale predictions).
- Summarize training tricks, loss functions, and data augmentation strategies introduced for each version.
- Compare YOLO variants (YOLOv1 through YOLOv4, including YOLO9000 and YOLOv3-SPP, plus later YOLOv5-era developments) and related methods (YOLO with Transformers, YOLO-NAS).
- Provide a synthesized view of speed versus accuracy tradeoffs and practical considerations for real-time detection.
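The IoU and NMS concepts listed above can be illustrated with a minimal sketch. It assumes boxes in `(x1, y1, x2, y2)` corner format and a greedy, class-agnostic suppression loop. The names `iou` and `nms` and the default threshold of 0.5 are illustrative choices, not taken from the paper or from any particular YOLO codebase.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thresh: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed at a threshold of 0.5, while the distant third box is kept. Production detectors typically run NMS per class and in vectorized form; this loop is only meant to show the logic.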

Experimental Results
Research Questions
- RQ1: How did each YOLO version improve speed and accuracy relative to its predecessor?
- RQ2: What architectural and training changes drove improvements in detection performance across YOLOv1–YOLOv8 and YOLO-NAS?
- RQ3: What are the common post-processing and evaluation metrics used to compare YOLO variants across datasets (VOC/COCO)?
- RQ4: What future directions and research opportunities are identified for advancing real-time object detection with YOLO?
- RQ5: How do newer YOLO variants handle multi-scale detection and anchor/prior box strategies?
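For the evaluation metrics raised in RQ3, the following is a rough sketch of how Average Precision (AP) for a single class can be computed from ranked detections. It assumes all-point interpolation of the precision-recall curve, in the style of later PASCAL VOC evaluations; COCO instead samples 101 recall points and averages over several IoU thresholds. The function names and the input format (a list of true-positive flags already sorted by descending confidence) are hypothetical conveniences, not part of the paper.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(is_tp: Sequence[bool], num_gt: int) -> float:
    """AP for one class via all-point interpolation.

    is_tp  -- one flag per detection, sorted by descending confidence;
              True if the detection matched a ground-truth box.
    num_gt -- number of ground-truth boxes for this class.
    """
    if num_gt == 0 or not is_tp:
        return 0.0
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing (the interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(
        per_class: Dict[str, Tuple[Sequence[bool], int]]) -> float:
    """mAP as the unweighted mean of the per-class AP values."""
    if not per_class:
        return 0.0
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    print(average_precision([True, False, True], num_gt=2))
```

Deciding which detections count as true positives requires matching them to ground truth at a chosen IoU threshold, which this sketch leaves to the caller.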
Key Findings
- YOLO evolution shows progressive improvements in both accuracy (AP/mAP) and speed through architectural refinements and training tricks.
- YOLOv2 introduced anchors, higher-resolution training, and a fully convolutional design, boosting AP on VOC 2007 to 78.6%.
- YOLOv3 added Darknet-53, multi-scale predictions, and residual connections, improving small object detection and achieving state-of-the-art COCO performance at the time.
- YOLOv4 combined bag-of-freebies and bag-of-specials strategies, a CSPDarknet53 backbone with an SPP and PANet neck, and advanced training techniques to reach strong real-time performance.
- The review discusses YOLO-NAS and YOLO with transformers as future directions, emphasizing ongoing improvements in real-time detection for diverse applications.
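To make the anchor mechanism mentioned for YOLOv2 more concrete, here is a small sketch of how raw network outputs are decoded into a box relative to an anchor (prior) box. It follows the formulation in the YOLOv2 paper, which this review does not spell out: the center offsets pass through a sigmoid and are added to the grid-cell position, while the width and height scale the prior exponentially. The function name `decode_box` and its argument layout are assumptions made for illustration.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t: Tuple[float, float, float, float],
               cell: Tuple[int, int],
               prior: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Decode raw predictions (tx, ty, tw, th) into (bx, by, bw, bh).

    cell  -- (cx, cy), top-left corner of the grid cell, in grid units.
    prior -- (pw, ph), anchor width and height, in grid units.
    All outputs are in grid units (assumed convention for this sketch).
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx      # center stays inside the predicting cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # prior is rescaled, never made negative
    bh = ph * math.exp(th)
    return bx, by, bw, bh


if __name__ == "__main__":
    # Zero offsets place the box at the cell center with the prior's size.
    print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(2.0, 5.0)))
```

Bounding the center offset with a sigmoid keeps each prediction tied to its own grid cell, which is one reason this parameterization is more stable to train than unconstrained offsets.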
![Figure 2: Bibliometric network visualization of the main YOLO applications, created with [VOSviewer_Visualizing_Scientific_Landscapes_2023].](https://ar5iv.labs.arxiv.org/html/2304.00501/assets/figures/yolo_apps_.png)
This review was generated by AI and checked by a human editor.