QUICK REVIEW

[Paper Review] YOLOv1 to YOLOv11: A Comprehensive Survey of Real-Time Object Detection Innovations and Challenges

Manikanta Kotthapalli, Deepika Ravipati|ArXiv.org|Aug 4, 2025
Advanced Neural Network Applications|Citations: 5
One-Line Summary

A comprehensive survey of the YOLO family from v1 through v9, with discussions of emerging v10/v11, architectural innovations, performance benchmarks, deployment, and open challenges.

ABSTRACT

Over the past decade, object detection has advanced significantly, with the YOLO (You Only Look Once) family of models transforming the landscape of real-time vision applications through unified, end-to-end detection frameworks. From YOLOv1's pioneering regression-based detection to the latest YOLOv9, each version has systematically enhanced the balance between speed, accuracy, and deployment efficiency through continuous architectural and algorithmic advancements. Beyond core object detection, modern YOLO architectures have expanded to support tasks such as instance segmentation, pose estimation, object tracking, and domain-specific applications including medical imaging and industrial automation. This paper offers a comprehensive review of the YOLO family, highlighting architectural innovations, performance benchmarks, extended capabilities, and real-world use cases. We critically analyze the evolution of YOLO models and discuss emerging research directions that extend their impact across diverse computer vision domains.

Research Motivation and Objectives

  • Survey the evolution of YOLO architectures from v1 to v9 (and brief notes on v10/v11).
  • Analyze how backbone, neck, head, loss, and training strategies shaped speed-accuracy on standard benchmarks.
  • Contextualize performance (mAP, FPS) on PASCAL VOC and COCO and discuss deployment on edge vs server.
  • Identify open challenges such as training stability, domain shift robustness, and interpretability, and propose future directions.

Proposed Method

  • Categorical taxonomy of YOLO innovations across five axes: Backbone, Neck, Detection Head, Loss and Assignment, Training Strategies.
  • Chronological review of YOLO versions v1–v9 with key architectural and training developments.
  • Performance benchmarking references including mAP and FPS on COCO and VOC datasets.
  • Discussion of deployment characteristics across edge and server environments for real-time tasks.
  • Compilation of open challenges and proposed future research directions.
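The five-axis taxonomy listed above could be captured as a small record type. The sketch below is only an illustration under stated assumptions: the class name `YoloDesign`, its field names, and the helper `versions_with_head` are hypothetical and do not come from the paper. The entries are filled in solely from the summary table in this review, mapping its "Feature Fusion" column to the neck and its "Anchor Type" column to the detection head; `None` or an empty tuple marks anything this review does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class YoloDesign:
    """Hypothetical record with one field per taxonomy axis of the survey."""
    version: str
    backbone: str
    neck: str                                   # assumed: "Feature Fusion" column
    head: str                                   # assumed: "Anchor Type" column
    loss_and_assignment: Optional[str] = None   # None = not stated in this review
    training_strategies: Tuple[str, ...] = ()   # empty = not stated in this review


# Values copied from the summary table in this review; nothing else is filled in.
DESIGNS: List[YoloDesign] = [
    YoloDesign("YOLOv4", "CSPDarknet-53", "PANet + SPP", "Anchor-based",
               training_strategies=("CutMix",)),
    YoloDesign("YOLOv8", "C2f Modules", "FPN-style", "Anchor-free"),
    YoloDesign("YOLOv9", "GELAN", "GELAN-FPN", "Anchor-free",
               loss_and_assignment="SimOTA, DFLv2"),
]


def versions_with_head(designs: List[YoloDesign], head: str) -> List[str]:
    """Return the versions whose detection head matches the given label."""
    return [d.version for d in designs if d.head == head]


if __name__ == "__main__":
    print(versions_with_head(DESIGNS, "Anchor-free"))
```

Keeping one field per axis makes it easy to compare versions along a single axis, for example to list which generations share an anchor-free head.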

Experimental Results

Research Questions

  • RQ1: What architectural changes across YOLO versions most improved the speed-accuracy trade-off?
  • RQ2: How have training strategies and loss/assignment functions evolved to improve convergence and robustness?
  • RQ3: What are the practical deployment implications (edge vs server) for each YOLO generation?
  • RQ4: What open challenges remain in YOLO (training stability, domain shift, interpretability) and where should future work focus?
  • RQ5: How do newer YOLO versions (v8–v9, with mentions of v10/v11) extend capabilities to segmentation, pose estimation, and multi-task learning?

Key Findings

| YOLO Version | Backbone | Anchor Type | Feature Fusion | mAP@0.5 (COCO) | Speed (FPS) | Key Highlights |
|---|---|---|---|---|---|---|
| YOLOv1 | Custom CNN | None | None | 63.4% (VOC) | 45 | First unified detector |
| YOLOv2 | Darknet-19 | Anchor-based | None | 76.8% (VOC), 21.6% (COCO) | 67 | YOLO9000, k-means, multi-scale |
| YOLOv3 | Darknet-53 | Anchor-based | Multi-scale | 57.9% | 30–45 | Improved small-object detection |
| YOLOv4 | CSPDarknet-53 | Anchor-based | PANet + SPP | 43.5% (AP) | 62–65 | BoF/BoS, Mish, CutMix |
| YOLOv5 | CSPDarknet (PyTorch) | AutoAnchor | PANet | 50.1% | 60+ | Model scaling, exportability |
| YOLOv6 | EfficientRepNet | Hybrid | RepPAN | 52.5% | 70+ | Anchor-free option, decoupled head |
| YOLOv7 | E-ELAN | Anchor-based | PANet + E-ELAN | 56.8% | 60+ | RepConv, Coarse-to-fine head |
| YOLOv8 | C2f Modules | Anchor-free | FPN-style | 53.0% | 60–80 | Multi-task, modernized head |
| YOLOv9 | GELAN | Anchor-free | GELAN-FPN | 56.0%+ | 50–60 | SimOTA, DFLv2, scalable variants |
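The "@0.5" in the mAP column refers to an Intersection-over-Union (IoU) threshold of 0.5 between a predicted box and a ground-truth box. The sketch below shows only that overlap test, as a minimal illustration: the function names, the `(x1, y1, x2, y2)` corner format, and the `>=` comparison are assumptions made here, and benchmark toolkits differ in their exact matching rules. It does not compute full mAP, which additionally requires ranking detections by confidence and averaging precision over classes.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the ground truth enough to count as a hit."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 2.0, 2.0)
    print(iou((1.0, 1.0, 3.0, 3.0), ground_truth))                   # partial overlap
    print(matches_at_threshold((0.0, 0.0, 2.0, 1.5), ground_truth))  # large overlap
```

Note that the table mixes metrics (some rows report VOC results or COCO AP), so values in that column are not all directly comparable under this single threshold.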
  • YOLO evolution shows steady gains in mAP and FPS across generations, with notable breakthroughs in backbone and neck designs (e.g., Darknet-53, CSPDarknet, PANet, GELAN).
  • Anchor-based to anchor-free transitions (v8) and decoupled heads (v6–v7) consistently improved localization and robustness, especially for small objects.
  • Advanced training strategies (Mosaic, CutMix, EMA, SimOTA, DFL v2) and re-parameterization enabled better convergence and deployment efficiency.
  • YOLOv4–v9 demonstrate state-of-the-art real-time detection performance on COCO, with significant edge deployment readiness and multi-task capabilities (segmentation, pose estimation).
  • The survey highlights open challenges such as training stability in anchor-free variants, robustness under domain shift, and interpretability, outlining directions for future research.
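Among the training strategies named above, EMA (exponential moving average of model weights) is simple enough to sketch in a few lines. The version below is a generic illustration on plain Python lists, not the implementation used by any particular YOLO release: the function name `ema_update` and the `decay` default are assumptions chosen for the example.

```python
from typing import List


def ema_update(ema_weights: List[float],
               model_weights: List[float],
               decay: float = 0.999) -> List[float]:
    """Blend the running average with the current weights.

    Each averaged weight moves a fraction (1 - decay) of the way toward
    the corresponding current model weight.
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


if __name__ == "__main__":
    # Toy example: the averaged weight drifts toward a fixed model weight.
    ema = [0.0]
    for _ in range(3):
        ema = ema_update(ema, [1.0], decay=0.5)
        print(ema)
```

The averaged copy changes more slowly than the raw weights, which is why it is commonly evaluated or exported in place of the latest training step.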


This review was generated by AI and checked by a human editor.