QUICK REVIEW

[Paper Review] YOLOv10: Real-Time End-to-End Object Detection

Ao Wang, Hui Chen | arXiv (Cornell University) | May 23, 2024

Industrial Vision Systems and Defect Detection | Citations: 1,009

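One-Line Summary

YOLOv10 introduces NMS-free training via consistent dual assignments, together with a holistic efficiency-accuracy driven model design, and achieves state-of-the-art end-to-end real-time object detection across model scales.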

ABSTRACT

Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks the comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. It renders the suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture. To this end, we first present the consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce the holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8$\times$ faster than RT-DETR-R18 under the similar AP on COCO, meanwhile enjoying 2.8$\times$ smaller number of parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46\% less latency and 25\% fewer parameters for the same performance.

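Motivation and Objectives

Push the boundary of end-to-end real-time object detection with YOLOs by removing NMS post-processing.
Develop a consistent dual-assignment training scheme that enables NMS-free inference.
Holistically optimize YOLO components for both efficiency and accuracy.
Demonstrate state-of-the-art latency-accuracy trade-offs across model scales on the COCO dataset.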
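For readers unfamiliar with the step being removed, here is a minimal sketch of classic greedy NMS. It is not YOLOv10 code; the function names, the (x1, y1, x2, y2) box layout, and the 0.5 IoU threshold are illustrative assumptions. Its sequential, data-dependent loop over score-sorted candidates is the post-processing that an NMS-free detector no longer needs at inference.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), illustrative layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring remaining candidate
        keep.append(best)
        # Suppress every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box is a near-duplicate of the first (IoU of about 0.68), so it is suppressed; a detector whose one-to-one head emits a single prediction per object avoids this loop entirely.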

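Proposed Method

Propose consistent dual assignments for NMS-free training: a one-to-many head supplies rich supervision during training, while a one-to-one head is trained jointly and used alone at inference, so no NMS is required.
Introduce a consistent matching metric that links the one-to-one and one-to-many assignments so that the supervision of the two heads is aligned.
Implement a holistic efficiency-accuracy driven model design that includes a lightweight classification head, spatial-channel decoupled downsampling, and rank-guided block design.
Explore accuracy-driven design with large-kernel convolutions and a partial self-attention (PSA) module, improving performance at low cost.
Use rank-based analysis to place compact inverted blocks (CIB) in redundant stages, and apply large kernels and PSA selectively according to model scale.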
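The consistent matching metric in the second item can be illustrated with a small sketch. It assumes the form used in the YOLOv10 paper as we understand it, m(α, β) = s · p^α · IoU^β, where s is a 0/1 spatial prior (anchor point inside the ground-truth box), p the classification score, and IoU the overlap between the predicted and ground-truth boxes, with α_o2o = r·α_o2m and β_o2o = r·β_o2m. The candidate values and helper names are hypothetical; the (0.5, 6) and (0.5, 2) settings mirror the consistent and inconsistent cases referenced in Figure 2.

```python
from typing import List, Sequence


def matching_metric(s: Sequence[float], p: Sequence[float], iou: Sequence[float],
                    alpha: float, beta: float) -> List[float]:
    """m = s * p**alpha * IoU**beta for each candidate anchor of one GT box."""
    return [si * (pi ** alpha) * (ui ** beta) for si, pi, ui in zip(s, p, iou)]


def top_k(values: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest values, best first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


# Three candidate anchors for a single ground-truth box (hypothetical numbers).
s = [1.0, 1.0, 0.0]     # spatial prior: is the anchor point inside the GT box?
p = [0.9, 0.5, 0.95]    # classification score for the GT class
iou = [0.7, 0.8, 0.9]   # IoU between predicted box and GT box

a_o2m, b_o2m = 0.5, 6.0
# One-to-many head: several positives per GT box (k=2 here for illustration).
o2m = top_k(matching_metric(s, p, iou, a_o2m, b_o2m), k=2)

# Consistent one-to-one head: same metric scaled by r, single positive.
r = 1.0
o2o_consistent = top_k(matching_metric(s, p, iou, r * a_o2m, r * b_o2m), k=1)

# Inconsistent one-to-one head: beta_o2o = 2 instead of 6.
o2o_inconsistent = top_k(matching_metric(s, p, iou, 0.5, 2.0), k=1)

print(o2m, o2o_consistent, o2o_inconsistent)  # [1, 0] [1] [0]
```

Because s is 0 or 1, the scaled metric equals m_o2m raised to the power r, and x ↦ x^r is increasing for r > 0, so the one-to-one pick always coincides with the one-to-many top-1. With β_o2o = 2 the two heads disagree (anchor 0 versus anchor 1), which is the supervision gap the consistent metric is meant to close.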

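A back-of-envelope count shows why spatial-channel decoupled downsampling is cheaper. The sketch assumes, as we understand the YOLOv10 design, that a standard 3×3 stride-2 convolution maps C to 2C channels and H×W to H/2×W/2, whereas the decoupled version applies a 1×1 pointwise convolution (C to 2C) at full resolution followed by a 3×3 stride-2 depthwise convolution on 2C channels. Bias and normalization terms are ignored, and the 80×80×128 feature map is an illustrative choice, not a figure from the paper.

```python
from typing import Tuple


def standard_downsample_cost(h: int, w: int, c: int) -> Tuple[int, int]:
    """(MACs, weights) of a 3x3 stride-2 conv mapping c -> 2c channels."""
    out_pixels = (h // 2) * (w // 2)
    macs = out_pixels * 9 * c * (2 * c)
    params = 9 * c * (2 * c)
    return macs, params


def decoupled_downsample_cost(h: int, w: int, c: int) -> Tuple[int, int]:
    """(MACs, weights) of a 1x1 pointwise conv c -> 2c at full resolution,
    followed by a 3x3 stride-2 depthwise conv on the 2c channels."""
    pw_macs = h * w * c * (2 * c)                 # channel mixing
    dw_macs = (h // 2) * (w // 2) * 9 * (2 * c)   # per-channel spatial downsampling
    params = c * (2 * c) + 9 * (2 * c)
    return pw_macs + dw_macs, params


std = standard_downsample_cost(80, 80, 128)
dec = decoupled_downsample_cost(80, 80, 128)
print(std, dec)  # (471859200, 294912) (213401600, 35072)
print(f"MACs ratio {std[0] / dec[0]:.2f}, params ratio {std[1] / dec[1]:.2f}")  # 2.21, 8.41
```

As C grows, the pointwise term dominates, so the MAC saving approaches 4.5/2 = 2.25× and the weight saving approaches 18/2 = 9×.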
Figure 1: Comparisons with others in terms of latency-accuracy (left) and size-accuracy (right) trade-offs. We measure the end-to-end latency using the official pre-trained models.

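Experimental Results

Research Questions

RQ1: Can end-to-end, NMS-free YOLOs match or exceed NMS-based YOLOs in AP while reducing inference latency?
RQ2: How can dual label assignments and a unified matching metric align supervision across the two heads and improve training efficiency?
RQ3: Which holistic architectural changes yield better efficiency-accuracy trade-offs across model scales?
RQ4: Do large-kernel convolutions and PSA bring gains to real-time detectors without excessive cost?
RQ5: Can rank-guided block design and decoupled downsampling reduce redundancy without hurting performance?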

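Key Findings

YOLOv10 achieves state-of-the-art latency-accuracy trade-offs across model scales on COCO.
YOLOv10-S is 1.8× faster than RT-DETR-R18 at similar AP, with 2.8× fewer parameters and FLOPs.
Compared with YOLOv9-C, YOLOv10-B has 46% lower latency and 25% fewer parameters at the same performance.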
YOLOv10-L and YOLOv10-X outperform YOLOv8-L/X by 0.3 and 0.5 AP, respectively, with 1.8× and 2.3× fewer parameters.
Among lightweight models, YOLOv10-N/S outperform the YOLOv6-3.0-N/S and RT-DETR baselines in both latency and AP; the end-to-end latency reduction reaches about 70% against some baselines.
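The findings state speed in two forms, "k× faster" and "x% lower latency." A small conversion, assuming both refer to the same end-to-end latency measurement, makes them comparable: a 1.8× speedup is roughly a 44% latency reduction, and a 46% reduction is roughly a 1.85× speedup. The helper names are illustrative.

```python
def speedup_to_reduction(speedup: float) -> float:
    """'k x faster' -> fractional latency reduction, 1 - 1/k."""
    return 1.0 - 1.0 / speedup


def reduction_to_speedup(reduction: float) -> float:
    """Fractional latency reduction x -> speedup factor, 1 / (1 - x)."""
    return 1.0 / (1.0 - reduction)


print(f"{speedup_to_reduction(1.8):.1%}")    # 44.4%
print(f"{reduction_to_speedup(0.46):.2f}x")  # 1.85x
print(f"{reduction_to_speedup(0.70):.2f}x")  # 3.33x
```

Read this way, YOLOv10-S versus RT-DETR-R18 and YOLOv10-B versus YOLOv9-C are speedups of similar size, while a 70% latency reduction corresponds to more than a 3× speedup.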

Figure 2: (a) Consistent dual assignments for NMS-free training. (b) Frequency of one-to-one assignments in Top-1/5/10 of one-to-many results for YOLOv8-S, which employs $\alpha_{o2m}=0.5$ and $\beta_{o2m}=6$ by default [20]. For consistency, $\alpha_{o2o}=0.5$; $\beta_{o2o}=6$. For inconsiste


This review was generated by AI and checked by a human editor.