QUICK REVIEW

[Paper Review] Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10

Ranjan Sapkota, Manoj Karkee|ArXiv.org|Feb 26, 2025

Plant and Fungal Interactions Research | Cited by 4

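One-Sentence Summary

Trained on synthetic data generated by LLMs, YOLOv12 outperforms YOLOv11 and YOLOv10 on apple detection, with higher precision, recall, and mAP@50; field tests on real orchard images support its practical use.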

ABSTRACT

This study evaluated the performance of the YOLOv12 object detection model and compared it against YOLOv11 and YOLOv10 for apple detection in commercial orchards, with all models trained entirely on synthetic images generated by Large Language Models (LLMs). The YOLOv12n configuration achieved the highest precision at 0.916, the highest recall at 0.969, and the highest mean Average Precision (mAP@50) at 0.978. In comparison, the YOLOv11 series was led by YOLO11x, which achieved the highest precision at 0.857, recall at 0.85, and mAP@50 at 0.91. For the YOLOv10 series, YOLOv10b and YOLOv10l both achieved the highest precision at 0.85, with YOLOv10n achieving the highest recall at 0.8 and mAP@50 at 0.89. These findings demonstrated that YOLOv12, when trained on realistic LLM-generated datasets, surpassed its predecessors in key performance metrics. The technique also offered a cost-effective solution by reducing the need for extensive manual data collection in the agricultural field. In addition, this study compared the computational efficiency of all versions of YOLOv12, v11 and v10, where YOLOv11n reported the lowest inference time at 4.7 ms, compared to YOLOv12n's 5.6 ms and YOLOv10n's 5.9 ms. Although YOLOv12 is newer and more accurate than YOLOv11 and YOLOv10, YOLO11n remains the fastest model among the YOLOv10, YOLOv11, and YOLOv12 series. (Index: YOLOv12, YOLOv11, YOLOv10, YOLOv13, YOLOv14, YOLOv15, YOLOE, YOLO Object detection)

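Motivation and Objectives

Enable robust apple detection in complex orchard environments while reducing dependence on costly field data collection.
Evaluate the performance gains of YOLOv12 trained on synthetic data in comparison with YOLOv11 and YOLOv10.
Validate the models on real field images to demonstrate practical use in agricultural automation.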

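Proposed Method

Generate synthetic apple orchard images with an LLM-based pipeline using DALL·E 2 and CLIP embeddings.
Annotate the synthetic images and train four YOLOv12 configurations (n, s, m, l) with fixed training hyperparameters.
Compare the YOLOv12 configurations with YOLOv11 and YOLOv10 on the same synthetic dataset using standard metrics (Precision, Recall, mAP@50).
Assess computational efficiency across models, including parameter count, GFLOPs, and inference latency.
Evaluate generalization in the field by running inference on real orchard images using a robotic platform equipped with a Kinect DK.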
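
The Precision and Recall figures used in the comparison depend on how predicted boxes are matched to annotated apples. The following is a minimal sketch of that matching for a single image at an IoU threshold of 0.5, assuming axis-aligned boxes in `(x1, y1, x2, y2)` form and greedy matching by descending confidence. The function names and the matching rule are illustrative assumptions, not the evaluation code used in the paper, and mAP@50 additionally averages precision over recall levels, which this sketch does not do.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]],  # (box, confidence) pairs
    truths: List[Box],
    iou_thr: float = 0.5,
) -> Tuple[float, float]:
    """Greedy matching: highest-confidence predictions first,
    each ground-truth box may be matched at most once."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for i, truth in enumerate(truths):
            if i in matched:
                continue
            value = iou(box, truth)
            if value > best_iou:
                best_iou, best_idx = value, i
        if best_idx >= 0 and best_iou >= iou_thr:
            matched.add(best_idx)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching apple
    fn = len(truths) - tp  # apples that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

For example, two annotated apples and three predictions, one of which is a false alarm, give a precision of 2/3 and a recall of 1.0.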

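Experimental Results

Research Questions

RQ1: Can YOLOv12 trained only on LLM-generated synthetic data outperform YOLOv11 and YOLOv10 on apple detection metrics?
RQ2: Which YOLOv12 configuration offers the best trade-off between accuracy and efficiency for orchard deployment?
RQ3: Do models trained on synthetic data generalize to real field images of apple orchards?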

Key Findings

Model Configuration	Precision	Recall	mAP@50
YOLOv12n	0.916	0.969	0.978
YOLOv12s	0.898	0.956	0.974
YOLOv12m	0.898	0.956	0.974
YOLOv12l	0.898	0.956	0.974
YOLO11n	0.84	0.76	0.862
YOLO11s	0.874	0.826	0.909
YOLO11m	0.809	0.821	0.879
YOLO11l	0.836	0.877	0.866
YOLO11x	0.857	0.85	0.91
YOLOv10n	0.84	0.8	0.89
YOLOv10s	0.82	0.83	0.88
YOLOv10m	0.83	0.8	0.87
YOLOv10b	0.85	0.82	0.88
YOLOv10l	0.85	0.75	0.83
YOLOv10x	0.77	0.81	0.85
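
The table can be queried directly to see which configuration leads each series. The sketch below copies the values from the table and picks the best configuration per series by a chosen metric. The helper names and the rule that the series name is the configuration name minus its last letter are assumptions made for illustration.

```python
# (Precision, Recall, mAP@50) copied from the table above.
RESULTS = {
    "YOLOv12n": (0.916, 0.969, 0.978),
    "YOLOv12s": (0.898, 0.956, 0.974),
    "YOLOv12m": (0.898, 0.956, 0.974),
    "YOLOv12l": (0.898, 0.956, 0.974),
    "YOLO11n": (0.84, 0.76, 0.862),
    "YOLO11s": (0.874, 0.826, 0.909),
    "YOLO11m": (0.809, 0.821, 0.879),
    "YOLO11l": (0.836, 0.877, 0.866),
    "YOLO11x": (0.857, 0.85, 0.91),
    "YOLOv10n": (0.84, 0.8, 0.89),
    "YOLOv10s": (0.82, 0.83, 0.88),
    "YOLOv10m": (0.83, 0.8, 0.87),
    "YOLOv10b": (0.85, 0.82, 0.88),
    "YOLOv10l": (0.85, 0.75, 0.83),
    "YOLOv10x": (0.77, 0.81, 0.85),
}

PRECISION, RECALL, MAP50 = 0, 1, 2


def series_of(name: str) -> str:
    """Assumed convention: drop the trailing size letter,
    e.g. "YOLOv12n" -> "YOLOv12", "YOLO11x" -> "YOLO11"."""
    return name[:-1]


def best_per_series(results: dict, metric: int = MAP50) -> dict:
    """Return {series: configuration} with the highest value of `metric`.
    Ties keep the configuration listed first."""
    best = {}
    for name, metrics in results.items():
        series = series_of(name)
        if series not in best or metrics[metric] > results[best[series]][metric]:
            best[series] = name
    return best


if __name__ == "__main__":
    for series, name in best_per_series(RESULTS).items():
        print(series, name, RESULTS[name][MAP50])
```

Passing `metric=PRECISION` or `metric=RECALL` ranks by those columns instead, which is a quick way to check whether a series leader on mAP@50 also leads on the other two metrics.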

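YOLOv12n achieved the best metrics of all configurations: Precision 0.916, Recall 0.969, and mAP@50 0.978.
YOLOv12s, YOLOv12m, and YOLOv12l each achieved Precision 0.898, Recall 0.956, and mAP@50 0.974.
YOLO11x led the YOLOv11 series on mAP@50 (0.91), with Precision 0.857 and Recall 0.85, while YOLOv10n achieved Precision 0.84, Recall 0.8, and mAP@50 0.89.
YOLOv11n had the fastest inference at 4.7 ms, compared with 5.6 ms for YOLOv12n and 5.9 ms for YOLOv10n, so the older YOLOv11 variant remains faster than YOLOv12n.
YOLOv12n has the fewest parameters (2.556M), with 6.3 GFLOPs and 159 convolutional layers, suggesting a lighter and more efficient architecture.
In field tests on real images, YOLOv12 outperformed its predecessors, supporting the practical value of LLM-generated synthetic data for agricultural deployment under field conditions.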
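
To put the reported inference times in perspective, the sketch below converts them to an approximate frames-per-second figure and to a slowdown relative to the fastest model. It assumes each reported time covers one image and excludes pre- and post-processing, which the review does not state, so the throughput values should be read as rough upper bounds. The function names are illustrative.

```python
# Inference times in milliseconds, as reported in the review.
LATENCY_MS = {"YOLOv11n": 4.7, "YOLOv12n": 5.6, "YOLOv10n": 5.9}


def fps(latency_ms: float) -> float:
    """Approximate throughput, assuming one image per inference call."""
    return 1000.0 / latency_ms


def relative_slowdown(latency_ms: float, baseline_ms: float) -> float:
    """Fractional increase in latency over the baseline (0.0 = same speed)."""
    return latency_ms / baseline_ms - 1.0


if __name__ == "__main__":
    baseline = min(LATENCY_MS.values())
    for name, ms in sorted(LATENCY_MS.items(), key=lambda item: item[1]):
        print(f"{name}: {fps(ms):.1f} FPS, "
              f"{relative_slowdown(ms, baseline) * 100:.1f}% slower than fastest")
```

Under these assumptions YOLOv11n works out to about 212.8 FPS and YOLOv12n to about 178.6 FPS, so YOLOv12n is roughly 19% slower per image.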

This review was generated by AI and checked by a human editor.