QUICK REVIEW

[Paper Review] Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10

Ranjan Sapkota, Manoj Karkee | arXiv.org | Feb 26, 2025

Plant and Fungal Interactions Research | Cited by 4

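One-Sentence Summary

YOLOv12 trained on AI-generated synthetic data outperforms YOLOv11 and YOLOv10 at apple detection, achieving higher precision, recall, and mAP@50, and field tests confirm its practicality.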

ABSTRACT

This study evaluated the performance of the YOLOv12 object detection model and compared it against YOLOv11 and YOLOv10 for apple detection in commercial orchards, with all models trained entirely on synthetic images generated by Large Language Models (LLMs). The YOLOv12n configuration achieved the highest precision at 0.916, the highest recall at 0.969, and the highest mean Average Precision (mAP@50) at 0.978. In comparison, the YOLOv11 series was led by YOLO11x, which achieved the highest precision at 0.857, recall at 0.85, and mAP@50 at 0.91. For the YOLOv10 series, YOLOv10b and YOLOv10l both achieved the highest precision at 0.85, with YOLOv10n achieving the highest recall at 0.8 and mAP@50 at 0.89. These findings demonstrated that YOLOv12, when trained on realistic LLM-generated datasets, surpassed its predecessors in key performance metrics. The technique also offered a cost-effective solution by reducing the need for extensive manual data collection in the agricultural field. In addition, this study compared the computational efficiency of all versions of YOLOv12, YOLOv11, and YOLOv10: YOLOv11n reported the lowest inference time at 4.7 ms, compared to YOLOv12n's 5.6 ms and YOLOv10n's 5.9 ms. Although YOLOv12 is newer and more accurate than YOLOv11 and YOLOv10, YOLO11n remains the fastest model among the YOLOv10, YOLOv11, and YOLOv12 series. (Index: YOLOv12, YOLOv11, YOLOv10, YOLOv13, YOLOv14, YOLOv15, YOLOE, YOLO Object detection)

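Motivation and Objectives

Advance robust apple detection in complex orchard environments while reducing reliance on costly field data collection.
Assess how much YOLOv12 improves on YOLOv11 and YOLOv10 when all models are trained on synthetic data.
Validate the models on real field images to demonstrate practical applicability in agricultural automation.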

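Proposed Method

Generate synthetic apple-orchard images with an LLM-based pipeline that combines DALL·E 2 and CLIP embeddings.
Annotate the synthetic images and train four YOLOv12 configurations (n, s, m, l) under fixed training hyperparameters.
Compare the YOLOv12 configurations with YOLOv11 and YOLOv10 on the same synthetic dataset using standard metrics (Precision, Recall, mAP@50).
Evaluate computational efficiency, including parameter count, GFLOPs, and inference latency.
Run field tests by performing inference on real orchard images captured with a Kinect DK camera mounted on a robotic platform, in order to assess generalization.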
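
The Precision and Recall figures used in the comparison depend on how predicted boxes are matched to annotated apples. The following is a minimal sketch of that matching at an IoU threshold of 0.5, the threshold implied by mAP@50. It is not the paper's evaluation code: the function names, the `(x1, y1, x2, y2)` box format, and the greedy confidence-ordered matching are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Match predictions to ground truth greedily, highest confidence first.

    predictions: list of (box, confidence) pairs (hypothetical format).
    ground_truth: list of boxes.
    Each ground-truth box can be matched at most once.
    """
    matched = set()
    tp = 0
    for box, _conf in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp   # predictions with no matching apple
    fn = len(ground_truth) - tp  # apples that were missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall
```

With two annotated apples, one correct detection, and one spurious detection, this sketch yields a precision and recall of 0.5 each. mAP@50 additionally averages precision over all confidence thresholds, which is omitted here for brevity.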

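Experimental Results

Research Questions

RQ1: Can YOLOv12 outperform YOLOv11 and YOLOv10 on apple-detection metrics when trained only on LLM-generated synthetic data?
RQ2: Which YOLOv12 configuration offers the best trade-off between accuracy and efficiency for orchard deployment?
RQ3: Do models trained on synthetic data generalize well to real orchard images captured in the field?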

Key Findings

Model Configuration	Precision	Recall	mAP@50
YOLOv12n	0.916	0.969	0.978
YOLOv12s	0.898	0.956	0.974
YOLOv12m	0.898	0.956	0.974
YOLOv12l	0.898	0.956	0.974
YOLO11n	0.84	0.76	0.862
YOLO11s	0.874	0.826	0.909
YOLO11m	0.809	0.821	0.879
YOLO11l	0.836	0.877	0.866
YOLO11x	0.857	0.85	0.91
YOLOv10n	0.84	0.8	0.89
YOLOv10s	0.82	0.83	0.88
YOLOv10m	0.83	0.8	0.87
YOLOv10b	0.85	0.82	0.88
YOLOv10l	0.85	0.75	0.83
YOLOv10x	0.77	0.81	0.85
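
To make the table easier to query, the sketch below loads the reported values and picks the best configuration in each series by a chosen metric. The F1 score is a derived quantity that does not appear in the source; it is computed here from the reported precision and recall purely as an illustration. The helper names and the rule of grouping by dropping the trailing size letter are assumptions.

```python
# (precision, recall, mAP@50) as reported in the table above
RESULTS = {
    "YOLOv12n": (0.916, 0.969, 0.978),
    "YOLOv12s": (0.898, 0.956, 0.974),
    "YOLOv12m": (0.898, 0.956, 0.974),
    "YOLOv12l": (0.898, 0.956, 0.974),
    "YOLO11n": (0.84, 0.76, 0.862),
    "YOLO11s": (0.874, 0.826, 0.909),
    "YOLO11m": (0.809, 0.821, 0.879),
    "YOLO11l": (0.836, 0.877, 0.866),
    "YOLO11x": (0.857, 0.85, 0.91),
    "YOLOv10n": (0.84, 0.8, 0.89),
    "YOLOv10s": (0.82, 0.83, 0.88),
    "YOLOv10m": (0.83, 0.8, 0.87),
    "YOLOv10b": (0.85, 0.82, 0.88),
    "YOLOv10l": (0.85, 0.75, 0.83),
    "YOLOv10x": (0.77, 0.81, 0.85),
}


def series_of(name):
    """Series label = model name without its trailing size letter (assumption)."""
    return name[:-1]


def f1(precision, recall):
    """Harmonic mean of precision and recall (derived, not reported in the source)."""
    return 2 * precision * recall / (precision + recall)


def best_per_series(results, metric_index=2):
    """Return {series: model} with the highest value of the chosen metric.

    metric_index: 0 = precision, 1 = recall, 2 = mAP@50.
    Ties keep the configuration listed first.
    """
    best = {}
    for name, metrics in results.items():
        series = series_of(name)
        current = best.get(series)
        if current is None or metrics[metric_index] > results[current][metric_index]:
            best[series] = name
    return best


if __name__ == "__main__":
    print(best_per_series(RESULTS))
    for name, (p, r, _map50) in RESULTS.items():
        print(f"{name}: F1 = {f1(p, r):.3f}")
```

Ranked by mAP@50, the leaders are YOLOv12n, YOLO11x, and YOLOv10n. Passing `metric_index=0` or `metric_index=1` ranks by precision or recall instead.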

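YOLOv12n performs best of all configurations, with Precision 0.916, Recall 0.969, and mAP@50 0.978.
YOLOv12s/m/l each reach Precision 0.898, Recall 0.956, and mAP@50 0.974.
YOLO11x leads the YOLOv11 series with Precision 0.857, Recall 0.85, and mAP@50 0.91; in the YOLOv10 series, YOLOv10n reaches Precision 0.84, Recall 0.8, and mAP@50 0.89.
YOLO11n has the fastest inference at 4.7 ms, compared with 5.6 ms for YOLOv12n and 5.9 ms for YOLOv10n, so the older YOLOv11 variant remains faster than YOLOv12.
YOLOv12n uses the fewest parameters (2.556M) at 6.3 GFLOPs and has 159 convolutional layers, indicating a leaner, more efficient architecture.
Field tests on real images show YOLOv12 outperforming its predecessors under field conditions, confirming the practical viability of LLM-generated synthetic data for agricultural deployment.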
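
For deployment planning, the reported per-image latencies can be turned into approximate throughput figures. The sketch below does only that arithmetic. It assumes the reported times cover model inference for one image at a time and ignores pre-processing, post-processing, and camera I/O, none of which the source quantifies; the function names are illustrative.

```python
# Reported inference times in milliseconds for the nano variants
LATENCY_MS = {"YOLO11n": 4.7, "YOLOv12n": 5.6, "YOLOv10n": 5.9}


def frames_per_second(latency_ms):
    """Upper-bound throughput if inference were the only cost per frame."""
    return 1000.0 / latency_ms


def slowdown_vs_fastest(latencies):
    """Latency of each model relative to the fastest one (1.0 = fastest)."""
    fastest = min(latencies.values())
    return {name: ms / fastest for name, ms in latencies.items()}


if __name__ == "__main__":
    ratios = slowdown_vs_fastest(LATENCY_MS)
    for name, ms in LATENCY_MS.items():
        print(f"{name}: {ms} ms, about {frames_per_second(ms):.1f} FPS, "
              f"{ratios[name]:.2f}x the fastest latency")
```

Under these assumptions, the 0.9 ms gap between YOLO11n and YOLOv12n amounts to roughly a 19% longer inference time per image for YOLOv12n, which has to be weighed against its higher accuracy.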

This review was generated by AI and checked by a human editor.