QUICK REVIEW

[Paper Review] Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10

Ranjan Sapkota, Manoj Karkee | arXiv.org | 2025-02-26

Plant and Fungal Interactions Research · Citations: 4

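One-Line Summary

YOLOv12 trained on LLM-generated synthetic data outperforms YOLOv11 and YOLOv10 in apple detection, achieving higher precision, recall, and mAP@50, and field tests confirm its real-world applicability.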

ABSTRACT

This study evaluated the performance of the YOLOv12 object detection model and compared it against YOLOv11 and YOLOv10 for apple detection in commercial orchards, with all models trained entirely on synthetic images generated by Large Language Models (LLMs). The YOLOv12n configuration achieved the highest precision (0.916), the highest recall (0.969), and the highest mean Average Precision (mAP@50, 0.978). In comparison, the YOLOv11 series was led by YOLO11x, which achieved a precision of 0.857, a recall of 0.85, and the series-best mAP@50 of 0.91. For the YOLOv10 series, YOLOv10b and YOLOv10l both achieved the highest precision at 0.85, while YOLOv10n achieved a recall of 0.8 and the series-best mAP@50 of 0.89. These findings demonstrate that YOLOv12, when trained on realistic LLM-generated datasets, surpasses its predecessors in key performance metrics. The technique also offers a cost-effective solution by reducing the need for extensive manual data collection in agricultural fields. In addition, this study compared the computational efficiency of all versions of YOLOv12, v11, and v10: YOLO11n reported the lowest inference time at 4.7 ms, compared with 5.6 ms for YOLOv12n and 5.9 ms for YOLOv10n. Although YOLOv12 is newer and more accurate than YOLOv11 and YOLOv10, YOLO11n remains the fastest model among the YOLOv10, YOLOv11, and YOLOv12 series. (Index: YOLOv12, YOLOv11, YOLOv10, YOLOv13, YOLOv14, YOLOv15, YOLOE, YOLO Object detection)

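Research Motivation and Objectives

Enable robust apple detection in complex orchard environments while reducing reliance on costly field data collection.
Evaluate the performance gains of YOLOv12 over YOLOv11 and YOLOv10 when all are trained on synthetic data.
Validate the models on real field images to demonstrate practical applicability in agricultural automation.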

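Proposed Method

Generate synthetic apple-orchard images with an LLM-based pipeline that uses DALL·E 2 and CLIP embeddings.
Annotate the synthetic images and train four YOLOv12 configurations (n, s, m, l) with fixed training hyperparameters.
Compare the YOLOv12 configurations against YOLOv11 and YOLOv10 on the same synthetic dataset, using standard metrics (Precision, Recall, mAP@50).
Evaluate computational efficiency across models in terms of parameter count, GFLOPs, and inference latency.
Run field tests that assess generalization by running inference on real orchard images captured with a Kinect DK mounted on a robotic platform.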
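The comparison rests on Precision, Recall, and mAP@50. As a rough illustration of what these metrics measure for a single "apple" class, the sketch below matches detections to ground-truth boxes at an IoU threshold of 0.5 and computes precision, recall, and an all-point-interpolated AP. The box format, the greedy confidence-ordered matching, and the helper names (`iou`, `match`, `precision_recall`, `average_precision`) are assumptions for illustration, not the paper's code; the training toolkit used in the paper may interpolate the precision-recall curve differently and report precision and recall at a chosen confidence threshold, so its numbers need not match this sketch exactly.

```python
"""Sketch: Precision, Recall and AP@50 for one object class ("apple").

Assumptions (not taken from the paper): boxes are (x1, y1, x2, y2) pixel
coordinates, detections are matched greedily in descending confidence order,
and AP uses all-point interpolation of the precision-recall curve.
"""
from typing import List, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(dets: List[Detection], gts: List[Box], thr: float = 0.5) -> List[bool]:
    """Flag each detection (highest confidence first) as TP (True) or FP (False).

    A detection is a TP if its best-overlapping unmatched ground-truth box
    has IoU >= thr; each ground-truth box can be matched at most once.
    """
    used = [False] * len(gts)
    flags: List[bool] = []
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not used[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[best_j] = True
        flags.append(best_j >= 0)
    return flags


def precision_recall(flags: List[bool], n_gt: int) -> Tuple[float, float]:
    """Precision and recall when every detection in `flags` is kept."""
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


def average_precision(flags: List[bool], n_gt: int) -> float:
    """All-point interpolated AP from confidence-ordered TP/FP flags."""
    if n_gt == 0:
        return 0.0
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for is_tp in flags:
        tp += int(is_tp)
        fp += int(not is_tp)
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each value becomes the max over itself and all later
    # points, whose recall is equal or higher.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall step (FP rows add a zero-width step).
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
    dets = [((0, 0, 10, 10), 0.9), ((21, 21, 31, 31), 0.8),
            ((100, 100, 110, 110), 0.7), ((40, 40, 50, 50), 0.6)]
    flags = match(dets, gts)
    print(flags, precision_recall(flags, len(gts)), average_precision(flags, len(gts)))
```

In this toy example, three of the four detections match a ground-truth apple at IoU ≥ 0.5, giving precision 0.75, recall 1.0, and AP@50 of 11/12 (about 0.917).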

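Experimental Results

Research Questions

RQ1: Can YOLOv12, trained solely on LLM-generated synthetic data, outperform YOLOv11 and YOLOv10 on apple-detection metrics?
RQ2: Which YOLOv12 configuration offers the best trade-off between accuracy and efficiency for orchard deployment?
RQ3: Do models trained on synthetic data generalize well to real orchard images captured in the field?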

Key Results

Model configuration	Precision	Recall	mAP@50
YOLOv12n	0.916	0.969	0.978
YOLOv12s	0.898	0.956	0.974
YOLOv12m	0.898	0.956	0.974
YOLOv12l	0.898	0.956	0.974
YOLO11n	0.84	0.76	0.862
YOLO11s	0.874	0.826	0.909
YOLO11m	0.809	0.821	0.879
YOLO11l	0.836	0.877	0.866
YOLO11x	0.857	0.85	0.91
YOLOv10n	0.84	0.8	0.89
YOLOv10s	0.82	0.83	0.88
YOLOv10m	0.83	0.8	0.87
YOLOv10b	0.85	0.82	0.88
YOLOv10l	0.85	0.75	0.83
YOLOv10x	0.77	0.81	0.85
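To make the per-series comparison easy to check, the sketch below encodes the results table above and picks the best configuration in each series for each metric. The values are copied from the table; the grouping rule (dropping the last character of the model name as the size suffix) and the F1 score (harmonic mean of precision and recall) are additions for illustration, not figures reported in the review.

```python
"""Sketch: best configuration per YOLO series, using the results table.

Values are copied from the table. The F1 score (harmonic mean of precision
and recall) is an extra derived metric that the review does not report.
"""
from typing import Dict, List, Tuple

# (model, precision, recall, mAP@50), in table order
RESULTS: List[Tuple[str, float, float, float]] = [
    ("YOLOv12n", 0.916, 0.969, 0.978),
    ("YOLOv12s", 0.898, 0.956, 0.974),
    ("YOLOv12m", 0.898, 0.956, 0.974),
    ("YOLOv12l", 0.898, 0.956, 0.974),
    ("YOLO11n", 0.84, 0.76, 0.862),
    ("YOLO11s", 0.874, 0.826, 0.909),
    ("YOLO11m", 0.809, 0.821, 0.879),
    ("YOLO11l", 0.836, 0.877, 0.866),
    ("YOLO11x", 0.857, 0.85, 0.91),
    ("YOLOv10n", 0.84, 0.8, 0.89),
    ("YOLOv10s", 0.82, 0.83, 0.88),
    ("YOLOv10m", 0.83, 0.8, 0.87),
    ("YOLOv10b", 0.85, 0.82, 0.88),
    ("YOLOv10l", 0.85, 0.75, 0.83),
    ("YOLOv10x", 0.77, 0.81, 0.85),
]
METRICS = ("precision", "recall", "mAP@50")


def series_of(model: str) -> str:
    """Series name, assuming the last character is the size suffix (n/s/m/b/l/x)."""
    return model[:-1]


def best_per_series() -> Dict[str, Dict[str, Tuple[str, float]]]:
    """For each series and metric, the (model, value) with the highest score."""
    best: Dict[str, Dict[str, Tuple[str, float]]] = {}
    for model, *scores in RESULTS:
        per_metric = best.setdefault(series_of(model), {})
        for metric, value in zip(METRICS, scores):
            # Strict '>' keeps the first-listed model on ties (YOLOv10b vs YOLOv10l precision).
            if metric not in per_metric or value > per_metric[metric][1]:
                per_metric[metric] = (model, value)
    return best


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for series, per_metric in best_per_series().items():
        print(series, per_metric)
    for model, p, r, _ in RESULTS:
        print(f"{model}: F1 = {f1(p, r):.3f}")
```

Run against this table, the grouping shows that YOLO11x leads the YOLOv11 series only on mAP@50 (0.91, just ahead of YOLO11s at 0.909), while YOLO11s has the highest precision (0.874) and YOLO11l the highest recall (0.877). In the YOLOv10 series, YOLOv10n leads on mAP@50 (0.89) and YOLOv10s on recall (0.83).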

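YOLOv12n achieved the highest scores of all configurations: Precision 0.916, Recall 0.969, and mAP@50 0.978.
YOLOv12s/m/l each achieved Precision 0.898, Recall 0.956, and mAP@50 0.974.
YOLO11x had the highest mAP@50 in the YOLOv11 series (0.91), with Precision 0.857 and Recall 0.85; YOLOv10n had the highest mAP@50 in the YOLOv10 series (0.89), with Precision 0.84 and Recall 0.8.
YOLO11n had the fastest inference at 4.7 ms, while YOLOv12n (5.6 ms) and YOLOv10n (5.9 ms) took longer, showing that the older YOLO11 variant is faster than the newer YOLOv12.
YOLOv12n uses the fewest parameters of the YOLOv12 configurations (2.556M), with 6.3 GFLOPs and 159 convolutional layers, giving it a compact and efficient architecture.
In field tests on real orchard images, YOLOv12 outperformed its predecessors under field conditions, confirming the practical feasibility of LLM-generated synthetic data for agricultural deployment.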
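The reported latencies are easier to compare as throughput. The sketch below converts them to images per second and to extra time per image relative to YOLO11n. It assumes the three figures are per-image times measured on the same hardware and settings; the dictionary and function names are hypothetical.

```python
"""Sketch: convert the reported per-image inference latencies into throughput.

Assumes all three latencies were measured per image on the same hardware and
settings; anything not included in the reported figure is ignored.
"""
LATENCY_MS = {"YOLO11n": 4.7, "YOLOv12n": 5.6, "YOLOv10n": 5.9}


def fps(latency_ms: float) -> float:
    """Images per second for a given per-image latency in milliseconds."""
    return 1000.0 / latency_ms


def relative_slowdown(model: str, baseline: str = "YOLO11n") -> float:
    """Fractional extra time per image relative to the baseline model."""
    return LATENCY_MS[model] / LATENCY_MS[baseline] - 1.0


if __name__ == "__main__":
    # Print models from fastest to slowest.
    for name, ms in sorted(LATENCY_MS.items(), key=lambda kv: kv[1]):
        print(f"{name}: {ms} ms -> {fps(ms):.1f} FPS, "
              f"{relative_slowdown(name):+.1%} vs YOLO11n")
```

Under that assumption, the figures correspond to roughly 213, 179, and 169 images per second for YOLO11n, YOLOv12n, and YOLOv10n, so YOLOv12n spends about 19% more time per image than YOLO11n in exchange for its higher accuracy.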


This review was generated by AI and reviewed by a human editor.