QUICK REVIEW

[Paper Review] YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles

Aduen Benjumea, Izzeddin Teeti | arXiv (Cornell University) | December 22, 2021

Topic: Advanced Neural Network Applications | Citations: 117

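One-Line Summary

TL;DR: The authors modify the baseline YOLOv5 detectors to form the YOLO-Z family, improving small-object detection while keeping the added inference-time cost modest, and validate the models on a cone-dense autonomous racing dataset.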

ABSTRACT

As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away, image resolution and computational resources limitations make detecting smaller objects (that is, objects that occupy a small pixel area in the input image) a genuinely challenging task for machines and a wide-open research field. This study explores how the popular YOLOv5 object detector can be modified to improve its performance in detecting smaller objects, with a particular application in autonomous racing. To achieve this, we investigate how replacing certain structural elements of the model (as well as their connections and other parameters) can affect performance and inference time. In doing so, we propose a series of models at different scales, which we name 'YOLO-Z', and which display an improvement of up to 6.9% in mAP when detecting smaller objects at 50% IOU, at the cost of just a 3ms increase in inference time compared to the original YOLOv5. Our objective is to inform future research on the potential of adjusting a popular detector such as YOLOv5 to address specific tasks and provide insights on how specific changes can impact small object detection. Such findings, applied to the broader context of autonomous vehicles, could increase the amount of contextual information available to such systems.

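Research Motivation and Goals

Improve YOLOv5's small-object detection performance for autonomous-vehicle scenarios.
Investigate structural modifications to the backbone, neck, and connections that affect small-object accuracy and speed.
Identify the architectural changes that offer the best trade-off between accuracy and real-time inference.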

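Proposed Method

Replace or modify the backbone with DenseNet or ResNet while keeping the core YOLOv5 structure.
Replace the neck with a simplified FPN or BiFPN so that small-object information propagates better.
Reroute connections so that the neck and head use higher-resolution feature maps (including inclusive and exclusive mappings).
Tune the anchors per scale through data-driven automatic generation (3 or 5 anchors per scale).
Experiment with scale-related adjustments (depth/width multipliers) and learning-rate changes to observe their effect on small-object detection.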
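
The data-driven anchor step can be illustrated with a minimal sketch that clusters ground-truth box sizes using k-means with an IoU-based similarity. This is not the paper's procedure and not YOLOv5's built-in autoanchor routine; the function names `wh_iou` and `kmeans_anchors`, the median update, and the random test data are all assumptions made for illustration.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (w, h) pairs, treating every box as anchored at the origin."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; returns anchors sorted by area."""
    rng = np.random.default_rng(seed)
    # Start from k distinct ground-truth boxes (hypothetical initialisation choice).
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most.
        assign = wh_iou(boxes, anchors).argmax(axis=1)
        new = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old anchor if a cluster is empty
                new[j] = np.median(members, axis=0)
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]

# Usage with synthetic box sizes in pixels (illustrative data only).
boxes = np.random.default_rng(1).uniform(4, 200, size=(300, 2))
print(kmeans_anchors(boxes, k=9))  # e.g. 3 scales x 3 anchors per scale
```

With 5 anchors per scale instead of 3, the same call would use `k=15` for three detection scales; which setting helps depends on the model scale, as the results below indicate.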

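Experimental Results

Research Questions

RQ1: How can YOLOv5 be structurally modified to improve small-object detection without compromising real-time performance?
RQ2: Which backbone, neck, and feature-map routing configurations yield the largest gains for small objects in an autonomous-driving context?
RQ3: How do the number of anchors and higher-resolution feature maps affect mAP for small objects at 50% IOU?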
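
To make the 50% IOU criterion concrete, the following sketch computes IoU for two axis-aligned boxes and shows why a fixed threshold is harder to meet for small objects: the same 4-pixel localisation error costs a small box far more overlap than a large one. The box coordinates are made-up illustrative values, not data from the paper, and `box_iou` is a hypothetical helper name.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction shifted 4 px to the right of its ground truth.
small = box_iou((0, 0, 10, 10), (4, 0, 14, 10))      # 10x10 px object
large = box_iou((0, 0, 100, 100), (4, 0, 104, 100))  # 100x100 px object
print(f"{small:.3f} {large:.3f}")  # 0.429 0.923
```

At a 0.5 threshold the shifted small box no longer counts as a correct detection, while the large one comfortably does.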

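Key Results

YOLO-Z models achieve an average absolute gain of 2.7 percentage points in mAP at 50% IOU across scales, and 5.9 points for small objects, with inference time increasing by about 2.6 ms.
The DenseNet backbone delivers consistent small-object gains at roughly 3 ms of added latency over the baseline, whereas ResNet performs comparatively worse and is slower.
Adding one extra high-resolution feature map (XS_ex) and including additional small maps improves small-object detection, particularly on datasets dense in small objects, although the effect varies by model scale.
Increasing the number of anchors per scale (to 5) benefits the larger scales most, while the smaller scales may benefit from fewer anchors (3 per scale).
The FPN neck generally outperforms bi-FPN at the smaller scales, and the X scale benefits less from neck changes.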
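
A rough piece of arithmetic suggests why a higher-resolution feature map can help with small objects: the coarser the detection stride, the fewer grid cells an object spans. The sketch below assumes a 640-pixel input and the usual YOLOv5 detection strides of 8, 16, and 32, plus a hypothetical extra stride-4 map; whether this matches the paper's XS_ex configuration is not confirmed by the review, and the 12-pixel object size is an illustrative choice.

```python
def grid_summary(input_size, strides, obj_px):
    """Return (stride, grid cells per side, cells spanned by the object) per stride."""
    return [(s, input_size // s, obj_px / s) for s in strides]

# Assumed values: 640 px input, a 12 px wide object, stride 4 as the extra map.
for stride, grid, span in grid_summary(640, (4, 8, 16, 32), 12):
    print(f"stride {stride:>2}: {grid}x{grid} grid, object spans {span} cells")
```

At stride 32 the 12-pixel object covers well under half a cell, whereas at stride 4 it spans three cells, at the price of a much larger grid to process.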


This review was generated by AI and reviewed by a human editor.