QUICK REVIEW

[Paper Review] YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles

Aduen Benjumea, Izzeddin Teeti|arXiv (Cornell University)|Dec 22, 2021

Advanced Neural Network Applications | Cited by 117

One-Sentence Summary

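The baseline YOLOv5 detector is modified to form the YOLO-Z family, which improves small-object detection at a modest cost in inference time and is validated on a cone-dense autonomous racing dataset.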

ABSTRACT

As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away, image resolution and computational resources limitations make detecting smaller objects (that is, objects that occupy a small pixel area in the input image) a genuinely challenging task for machines and a wide-open research field. This study explores how the popular YOLOv5 object detector can be modified to improve its performance in detecting smaller objects, with a particular application in autonomous racing. To achieve this, we investigate how replacing certain structural elements of the model (as well as their connections and other parameters) can affect performance and inference time. In doing so, we propose a series of models at different scales, which we name 'YOLO-Z', and which display an improvement of up to 6.9% in mAP when detecting smaller objects at 50% IOU, at the cost of just a 3ms increase in inference time compared to the original YOLOv5. Our objective is to inform future research on the potential of adjusting a popular detector such as YOLOv5 to address specific tasks and provide insights on how specific changes can impact small object detection. Such findings, applied to the broader context of autonomous vehicles, could increase the amount of contextual information available to such systems.

Motivation and Objectives

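Improve YOLOv5's small-object detection performance in autonomous driving scenarios.
Study how structural modifications to the backbone, neck, and connections affect small-object accuracy and speed.
Identify which structural changes give the best trade-off between accuracy and real-time inference.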

Proposed Method

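Replace or modify the backbone with DenseNet or ResNet while keeping the core YOLOv5 structure.
Replace the neck with a simplified FPN or BiFPN to propagate small-object information more effectively.
Reroute connections so that the neck and head use higher-resolution feature maps (inclusive/exclusive mappings).
Tune the anchor boxes for each scale through data-driven automatic generation (3 or 5 anchors per scale).
Try scale-related adjustments (depth/width multipliers) and learning-rate changes to observe their effect on small-object detection.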
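The data-driven anchor generation mentioned above can be illustrated with a minimal sketch. It clusters ground-truth box sizes with plain k-means under an IoU-based similarity, which is a common approach and not necessarily the procedure used in the paper (YOLOv5's own auto-anchor routine differs in detail). The function names, the synthetic box sizes, and the choice of three detection levels with three anchors each are all assumptions made for illustration.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between boxes and anchors compared by width/height only
    (as if every box shared the same top-left corner)."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Hypothetical helper: cluster (width, height) pairs into k anchors."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most.
        assign = wh_iou(boxes, anchors).argmax(axis=1)
        new = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old anchor if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, anchors):
            break
        anchors = new
    # Sort by area so the smallest anchors can go to the finest level.
    return anchors[np.argsort(anchors.prod(axis=1))]

# Synthetic small-object box sizes in pixels (illustrative data only).
rng = np.random.default_rng(1)
boxes = np.abs(rng.normal(loc=[12, 20], scale=[4, 6], size=(500, 2))) + 1.0

anchors = kmeans_anchors(boxes, k=9)   # assumed: 3 levels x 3 anchors
per_level = anchors.reshape(3, 3, 2)   # level, anchor index, (w, h)
print(per_level.round(1))
```

Switching to 5 anchors per scale under the same assumptions would mean `k=15` and `reshape(3, 5, 2)`.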

Experimental Results

Research Questions

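RQ1: How can YOLOv5 be structurally modified to improve small-object detection without sacrificing real-time performance?
RQ2: In autonomous driving scenarios, which backbone, neck, and feature-map routing configuration gives the largest improvement for small objects?
RQ3: How do the number of anchors and higher-resolution feature maps affect small-object mAP at 50% IOU?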
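Since the questions above are framed in terms of mAP at 50% IOU, a short sketch of the underlying overlap test may help. It computes intersection over union for two axis-aligned boxes and shows why the 0.5 threshold is harder to meet for small objects: the same localization error costs far more overlap. The box coordinates and helper names are hypothetical, chosen only for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in pixels."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(pred, gt, thr=0.5):
    """A prediction counts as a match when IoU reaches the threshold."""
    return box_iou(pred, gt) >= thr

# The same 4-pixel horizontal error on a 10x10 and a 100x100 object.
small_gt, small_pred = (100, 100, 110, 110), (104, 100, 114, 110)
large_gt, large_pred = (100, 100, 200, 200), (104, 100, 204, 200)

small_iou = box_iou(small_pred, small_gt)  # 60 / 140
large_iou = box_iou(large_pred, large_gt)  # 9600 / 10400
print(f"small: {small_iou:.3f}, match={is_match(small_pred, small_gt)}")
print(f"large: {large_iou:.3f}, match={is_match(large_pred, large_gt)}")
```

The small prediction falls below the threshold while the large one clears it easily, even though both are off by the same number of pixels.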

Key Findings

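At 50% IOU, YOLO-Z models improve mAP by 2.7 percentage points on average across scales and by 5.9 percentage points for small objects, with about 2.6 ms of additional inference time.
The DenseNet backbone gives a consistent improvement over the baseline on small objects at about 3 ms of extra latency; ResNet tends to perform worse and run slower.
Using a dedicated additional high-resolution feature map (XS_ex) and extra small-object feature maps improves small-object detection, especially on datasets dense with small objects; the effect varies by scale.
Increasing the number of anchors to 5 per scale benefits larger scales more, while smaller scales may do better with fewer anchors (3 per scale).
At smaller scales the FPN neck generally outperforms BiFPN, while the X scale gains less from neck changes.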
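The finding about higher-resolution feature maps can be made concrete with simple arithmetic. The sketch below assumes a 640-pixel input and the standard YOLOv5 detection strides of 8, 16, and 32, plus a hypothetical extra stride-4 level; the summary does not state the exact strides or input size used in the paper, so these values are assumptions.

```python
INPUT_SIZE = 640  # assumed input resolution in pixels

# Stride = input pixels per feature-map cell. The stride-4 entry is a
# hypothetical extra high-resolution level, not confirmed by the source.
STRIDES = {"extra (stride 4)": 4, "P3": 8, "P4": 16, "P5": 32}

def grid_size(input_size, stride):
    """Number of cells along one side of the feature map."""
    return input_size // stride

def cells_spanned(object_px, stride):
    """How many cells an object of the given width spans at this stride."""
    return object_px / stride

OBJECT_PX = 12  # illustrative width of a distant cone in pixels

table = {
    name: (grid_size(INPUT_SIZE, s), cells_spanned(OBJECT_PX, s))
    for name, s in STRIDES.items()
}
for name, (grid, span) in table.items():
    print(f"{name:>18}: {grid}x{grid} grid, object spans {span:.3f} cells")
```

Under these assumptions a 12-pixel object covers less than one cell at strides 16 and 32 but three cells at stride 4, which is one intuition for why routing finer maps to the head can help small objects, at the price of a larger grid to process.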

This review was generated by AI and checked by a human editor.