QUICK REVIEW

[Paper Review] Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark

Floriana Ciaglia, Francesco Saverio Zuppichini|arXiv (Cornell University)|Nov 24, 2022

Advanced Neural Network Applications | Cited by 28

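One-Sentence Summary

This paper introduces RF100, a multi-domain object detection benchmark of 100 datasets spanning 7 imagery domains, with 224,714 images and 805 classes, together with baseline results for evaluating generalization to real-world domains.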

ABSTRACT

The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model. We introduce the Roboflow-100 (RF100) consisting of 100 datasets, 7 imagery domains, 224,714 images, and 805 class labels with over 11,170 labelling hours. We derived RF100 from over 90,000 public datasets, 60 million public images that are actively being assembled and labelled by computer vision practitioners in the open on the web application Roboflow Universe. By releasing RF100, we aim to provide a semantically diverse, multi-domain benchmark of datasets to help researchers test their model's generalizability with real-life data. RF100 download and benchmark replication are available on GitHub.

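Motivation and Objectives

Provide a semantically diverse, multi-domain benchmark of closed-domain object detection datasets drawn from Roboflow Universe.
Enable researchers to test model generalization and transfer across domain-specific tasks and in zero-shot/few-shot settings.
Evaluate how current object detection models perform on diverse domains beyond COCO and Pascal VOC.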

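Proposed Method

Curate RF100 from 100 datasets across seven semantic categories (Aerial, Video Games, Microscopic, Underwater, Documents, Electromagnetic, Real World).
Resize images to 640x640 and clean/merge class labels to reduce ambiguity.
Report per-category metadata (datasets, images, classes) and domain-specific baselines using YOLOv5, YOLOv7, and GLIP.
Compare fine-tuned detectors with a zero-shot detector on each category, including prompt remapping for GLIP.
Provide RF100 download and replication resources on GitHub.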
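
The resizing and label-cleaning step in the list above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a plain stretch to 640x640 (the summary does not say whether aspect ratio is preserved), and the merge table `LABEL_MERGES` and the helper names `scale_box` and `merge_label` are hypothetical, introduced only for illustration.

```python
from typing import Dict, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]

TARGET_SIZE = 640  # side length stated in the summary

# Hypothetical merge table; the real RF100 label mappings are not given here.
LABEL_MERGES: Dict[str, str] = {
    "Car": "car",
    "cars": "car",
    "automobile": "car",
}


def scale_box(box: Box, width: int, height: int, target: int = TARGET_SIZE) -> Box:
    """Rescale a box from a width x height image to a target x target image.

    Assumes a plain stretch, so x and y are scaled independently.
    """
    sx = target / width
    sy = target / height
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


def merge_label(label: str, merges: Dict[str, str] = LABEL_MERGES) -> str:
    """Map a raw label to its canonical name; unknown labels pass through trimmed."""
    cleaned = label.strip()
    return merges.get(cleaned, cleaned)
```

For example, `scale_box((100, 50, 300, 250), width=1280, height=320)` halves the x coordinates and doubles the y coordinates, and `merge_label(" cars ")` returns the canonical name `"car"`.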

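Experimental Results

Research Questions

RQ1: Can a diverse, cross-domain benchmark reveal generalization gaps that standard object detection benchmarks such as COCO do not show?
RQ2: How do fine-tuned detectors (YOLOv5/YOLOv7) compare with a zero-shot detector (GLIP) across domains?
RQ3: What domain-specific performance patterns and challenges (e.g., object size, number of classes) appear across the RF100 categories?
RQ4: How strongly do the RF100 datasets cluster semantically, and what does this imply for transfer between tasks?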

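Key Findings

RF100 contains 100 datasets, 7 domains, 224,714 images, and 805 classes, with over 11,170 labelling hours.
Per-category results show that model performance varies across domains, indicating domain-dependent generalization gaps.
YOLOv5 and YOLOv7 reach different average mAP@.50 scores on each category; Real World and Video Games usually score higher than the other domains.
GLIP, a zero-shot detector, shows markedly lower mAP on several RF100 categories, highlighting the limits of open-vocabulary generalization on obscure imagery.
CLIP embeddings of the RF100 datasets cluster by domain semantics, indicating domain-specific structure within the benchmark.
The authors provide RF100 download and replication resources on GitHub to facilitate further research.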
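
To make the mAP@.50 figures above concrete, here is a minimal Python sketch of the two ingredients behind a per-category average: the intersection-over-union (IoU) test at a 0.5 threshold that decides whether a predicted box matches a ground-truth box, and a per-domain mean over per-dataset scores. The helper names `iou`, `is_match`, and `mean_by_domain` are hypothetical, and the scores in the usage note are placeholders, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction can count as a true positive at mAP@.50 only if IoU >= 0.5."""
    return iou(pred, truth) >= threshold


def mean_by_domain(scores: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-dataset scores within each domain (unweighted mean)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for domain, score in scores:
        grouped[domain].append(score)
    return {domain: sum(vals) / len(vals) for domain, vals in grouped.items()}
```

Calling `mean_by_domain([("aerial", 0.2), ("aerial", 0.4), ("documents", 0.5)])` with these made-up scores groups the two aerial datasets into one average and leaves the single documents score unchanged. Whether the paper weights datasets equally within a category is an assumption of this sketch.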

This review was generated by AI and checked by human editors.