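[Paper Review] Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark
This paper proposes RF100, a multi-domain object detection benchmark made up of 100 datasets spanning 7 imagery domains, with 224,714 images and 805 classes. It also includes baseline results for evaluating how well models generalize across real-world domains.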
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model. We introduce the Roboflow-100 (RF100) consisting of 100 datasets, 7 imagery domains, 224,714 images, and 805 class labels with over 11,170 labelling hours. We derived RF100 from over 90,000 public datasets, 60 million public images that are actively being assembled and labelled by computer vision practitioners in the open on the web application Roboflow Universe. By releasing RF100, we aim to provide a semantically diverse, multi-domain benchmark of datasets to help researchers test their model's generalizability with real-life data. RF100 download and benchmark replication are available on GitHub.
Motivation and Objectives
- Provide a semantically diverse, multi-domain benchmark of closed-domain object detection (OD) datasets gathered from Roboflow Universe.
- Enable researchers to test model generalizability and transfer across domain-specific tasks and zero-/few-shot settings.
- Assess how current object detectors perform across varied domains beyond COCO and Pascal VOC.
Proposed Method
- Curate RF100 as 100 datasets grouped into seven semantic categories (Aerial, Videogames, Microscopic, Underwater, Documents, Electromagnetic, Real World).
- Standardize images to 640x640, and clean or merge class labels to reduce ambiguity.
- Report per-category metadata (number of datasets, images, and classes) and domain-specific baselines using YOLOv5, YOLOv7, and GLIP.
- Compare finetuned and zero-shot detector performance across categories, including prompt remapping for GLIP.
- Provide RF100 download and replication resources on GitHub.
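The standardization step can be illustrated with a minimal Python sketch. It rests on two assumptions: images are stretched (not letterboxed) to 640x640, and label cleanup is a simple alias lookup. `Box`, `LABEL_ALIASES`, `merge_label`, `resize_box`, and `standardize` are hypothetical names and are not part of the RF100 release.

```python
from dataclasses import dataclass, replace

# Target side length from the review; stretching (not letterboxing) is an assumption.
TARGET_SIZE = 640


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in absolute pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str


# Hypothetical alias table: collapses duplicate or ambiguous raw labels into one class.
LABEL_ALIASES = {"cars": "car", "automobile": "car", "pedestrian": "person"}


def merge_label(raw: str, aliases: dict[str, str] = LABEL_ALIASES) -> str:
    """Normalize case and whitespace, then map known aliases to a canonical name."""
    key = raw.strip().lower()
    return aliases.get(key, key)


def resize_box(box: Box, src_w: int, src_h: int, target: int = TARGET_SIZE) -> Box:
    """Rescale coordinates when a src_w x src_h image is stretched to target x target."""
    return Box(
        box.x_min * target / src_w,
        box.y_min * target / src_h,
        box.x_max * target / src_w,
        box.y_max * target / src_h,
        box.label,
    )


def standardize(boxes: list[Box], src_w: int, src_h: int) -> list[Box]:
    """Move every box into the 640x640 frame and merge its class label."""
    return [replace(resize_box(b, src_w, src_h), label=merge_label(b.label)) for b in boxes]


print(standardize([Box(100, 60, 300, 240, "Cars")], 1280, 960)[0])
# → Box(x_min=50.0, y_min=40.0, x_max=150.0, y_max=160.0, label='car')
```

Stretching scales the two axes independently, so the aspect ratios of objects change. A letterboxing variant would instead apply a single scale factor and then add padding offsets.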
Experimental Results
Research Questions
- RQ1: Can a diverse, multi-domain benchmark reveal generalization gaps that are not evident in standard OD benchmarks (e.g., COCO)?
- RQ2: How do finetuned detectors (YOLOv5/YOLOv7) compare with zero-shot detectors (GLIP) across distinct domains?
- RQ3: What are the domain-specific performance patterns and challenges (e.g., object size, class count) across RF100 categories?
- RQ4: To what extent do RF100 datasets cluster semantically, and how does that affect transfer between tasks?
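One way to probe RQ4 is to compare dataset-level image embeddings. The sketch below makes several assumptions:
- Per-image CLIP embeddings have already been computed. The embedding step itself is not shown.
- Each dataset is summarized by a mean-pooled, unit-normalized centroid.
- Centroids are compared with cosine similarity.

The function names and the within/between-domain score are illustrative and are not the authors' analysis.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dataset_centroid(image_embeddings: list[list[float]]) -> list[float]:
    """Mean of unit-normalized image embeddings, re-normalized to unit length."""
    unit = [l2_normalize(e) for e in image_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; both inputs are assumed to be unit length already."""
    return sum(x * y for x, y in zip(a, b))


def nearest_dataset(centroids: dict[str, list[float]]) -> dict[str, str]:
    """Map each dataset to its most similar other dataset."""
    return {
        name: max((o for o in centroids if o != name), key=lambda o: cosine(c, centroids[o]))
        for name, c in centroids.items()
    }


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")


def domain_separation(centroids: dict[str, list[float]],
                      domain_of: dict[str, str]) -> tuple[float, float]:
    """Average cosine similarity of dataset pairs within the same domain vs. across domains."""
    within, between = [], []
    names = sorted(centroids)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(centroids[a], centroids[b])
            (within if domain_of[a] == domain_of[b] else between).append(sim)
    return _mean(within), _mean(between)
```

A larger gap between the within-domain and between-domain averages would suggest stronger domain clustering. Low-dimensional toy vectors can stand in for real CLIP embeddings when trying the functions out.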
Key Findings
- RF100 comprises 100 datasets, 7 domains, 224,714 images, and 805 classes with over 11,170 labeling hours.
- Per-category results show variation in model performance across domains, indicating domain-dependent generalization gaps.
- YOLOv5 and YOLOv7 reach different average mAP@.50 values across categories, with Real World and Videogames often scoring higher than some other domains.
- GLIP, evaluated as a zero-shot detector, scores much lower mAP on several RF100 categories, which highlights the limits of open-vocabulary generalization to uncommon imagery.
- CLIP embeddings of RF100 datasets cluster by domain, suggesting domain-specific structure within the benchmark.
- The authors provide RF100 for download and replication on GitHub to facilitate further research.
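The mAP@.50 figures above count a detection as correct when its IoU with a not-yet-matched ground-truth box is at least 0.5. Below is a simplified, pure-Python sketch of per-class AP at that threshold, with two design choices worth noting:
- Matching is greedy, in descending score order.
- Precision is computed with all-point interpolation.

It may differ in detail from the evaluation code the authors used; COCO-style tooling, for instance, uses 101-point interpolation. All names are hypothetical.

```python
BoxXYXY = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: list[tuple[str, float, BoxXYXY]],
                      gts: dict[str, list[BoxXYXY]], thr: float = 0.5) -> float:
    """AP for a single class.

    preds: (image_id, score, box) detections; gts: image_id -> ground-truth boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    hits = []
    # Greedy matching in descending score order; each GT box can be matched only once.
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= thr and not used[img][best_j]:
            used[img][best_j] = True
            hits.append(1)
        else:
            hits.append(0)  # false positive: IoU below threshold or duplicate detection
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # All-point interpolation: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_of(values: dict[str, float]) -> float:
    """Unweighted mean, e.g. of per-class AP (giving mAP) or per-dataset mAP (giving a category average)."""
    return sum(values.values()) / len(values) if values else 0.0
```

Assuming unweighted means, `mean_of` over per-class AP gives a dataset's mAP@.50. Applied again over the datasets in one category, it gives a per-category average like those compared in the findings.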
This review was generated by AI and checked by a human editor.