QUICK REVIEW

[Paper Review] YOLO9000: Better, Faster, Stronger

Joseph Redmon, Ali Farhadi | arXiv (Cornell University) | Dec 25, 2016

Advanced Neural Network Applications | References: 13 | Cited by: 435

One-Sentence Summary

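YOLO9000 combines detection data with large-scale classification data and uses the WordTree hierarchical labeling scheme to jointly train a detector that can detect more than 9000 object categories in real time.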

ABSTRACT

We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.

Motivation and Goals

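Improve YOLO's recall and localization accuracy while keeping its speed.
Develop a method for training a detector on both detection data and classification data.
Create a scalable label space so that detection is no longer limited to a small, fixed set of categories.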

Proposed Method

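Upgrade YOLO to YOLOv2 with batch normalization, a high-resolution classifier, anchor boxes, and dimension priors learned by clustering.
Use multi-scale training so a single network handles variable input sizes and offers a speed-accuracy tradeoff at test time.
Replace unconstrained anchor-offset prediction with direct location prediction: box centers are predicted relative to the grid cell and squashed by a logistic activation, which stabilizes training.
Add a passthrough layer that brings fine-grained features from an earlier, higher-resolution layer into the detection features.
Use Darknet-19 as the base network; for YOLO9000, detection is trained with 3 priors per cell instead of 5 to limit the output size.
Propose WordTree, a hierarchical classification scheme that merges the ImageNet and COCO labels and enables joint training on detection and classification.
Train YOLO9000 on a mix of COCO detection data and ImageNet classification data, propagating labels up the hierarchy.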
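The direct location prediction step can be illustrated with the parameterization given in the YOLOv2 paper: for a cell offset (c_x, c_y) and a prior of size (p_w, p_h), the raw outputs t_x, t_y, t_w, t_h, t_o decode as b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h), with σ(t_o) as the objectness score. The Python sketch below is a minimal, unofficial rendering of those formulas. The function names are hypothetical, and boxes stay in grid-cell units; multiplying by the network stride (32 in the paper's 416×416 setup) would give pixels.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_box(tx, ty, tw, th, to, cx, cy, pw, ph):
    """Decode one anchor's raw outputs (hypothetical helper, grid-cell units).

    (cx, cy) is the cell's offset from the image's top-left corner and
    (pw, ph) is the prior's width and height.
    """
    bx = sigmoid(tx) + cx      # center is confined to the cell [cx, cx + 1)
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # size is a multiplicative change of the prior
    bh = ph * math.exp(th)
    objectness = sigmoid(to)   # stands for Pr(object) * IOU(b, object)
    return bx, by, bw, bh, objectness


# All-zero outputs put the box at the cell center with exactly the prior's size.
print(decode_box(0, 0, 0, 0, 0, cx=3, cy=4, pw=1.5, ph=2.0))
```

Because σ maps any real number into (0, 1), the predicted center cannot leave its cell no matter what the network outputs, which is the stability argument behind this change. Width and height remain unconstrained multiples of the prior.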
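The passthrough layer can be viewed as a space-to-depth rearrangement: each 2×2 block of spatial positions in the higher-resolution feature map is stacked into channels. A 26×26×512 map therefore becomes 13×13×2048 and can be concatenated with the 13×13 detection features. The sketch below shows the idea on nested Python lists indexed [y][x][c]. The function names are hypothetical, and the channel ordering is an assumption that need not match Darknet's actual reorg layer.

```python
def space_to_depth(fmap, block=2):
    """Stack each block x block spatial patch into channels.

    fmap is a nested list indexed [y][x][c]; the result is indexed
    [y // block][x // block][c * block * block].
    """
    h, w = len(fmap), len(fmap[0])
    if h % block or w % block:
        raise ValueError("spatial size must be divisible by block")
    out = []
    for oy in range(h // block):
        row = []
        for ox in range(w // block):
            cell = []
            # Assumed ordering: row-major over the patch, then original channels.
            for dy in range(block):
                for dx in range(block):
                    cell.extend(fmap[oy * block + dy][ox * block + dx])
            row.append(cell)
        out.append(row)
    return out


def passthrough_concat(early, late, block=2):
    """Reorganize the early map and concatenate it channel-wise with the late map."""
    reorg = space_to_depth(early, block)
    if len(reorg) != len(late) or len(reorg[0]) != len(late[0]):
        raise ValueError("spatial sizes do not match after reorganization")
    return [[r + l for r, l in zip(rrow, lrow)] for rrow, lrow in zip(reorg, late)]


# Toy 4x4x1 map whose single channel encodes its position as 10*y + x.
toy = [[[10 * y + x] for x in range(4)] for y in range(4)]
print(space_to_depth(toy)[0][0])
```

With the toy map, the top-left output cell collects positions (0,0), (0,1), (1,0), (1,1) as four channels. The same arithmetic applied to the paper's shapes turns 512 channels at 26×26 into 2048 channels at 13×13.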
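WordTree predicts a conditional probability at every node given the node's parent, normalized with a softmax over each group of siblings. A node's absolute probability is the product of the conditionals along its path to the root. Prediction walks down the tree, taking the most confident child at each split until the probability would fall below a threshold. The sketch below implements those three pieces on a tiny hand-written tree. The tree contents, logits, threshold, and function names are illustrative assumptions, not the WordNet-derived tree used for YOLO9000.

```python
import math

# Hypothetical toy WordTree: parent synset -> list of child synsets.
TREE = {
    "physical object": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "dog": ["terrier", "hunting dog"],
    "terrier": ["Norfolk terrier", "Yorkshire terrier"],
}


def parent_map(tree):
    """Invert the tree into child -> parent."""
    return {child: parent for parent, kids in tree.items() for child in kids}


def conditional_probs(tree, logits):
    """Softmax over each sibling group, giving Pr(node | parent)."""
    cond = {}
    for kids in tree.values():
        m = max(logits[k] for k in kids)            # subtract max for stability
        exps = {k: math.exp(logits[k] - m) for k in kids}
        total = sum(exps.values())
        for k in kids:
            cond[k] = exps[k] / total
    return cond


def absolute_prob(node, cond, parents, root_prob=1.0):
    """Multiply conditionals along the path from node up to the root."""
    p = root_prob
    while node in parents:
        p *= cond[node]
        node = parents[node]
    return p


def predict(tree, cond, root, threshold=0.5, root_prob=1.0):
    """Follow the most confident child while the absolute probability stays above threshold."""
    node, p = root, root_prob
    while node in tree:
        best = max(tree[node], key=lambda k: cond[k])
        if p * cond[best] < threshold:
            break
        node, p = best, p * cond[best]
    return node, p


uniform = {k: 0.0 for kids in TREE.values() for k in kids}
cond = conditional_probs(TREE, uniform)
print(absolute_prob("Norfolk terrier", cond, parent_map(TREE)))
print(predict(TREE, cond, "physical object", threshold=0.2))
```

With all logits equal, every sibling conditional is 0.5, so "Norfolk terrier" gets an absolute probability of 0.5^4 = 0.0625, and the top-down walk with threshold 0.2 stops at "dog" with probability 0.25. For detection, root_prob would play the role of the objectness score Pr(physical object).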

Experimental Results

Research Questions

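RQ1: Can a single real-time detector be trained on a combination of detection and classification data to recognize thousands of object categories?
RQ2: Does a hierarchical label structure (WordTree) improve the integration of multiple datasets and categories without running into mutual-exclusivity problems?
RQ3: Which architectural and training strategies achieve a state-of-the-art speed-accuracy tradeoff for detection and large-vocabulary classification?
RQ4: How well does a detector trained with weakly labeled classification data generalize to detecting unseen categories?
RQ5: How do multi-scale training and feature fusion affect small-object localization and overall mAP?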

Key Findings

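YOLOv2 achieves a state-of-the-art speed-accuracy tradeoff on VOC 2007: 76.8 mAP at 67 FPS and 78.6 mAP at 40 FPS.
At 40 FPS, YOLOv2 outperforms Faster R-CNN with ResNet and SSD on VOC 2007 while running significantly faster.
YOLO9000 reaches 19.7 mAP on the ImageNet detection validation set despite having detection data for only 44 of the 200 classes; on the 156 classes not in COCO it reaches 16.0 mAP.
Joint training on COCO and ImageNet through WordTree lets YOLO9000 detect more than 9000 object categories in real time.
Dimension priors learned with k-means (using 1 − IOU as the distance) achieve a higher average IOU with ground-truth boxes than hand-picked anchor boxes.
With a hierarchical label structure, joint training generalizes across datasets, including to ImageNet classes that have no detection labels.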
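The k-means dimension clusters use the distance d(box, centroid) = 1 − IOU(box, centroid) rather than Euclidean distance, so that large boxes do not generate more error than small ones. The sketch below clusters (width, height) pairs with that distance. Computing IOU as if the two boxes shared a center, updating each centroid with its cluster's mean, and random initialization are assumptions of this sketch; the function names are hypothetical.

```python
import random


def iou_wh(a, b):
    """IOU of two boxes given as (w, h), assuming they share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_iou(boxes, k, iters=100, seed=0):
    """k-means on (w, h) pairs with distance 1 - IOU (mean update assumed)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IOU is the same as largest IOU.
            j = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[j].append(box)
        new = []
        for i, cl in enumerate(clusters):
            if cl:
                new.append((sum(w for w, _ in cl) / len(cl),
                            sum(h for _, h in cl) / len(cl)))
            else:
                new.append(centroids[i])  # keep an empty cluster's old centroid
        if new == centroids:
            break
        centroids = new
    return centroids


def mean_iou(boxes, centroids):
    """Average IOU between each box and its best-matching prior."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)


boxes = [(1, 1), (1.2, 0.9), (0.9, 1.1), (10, 10), (11, 9), (9, 11)]
priors = sorted(kmeans_iou(boxes, 2))
print(priors, mean_iou(boxes, priors))
```

On this toy set with two obvious size groups, the clustering recovers one small and one large prior, and mean_iou reports the average best-match IOU, which is the quantity the paper uses to compare cluster priors with hand-picked anchors.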

This review was generated by AI and reviewed by human editors.