QUICK REVIEW

[Paper Review] Rethinking ImageNet Pre-training

Kaiming He, Ross Girshick|arXiv (Cornell University)|Nov 21, 2018

Advanced Neural Network Applications | References: 39 | Cited by: 99

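One-Sentence Summary

This paper shows that, for object detection and instance segmentation on COCO, models trained from scratch can match or exceed the accuracy of their ImageNet pre-trained counterparts, given appropriate normalization, a longer training schedule, and suitably chosen hyper-parameters; ImageNet pre-training mainly speeds up early convergence and is not required for final accuracy.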

ABSTRACT

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the number of training iterations so the randomly initialized models may converge. Training from random initialization is surprisingly robust; our results hold even when: (i) using only 10% of the training data, (ii) for deeper and wider models, and (iii) for multiple tasks and metrics. Experiments show that ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target task accuracy. To push the envelope we demonstrate 50.9 AP on COCO object detection without using any external data, a result on par with the top COCO 2017 competition results that used ImageNet pre-training. These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks and we expect these discoveries will encourage people to rethink the current de facto paradigm of 'pre-training and fine-tuning' in computer vision.

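Motivation and Objectives

Question whether ImageNet pre-training is necessary for object detection and segmentation on COCO.
Assess whether training from scratch, under standard baselines and hyper-parameters, can reach comparable or better final accuracy.
Identify the normalization and training-length adjustments needed to make training from scratch viable across architectures and data regimes.
Assess how data scale (full COCO vs. reduced subsets) affects the relative benefit of pre-training.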

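Proposed Method

Train Mask R-CNN with ResNet/ResNeXt backbones plus FPN on COCO train2017, and evaluate bbox and mask AP on val2017.
Replace frozen BatchNorm with GroupNorm (GN) or synchronized BatchNorm (SyncBN) so that training from scratch is stable.
Increase the number of training iterations (up to a 6× schedule) so that models trained from scratch can converge.
Use scale augmentation and data augmentation during training to study robustness across data regimes.
Compare training from scratch against ImageNet pre-training across architectures, data scales, and task-specific metrics (bbox AP, mask AP, keypoint AP).
Show that a large model trained from scratch (X152 with GN) also reaches high AP without pre-training.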
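
To make the normalization change concrete, here is a minimal pure-Python sketch of the Group Normalization computation for a single sample. It is an illustration only, not the paper's implementation: the function name `group_norm`, the nested-list layout (channels × spatial positions), and the omission of the learnable scale and shift are all assumptions made for brevity.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize one sample x laid out as [C][N] (channels x positions).

    Channels are split into `num_groups` contiguous groups. Mean and
    variance are computed per group from this sample alone, so the result
    does not depend on how many images share the batch.
    Illustrative sketch: the learnable per-channel scale/shift is omitted.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    per_group = num_channels // num_groups

    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        # Pool every value in the group (all its channels and positions).
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for channel in group:
            out.append([(v - mean) * inv_std for v in channel])
    return out


# Hypothetical toy input: 4 channels, 2 positions each, split into 2 groups.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

Because the statistics come from one sample at a time, the output is the same whatever the batch size, which is the property that matters when detectors are trained with only a few images per GPU.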

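Experimental Results

Research Questions

RQ1: On COCO, can object detection and instance segmentation models trained from scratch match ImageNet pre-trained models?
RQ2: Which normalization techniques are needed to train detectors stably from scratch?
RQ3: How does training length affect the convergence and final accuracy of models trained from scratch, compared with pre-trained ones?
RQ4: Does ImageNet pre-training provide a regularization effect, or does it mainly speed up early convergence when data is limited?
RQ5: How do models trained from scratch perform on localization-sensitive metrics and on keypoint detection?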

Key Findings

Model	Schedule	AP_bbox (val2017)
R50 (random init)	2×	36.8
R50 (random init)	3×	39.5
R50 (random init)	4×	40.6
R50 (random init)	5×	40.7
R50 (random init)	6×	41.3
R50 (with pre-train)	2×	40.3
R50 (with pre-train)	3×	40.8
R50 (with pre-train)	4×	40.9
R50 (with pre-train)	5×	40.9
R50 (with pre-train)	6×	41.1
R101 (random init)	2×	38.2
R101 (random init)	3×	41.0
R101 (random init)	4×	41.8
R101 (random init)	5×	42.2
R101 (random init)	6×	42.7
R101 (with pre-train)	2×	41.8
R101 (with pre-train)	3×	42.3
R101 (with pre-train)	4×	42.3
R101 (with pre-train)	5×	41.9
R101 (with pre-train)	6×	42.2
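
The trend in the table is easier to see as a per-schedule gap. The short script below is a reading aid, not part of the paper: it copies the AP_bbox values from the table above and subtracts the random-init score from the pre-trained score. The names `AP`, `SCHEDULES`, and `gaps` are illustrative.

```python
# AP_bbox on val2017, copied from the table above (schedules 2x to 6x).
SCHEDULES = [2, 3, 4, 5, 6]
AP = {
    "R50": {
        "random_init": [36.8, 39.5, 40.6, 40.7, 41.3],
        "pre_train": [40.3, 40.8, 40.9, 40.9, 41.1],
    },
    "R101": {
        "random_init": [38.2, 41.0, 41.8, 42.2, 42.7],
        "pre_train": [41.8, 42.3, 42.3, 41.9, 42.2],
    },
}


def gaps(model):
    """Return (pre-train AP - random-init AP) for each schedule."""
    rows = AP[model]
    return [
        round(pre - scratch, 1)
        for pre, scratch in zip(rows["pre_train"], rows["random_init"])
    ]


for model in AP:
    for sched, gap in zip(SCHEDULES, gaps(model)):
        print(f"{model} {sched}x: pre-train minus scratch = {gap:+.1f} AP")
```

For R50 the gap shrinks from +3.5 AP at 2× to -0.2 AP at 6×, and for R101 from +3.6 AP to -0.5 AP, so the advantage of pre-training fades and then reverses as the schedule lengthens.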

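With GN/SyncBN and extended training schedules, models trained from scratch on COCO match or even exceed the accuracy of ImageNet pre-trained models across multiple baselines.
ImageNet pre-training speeds up early convergence but does not necessarily improve final target-task accuracy; with longer training (5×–6×), models trained from scratch reach comparable or better AP (for example, 41.3 vs. 41.1 for R50 and 42.7 vs. 42.2 for R101 at 6×).
Even with only about 10% of the COCO data, training from scratch remains competitive; and a larger backbone trained from scratch (X152) reaches about 50.9 bbox AP and about 43.2 mask AP on val2017.
Localization-sensitive metrics and keypoint detection generally gain little or nothing from ImageNet pre-training; at high overlap (IoU) thresholds and on keypoint tasks, models trained from scratch perform as well or better.
Across data regimes, appropriate normalization and longer optimization are critical; when data is plentiful or the task emphasizes localization over classification, pre-training helps even less.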
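
As a rough guide to what an "N×" schedule means in practice, the sketch below converts a schedule multiplier into iteration counts and approximate epochs. None of the constants appear in this review; they are assumptions based on the common Detectron convention (a 1× schedule of 90k iterations at 16 images per iteration, learning-rate drops in the last 60k and last 20k iterations) and on the size of COCO train2017, so treat the output as an estimate.

```python
# Assumed constants (not stated in this review):
ITERS_PER_1X = 90_000    # length of a "1x" schedule in iterations
BATCH_SIZE = 16          # images per iteration
TRAIN_IMAGES = 118_287   # images in COCO train2017
LR_DROP_OFFSETS = (60_000, 20_000)  # drops this many iterations before the end


def schedule(multiplier):
    """Describe an 'N x' schedule under the assumptions above."""
    total = multiplier * ITERS_PER_1X
    lr_drops = [total - offset for offset in LR_DROP_OFFSETS]
    epochs = total * BATCH_SIZE / TRAIN_IMAGES
    return {
        "total_iters": total,
        "lr_drop_iters": lr_drops,
        "approx_epochs": round(epochs, 1),
    }


for m in (2, 6):
    print(f"{m}x:", schedule(m))
```

Under these assumptions a 6× schedule is 540k iterations, on the order of 70 passes over train2017, compared with roughly 24 passes for 2×.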

This review was generated by AI and reviewed by human editors.