QUICK REVIEW

[Paper Review] Self-training with Noisy Student improves ImageNet classification

Qizhe Xie, Minh-Thang Luong|arXiv (Cornell University)|Nov 11, 2019

Advanced Neural Network Applications|References: 99|Cited by: 240

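ONE-SENTENCE SUMMARY

Noisy Student Training trains an equal-or-larger, noised student model on labeled images plus unlabeled images that a teacher has pseudo-labeled, which markedly improves both ImageNet accuracy and robustness.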

ABSTRACT

We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On ImageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet. Code is available at https://github.com/google-research/noisystudent.

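RESEARCH MOTIVATION AND OBJECTIVES

Use unlabeled images to raise ImageNet accuracy beyond what labeled data alone can provide.
Develop a semi-supervised framework that outperforms earlier self-training and distillation methods by using an equal-or-larger student model and injecting noise.
Demonstrate robustness gains on ImageNet-A, ImageNet-C, and ImageNet-P, going beyond the standard ImageNet metrics.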

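PROPOSED METHOD

Train a teacher model on labeled data and use it to generate pseudo labels for unlabeled data.
Train an equal-or-larger student model on the combination of labeled and pseudo-labeled data with noise injected: input noise via RandAugment, model noise via dropout and stochastic depth.
Iterate by making the best student the new teacher, which generates new pseudo labels for training a new student.
Apply data filtering and balancing so that the per-class distribution of the unlabeled data matches that of the labeled data.
Compare soft and hard pseudo labels, and ablate the individual noise components to show their effect.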
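
The steps above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the "model" is a 1-D nearest-centroid classifier (so the student is the same size as the teacher), Gaussian input jitter stands in for RandAugment, and dropout and stochastic depth have no analogue here. The helper names and the `min_conf`, `per_class`, and `noise_std` values are hypothetical, and the sketch uses hard pseudo labels.

```python
import math
import random

def train_centroids(examples):
    """Toy 'model': the mean of 1-D inputs per class. examples: [(x, label), ...]."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Return (label, confidence) via a softmax over negative squared distances."""
    scores = {y: -(x - c) ** 2 for y, c in model.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    label = max(exps, key=exps.get)
    return label, exps[label] / sum(exps.values())

def filter_and_balance(pseudo, min_conf, per_class, rng):
    """Drop low-confidence pseudo labels, then give every class `per_class` examples.

    Over-full classes keep their most confident examples; under-full classes
    are padded with random duplicates (an assumed balancing rule).
    """
    by_class = {}
    for x, y, conf in pseudo:
        if conf >= min_conf:
            by_class.setdefault(y, []).append((conf, x))
    balanced = []
    for y, items in by_class.items():
        items.sort(reverse=True)  # highest confidence first
        kept = [x for _, x in items[:per_class]]
        kept += [rng.choice(kept) for _ in range(per_class - len(kept))]
        balanced.extend((x, y) for x in kept)
    return balanced

def noisy_student(labeled, unlabeled, iterations=3, noise_std=0.3,
                  min_conf=0.6, per_class=50, seed=0):
    rng = random.Random(seed)
    # Step 1: the teacher is trained on labeled data only.
    teacher = train_centroids(labeled)
    for _ in range(iterations):
        # Step 2: the teacher labels clean (un-noised) unlabeled inputs.
        pseudo = [(x, *predict(teacher, x)) for x in unlabeled]
        pseudo = filter_and_balance(pseudo, min_conf, per_class, rng)
        # Step 3: the student trains on noised labeled + pseudo-labeled inputs.
        noised = [(x + rng.gauss(0.0, noise_std), y) for x, y in labeled + pseudo]
        # Step 4: the trained student becomes the next teacher.
        teacher = train_centroids(noised)
    return teacher
```

Note that the teacher sees clean inputs when it generates pseudo labels, while only the student's training inputs are noised; that asymmetry is the part of the method the sketch is meant to show.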

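EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can unlabeled data, once pseudo-labeled by a strong teacher, raise ImageNet accuracy above the existing state of the art in supervised training?
RQ2: Do injecting noise and using a student at least as large as the teacher improve learning from pseudo labels?
RQ3: How does Noisy Student Training affect robustness on ImageNet-A, ImageNet-C, and ImageNet-P?
RQ4: What effect does iterative training have on final performance?
RQ5: How do soft and hard pseudo labels compare within this framework?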
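
To make RQ5 concrete, the difference between soft and hard pseudo labels can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes the student is trained with a cross-entropy loss against the teacher's output, which the summary does not spell out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_pseudo_label(teacher_logits):
    """Soft label: the teacher's full probability distribution."""
    return softmax(teacher_logits)

def hard_pseudo_label(teacher_logits):
    """Hard label: a one-hot vector at the teacher's argmax class."""
    best = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(teacher_logits))]

def cross_entropy(target, student_logits):
    """Cross-entropy between a target distribution and the student's softmax."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
```

With a hard label the loss depends only on the probability the student assigns to the teacher's top class; with a soft label every class the teacher gives weight to contributes to the loss.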

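KEY FINDINGS

With 300M unlabeled images, Noisy Student Training reaches 88.4% top-1 accuracy on ImageNet, 2.0% better than the previous state-of-the-art model, which required 3.5B weakly labeled Instagram images.
Robustness: ImageNet-A top-1 accuracy rises from 61.0% to 83.7%; ImageNet-C mean corruption error falls from 45.7 to 28.3; ImageNet-P mean flip rate falls from 27.8 to 12.2.
EfficientNet-L2 with Noisy Student Training achieves 88.4% top-1 and 98.7% top-5 accuracy on ImageNet (Table 2).
Iterative training (teacher -> student -> new teacher) yields 87.6%, 88.1%, and finally 88.4% top-1 accuracy over successive iterations as the unlabeled batch size ratio is increased.
Noise is essential: removing data augmentation, stochastic depth, or dropout lowers performance. A large amount of unlabeled data also helps.
Noisy Student Training improves adversarial robustness under FGSM/PGD attacks, even though it is not optimized for adversarial robustness.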
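
For readers less familiar with the metrics quoted above, here is a minimal sketch of how top-k accuracy and mean corruption error are typically computed. The function names and input layouts are assumptions made for illustration. The mCE formula follows the ImageNet-C benchmark's usual definition (per-corruption error summed over severity levels, normalized by a baseline model's error, then averaged over corruptions), which this summary itself does not state.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per example; labels: true class indices.
    """
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

def mean_corruption_error(model_err, baseline_err):
    """Mean corruption error, in percent.

    model_err / baseline_err: {corruption name: [error rate per severity level]}.
    Each corruption's summed error is divided by the baseline's summed error,
    and the resulting ratios are averaged over corruption types.
    """
    ratios = [sum(model_err[c]) / sum(baseline_err[c]) for c in model_err]
    return 100.0 * sum(ratios) / len(ratios)
```

Because mCE is normalized by a baseline, a value such as 28.3 is a relative score rather than a raw error rate; lower is better.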


This review was generated by AI and checked by a human editor.