QUICK REVIEW

[Paper Review] Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Antti Tarvainen, Harri Valpola|arXiv (Cornell University)|Mar 6, 2017

Advanced Neural Network Applications | Cited by 1,592

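One-Sentence Summary

Mean Teacher: the teacher model is an exponential moving average (EMA) of the student model's weights. It provides higher-quality consistency targets, improves semi-supervised learning on SVHN, CIFAR-10, and ImageNet, and makes learning possible with fewer labels.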

ABSTRACT

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.
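
For contrast with the weight averaging that Mean Teacher uses, the prediction averaging that the abstract attributes to Temporal Ensembling can be sketched roughly as below. This is a minimal illustration in plain Python, not the original implementation: the function name and the value alpha = 0.6 are assumptions, and details of the real method are left out.

```python
def update_ensemble_targets(targets, epoch_predictions, alpha=0.6):
    """Blend this epoch's predictions into each example's running average.

    Both arguments hold one row of class scores per training example, so the
    stored state grows with the dataset, and each target is refreshed only
    once per epoch. The name and alpha are hypothetical, chosen for this sketch.
    """
    return [
        [alpha * z + (1.0 - alpha) * p for z, p in zip(target_row, pred_row)]
        for target_row, pred_row in zip(targets, epoch_predictions)
    ]
```

Because every training example needs its own stored target, and that target moves only when the example is revisited, this scheme becomes awkward on large datasets, which is the problem the abstract says Mean Teacher addresses.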

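Research Motivation and Goals

Improve semi-supervised learning by enforcing consistent predictions on unlabeled data.
Propose a weight-averaged teacher (Mean Teacher) that generates targets without additional training.
Show that targets from EMA-averaged weights outperform Temporal Ensembling and the Pi model in both learning speed and accuracy.
Demonstrate scalability to large datasets and modern architectures (ResNet/ImageNet).
Evaluate Mean Teacher's robustness and its sensitivity to key hyperparameters.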

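Proposed Method

Define the consistency cost J as the expected squared distance between the student's and the teacher's outputs under noise.
Change how targets are generated: the teacher's weights are an EMA of the student's weights, so the teacher can be updated after every training step.
Train with a combined loss, the classification loss on labeled examples plus the consistency loss, with the consistency weight ramped up during early training.
Compare Mean Teacher, the Pi model, and Temporal Ensembling on SVHN and CIFAR-10 with varying numbers of labels, using a 13-layer convolutional network.
Evaluate Mean Teacher with residual networks (ResNet) on CIFAR-10 and ImageNet to assess scalability.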
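
As a rough illustration of the method steps listed above, the sketch below shows the teacher update θ′ ← α·θ′ + (1 − α)·θ, a squared-distance consistency cost between softmax outputs, and a combined loss with a ramped-up consistency weight. It works on flat Python lists rather than a real network. The function names, the default α = 0.99, the maximum weight, and the sigmoid-shaped ramp exp(−5(1 − x)²) are assumptions made for illustration, not details taken from this review.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_cost(student_logits, teacher_logits):
    """Squared distance between the student and teacher softmax outputs."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cross_entropy(logits, label):
    """Classification loss for one labeled example."""
    return -math.log(softmax(logits)[label])


def ramp_up(step, ramp_steps):
    """Assumed schedule: rises from about 0.007 at step 0 to 1 at ramp_steps."""
    x = min(max(step / ramp_steps, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - x) ** 2)


def total_loss(student_logits, teacher_logits, label, step, ramp_steps, max_weight=1.0):
    """Weighted consistency cost, plus cross-entropy when a label is available."""
    weight = max_weight * ramp_up(step, ramp_steps)
    loss = weight * consistency_cost(student_logits, teacher_logits)
    if label is not None:  # unlabeled examples contribute only the consistency term
        loss += cross_entropy(student_logits, label)
    return loss
```

In a training loop, the student would be updated by gradient descent on total_loss, and ema_update would be called once after each such step. The teacher itself is never trained by backpropagation.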

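Experimental Results

Research Questions

RQ1: Does averaging model weights (Mean Teacher) improve semi-supervised learning compared with averaging target predictions?
RQ2: Can Mean Teacher scale to large datasets and online learning while using unlabeled data efficiently?
RQ3: How do the hyperparameters (consistency weight, EMA decay) affect performance and training dynamics?
RQ4: Does coupling the classification and consistency objectives affect effectiveness?
RQ5: How does network architecture affect Mean Teacher's performance?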

Main Findings

Dataset	Labels	Model	Error rate (%)
SVHN	250 labels/73257 images	Mean Teacher	4.35±0.50
SVHN	500 labels/73257 images	GAN	18.44±4.8
SVHN	500 labels/73257 images	Pi model (Laine & Aila)	6.65±0.53
SVHN	1000 labels/73257 images	Pi model (Laine & Aila)	4.82±0.17
SVHN	1000 labels/73257 images	Mean Teacher	3.95±0.19
CIFAR-10	1000 labels/50000 images	Mean Teacher	21.55±1.48
CIFAR-10	4000 labels/50000 images	Pi model (Laine & Aila)	12.36±0.31

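In the semi-supervised setting, Mean Teacher improves test accuracy on SVHN and CIFAR-10 and generally outperforms the Pi model and Temporal Ensembling.
On SVHN with 250 labels, Mean Teacher reaches 4.35% error. The previously published Pi model and Temporal Ensembling results of 6.65% and 5.12% were obtained with 500 labels.
On CIFAR-10 with 1000/2000/4000 labels, Mean Teacher reaches 21.55%, 15.73%, and 12.31% error, respectively, outperforming the Pi model and Temporal Ensembling in several settings. With 4000 labels, Mean Teacher reaches 12.31%, compared with 13.20% for the authors' Pi model baseline and 12.16% for Temporal Ensembling.
Mean Teacher with a ResNet architecture achieves strong results: 6.28% error on CIFAR-10 with 4000 labels and 9.11% validation error on ImageNet with 10% of the labels, both better than the previous state of the art.
Mean Teacher scales to large amounts of unlabeled data and to online learning, and in several settings it uses unlabeled data more efficiently than the Pi model.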
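
To put the headline ResNet numbers in perspective, the absolute error rates quoted in the abstract can be turned into relative reductions. A small sketch (the helper name is made up for this example):

```python
def relative_reduction(old_error, new_error):
    """Fraction of the previous error rate that was eliminated."""
    return (old_error - new_error) / old_error


# (previous state of the art, Mean Teacher + ResNet), error rates in percent,
# as quoted in the abstract.
results = {
    "CIFAR-10, 4000 labels": (10.55, 6.28),
    "ImageNet 2012, 10% of labels": (35.24, 9.11),
}

for name, (old, new) in results.items():
    print(f"{name}: {old}% -> {new}% "
          f"({relative_reduction(old, new):.1%} relative reduction)")
```

This works out to a relative error reduction of roughly 40% on CIFAR-10 and about 74% on ImageNet.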

This review was generated by AI and reviewed by a human editor.