QUICK REVIEW

[Paper Review] Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

Ye Du, Yujun Shen|arXiv (Cornell University)|Sep 15, 2022

Domain Adaptation and Few-Shot Learning | Cited by 21

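One-Sentence Summary

This paper proposes future-self-training (FST) for semantic segmentation, in which the teacher is built from a virtual future state of the student so that it provides higher-quality pseudo-labels to guide the current student, improving performance in unsupervised domain adaptation and semi-supervised segmentation.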

ABSTRACT

Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and in turn teach itself. To obtain valid supervision, active attempts typically employ a momentum teacher for pseudo-label prediction yet observe the confirmation bias issue, where the incorrect predictions may provide wrong supervision signals and get accumulated in the training process. The primary cause of such a drawback is that the prevailing self-training framework acts as guiding the current state with previous knowledge, because the teacher is updated with the past student only. To alleviate this problem, we propose a novel self-training strategy, which allows the model to learn from the future. Concretely, at each training step, we first virtually optimize the student (i.e., caching the gradients without applying them to the model weights), then update the teacher with the virtual future student, and finally ask the teacher to produce pseudo-labels for the current student as the guidance. In this way, we manage to improve the quality of pseudo-labels and thus boost the performance. We also develop two variants of our future-self-training (FST) framework through peeping at the future both deeply (FST-D) and widely (FST-W). Taking the tasks of unsupervised domain adaptive semantic segmentation and semi-supervised semantic segmentation as the instances, we experimentally demonstrate the effectiveness and superiority of our approach under a wide range of settings. Code will be made publicly available.

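Motivation and Objectives

Improve self-training for semantic segmentation by reducing confirmation bias in pseudo-labels.
Develop a framework that lets the model learn from its future self to obtain stronger supervision.
Propose two FST variants, a deeper future (FST-D) and a wider future (FST-W), to improve pseudo-label quality.
Demonstrate effectiveness on unsupervised domain adaptation (UDA) and semi-supervised learning (SSL) benchmarks and across multiple architectures.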

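Proposed Method

Replace the standard mean-teacher EMA update with a virtual future step, in which gradients are cached and used to form the future teacher.
Equation 3 updates the future teacher using a virtual update based on current gradients, enabling supervision from the future.
Equation 4 enriches the EMA by combining the current and future weights, forming a stronger teacher (introducing mu').
Equation 5 extends the future lookahead to deeper steps, enabling K-step deep future supervision (FST-D).
Equation 7 describes the wider-future variant (FST-W), which ensembles multiple forward directions by using different data batches for future exploration.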
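
The update order described above can be illustrated with a minimal sketch. It is not the paper's implementation and does not reproduce its equations exactly: weights are plain Python lists, `grad_fn` is a hypothetical callable that returns the loss gradient at given weights, and the names `future_teacher_deep`, `future_teacher_wide`, `mu_future` (standing in for mu'), and the exact way current and future weights are mixed are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: maps weights to the gradient of the training loss.
GradFn = Callable[[Sequence[float]], Vector]


def ema(teacher: Sequence[float], student: Sequence[float], mu: float) -> Vector:
    """Element-wise EMA: mu * teacher + (1 - mu) * student."""
    return [mu * t + (1.0 - mu) * s for t, s in zip(teacher, student)]


def sgd_step(weights: Sequence[float], grad: Sequence[float], lr: float) -> Vector:
    """One plain SGD step, returned as a new list (the input is not modified)."""
    return [w - lr * g for w, g in zip(weights, grad)]


def future_teacher_deep(teacher: Sequence[float], student: Sequence[float],
                        grad_fn: GradFn, lr: float, mu: float,
                        mu_future: float, k: int = 1) -> Vector:
    """Assumed FST-D-style update: look k virtual steps ahead along one path."""
    # Mix in the current student first (ordinary mean-teacher EMA).
    new_teacher = ema(teacher, student, mu)
    # Virtual optimization: step a copy, so the real student stays untouched.
    virtual = list(student)
    for _ in range(k):
        virtual = sgd_step(virtual, grad_fn(virtual), lr)
        # Mix each virtual future student into the teacher with mu_future.
        new_teacher = ema(new_teacher, virtual, mu_future)
    return new_teacher


def future_teacher_wide(teacher: Sequence[float], student: Sequence[float],
                        grad_fns: Sequence[GradFn], lr: float, mu: float,
                        mu_future: float) -> Vector:
    """Assumed FST-W-style update: one virtual step per batch, then average."""
    new_teacher = ema(teacher, student, mu)
    # Each grad_fn stands for the gradient computed on a different data batch.
    futures = [sgd_step(student, g(student), lr) for g in grad_fns]
    mean_future = [sum(ws) / len(futures) for ws in zip(*futures)]
    return ema(new_teacher, mean_future, mu_future)


if __name__ == "__main__":
    # Toy quadratic loss 0.5 * w^2, whose gradient is w itself.
    student = [1.0]
    print(future_teacher_deep([0.0], student, lambda w: list(w),
                              lr=0.5, mu=0.5, mu_future=0.5, k=2))
    print(student)  # the real student is unchanged by the virtual steps
```

In a real training loop the returned teacher would then produce pseudo-labels for the actual student update. That step is omitted here because it depends on the segmentation model and loss.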

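Experimental Results

Research Questions

RQ1: Does guiding the current student with a teacher built from the future reduce confirmation bias in self-training for semantic segmentation?
RQ2: How do deeper (FST-D) and wider (FST-W) future exploration compare in performance and stability under UDA and SSL settings?
RQ3: Are the gains from FST robust across CNN and Transformer backbones and across common segmentation decoders?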

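Key Findings

FST-D improves markedly over standard self-training (ST), with mIoU gains of up to 3.5 percentage points in the reported settings (for example, 59.8 mIoU with ResNet-101 in one setting).
Under the same conditions, FST-W yields smaller gains than FST-D but still consistently outperforms ST.
FST generalizes across segmentation architectures (DeepLabV2, PSPNet, UPerNet) and backbones (ResNet, Swin, BEiT), with significant gains.
On both SSL and UDA benchmarks, FST delivers leading or competitive improvements over strong baselines and existing methods (for example, DAFormer-based setups).
Deeper future exploration (K around 3) gives stable and strong gains, whereas a very large K may hurt performance in the later stages of training.
Using different data batches for future exploration (FST-W) improves robustness and the ensembling effect compared with single-batch exploration.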

This review was generated by AI and reviewed by human editors.