QUICK REVIEW

[Paper Review] Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

Ye Du, Yujun Shen | arXiv (Cornell University) | 2022. 09. 15.

Domain Adaptation and Few-Shot Learning | Citations: 21

One-Line Summary

The paper introduces future-self-training (FST) for semantic segmentation, where the teacher is built from virtual future student states to provide higher-quality pseudo-labels and guide the current student, improving both unsupervised domain adaptation and semi-supervised segmentation.

ABSTRACT

Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and in turn teach itself. To obtain valid supervision, active attempts typically employ a momentum teacher for pseudo-label prediction yet observe the confirmation bias issue, where the incorrect predictions may provide wrong supervision signals and get accumulated in the training process. The primary cause of such a drawback is that the prevailing self-training framework acts as guiding the current state with previous knowledge, because the teacher is updated with the past student only. To alleviate this problem, we propose a novel self-training strategy, which allows the model to learn from the future. Concretely, at each training step, we first virtually optimize the student (i.e., caching the gradients without applying them to the model weights), then update the teacher with the virtual future student, and finally ask the teacher to produce pseudo-labels for the current student as the guidance. In this way, we manage to improve the quality of pseudo-labels and thus boost the performance. We also develop two variants of our future-self-training (FST) framework through peeping at the future both deeply (FST-D) and widely (FST-W). Taking the tasks of unsupervised domain adaptive semantic segmentation and semi-supervised semantic segmentation as the instances, we experimentally demonstrate the effectiveness and superiority of our approach under a wide range of settings. Code will be made publicly available.

Research Motivation and Goals

Improve self-training for semantic segmentation by reducing the confirmation bias that accumulates through incorrect pseudo-labels.
Develop a framework that lets the model learn from its future self to provide stronger supervision.
Propose two variants of FST: deeper future (FST-D) and wider future (FST-W) to enhance pseudo-label quality.
Demonstrate effectiveness across UDA and SSL benchmarks and multiple architectures.

Proposed Method

Replace the standard mean-teacher EMA update with a virtual future step where gradients are cached and used to form a future teacher.
Equation 3 updates the future teacher using a virtual update based on current gradients, enabling supervision from the future.
Equation 4 enriches the EMA update by combining current and future student weights to form a more capable teacher (introducing an additional coefficient, mu').
Equation 5 extends future-lookahead to deeper steps, enabling K-step deep future supervision (FST-D).
Equation 7 describes a wider-future variant (FST-W) that ensembles multiple forward directions by using different data batches for future exploration.
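The steps above can be illustrated with a minimal toy sketch. It is not the paper's implementation: weights are plain Python lists, `grad_fn` is a hypothetical stand-in for the gradient of the student's loss on one batch, and averaging the virtual students in the wide variant is an assumption about how the "ensemble" is formed. The helper names (`ema`, `sgd_step`, `future_teacher_deep`, `future_teacher_wide`) are illustrative only. The point is the control flow: the student is advanced virtually, the teacher is updated from that virtual future student, and the real student weights are left untouched.

```python
def ema(teacher, student, mu):
    """Exponential moving average: teacher <- mu * teacher + (1 - mu) * student."""
    return [mu * t + (1.0 - mu) * s for t, s in zip(teacher, student)]


def sgd_step(weights, grads, lr):
    """One plain gradient-descent step; returns a new list."""
    return [w - lr * g for w, g in zip(weights, grads)]


def future_teacher_deep(teacher, student, grad_fn, lr, mu, k=1):
    """FST-D-style sketch: look K virtual steps ahead along one direction.

    The real student is never modified; only copies are advanced.
    """
    virtual_student = list(student)
    virtual_teacher = list(teacher)
    for _ in range(k):
        grads = grad_fn(virtual_student)          # gradient at the virtual state
        virtual_student = sgd_step(virtual_student, grads, lr)
        virtual_teacher = ema(virtual_teacher, virtual_student, mu)
    return virtual_teacher


def future_teacher_wide(teacher, student, grad_fns, lr, mu):
    """FST-W-style sketch: one virtual step per batch, all from the same student.

    Assumption: the virtual students are averaged before the EMA update.
    """
    virtual_students = [sgd_step(student, g(student), lr) for g in grad_fns]
    n = len(virtual_students)
    mean_student = [sum(ws) / n for ws in zip(*virtual_students)]
    return ema(teacher, mean_student, mu)


def make_grad(target):
    """Toy gradient of 0.5 * (w - target)^2, standing in for a batch loss."""
    return lambda weights: [w - target for w in weights]


student = [0.0]
teacher = [0.0]

standard = ema(teacher, student, mu=0.5)
deep_1 = future_teacher_deep(teacher, student, make_grad(1.0), lr=0.5, mu=0.5, k=1)
deep_2 = future_teacher_deep(teacher, student, make_grad(1.0), lr=0.5, mu=0.5, k=2)
wide = future_teacher_wide(teacher, student, [make_grad(1.0), make_grad(3.0)], lr=0.5, mu=0.5)

print(standard, deep_1, deep_2, wide)
```

In this toy problem the future-formed teachers sit closer to the optimum than the standard EMA teacher, which is the intuition behind using them to produce pseudo-labels. In an actual segmentation setup the gradient would itself depend on the teacher's pseudo-labels, which this sketch leaves out.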

Experimental Results

Research Questions

RQ1: Does guiding the current student with a future-formed teacher reduce confirmation bias in self-training for semantic segmentation?
RQ2: How do deeper (FST-D) and wider (FST-W) future explorations compare in terms of performance and stability across UDA and SSL settings?
RQ3: Are the gains from FST robust across CNN and Transformer backbones and common segmentation decoders?

Key Results

FST-D substantially improves over standard self-training (ST), with gains of up to 3.5 mIoU percentage points in the reported settings (e.g., 59.8 mIoU with ResNet-101 in one setting).
FST-W yields smaller gains than FST-D under the same conditions, while still consistently outperforming ST.
FST generalizes across segmentation decoders (DeepLabV2, PSPNet, UPerNet) and backbones (ResNet, Swin, BEiT), with notable gains in each case.
In SSL and UDA benchmarks, FST delivers state-of-the-art or competitive improvements over strong baselines and existing methods (e.g., DAFormer-based settings).
Deeper future exploration (K around 3) provides a stable and strong performance boost, while very large K can hurt later training stages.
Using different data batches for future exploration (FST-W) improves robustness and ensembling effects compared to single-batch exploration.

This review was generated by AI and reviewed by a human editor.