QUICK REVIEW

[Paper Review] Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels

Zebin You, Yong Zhong | arXiv (Cornell University) | Feb 21, 2023
Generative Adversarial Networks and Image Synthesis | Cited by 9
One-Sentence Summary

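This paper proposes Dual Pseudo Training (DPT), a three-stage strategy: a semi-supervised classifier generates pseudo-labels to train a conditional diffusion model, and the diffusion model in turn supplies pseudo images to augment the classifier, achieving state-of-the-art semi-supervised generation and classification with very few labels.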

ABSTRACT

In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models. DPT operates in three stages: training a classifier on partially labeled data to predict pseudo-labels; training a conditional generative model using these pseudo-labels to generate pseudo images; and retraining the classifier with a mix of real and pseudo images. Empirically, DPT consistently achieves SOTA performance of semi-supervised generation and classification across various settings. In particular, with one or two labels per class, DPT achieves a Fréchet Inception Distance (FID) score of 3.08 or 2.52 on ImageNet 256x256. Besides, DPT outperforms competitive semi-supervised baselines substantially on ImageNet classification tasks, achieving top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) with one, two, or five labels per class, respectively. Notably, our results demonstrate that diffusion can generate realistic images with only a few labels (e.g., <0.1%) and generative augmentation remains viable for semi-supervised classification. Our code is available at https://github.com/ML-GSAI/DPT.

Motivation and Goals

  • Motivate: improve semi-supervised generation and classification when labeled data are scarce.
  • Propose: a three-stage training pipeline (DPT) combining a semi-supervised classifier and a conditional diffusion model.
  • Demonstrate: DPT achieves state-of-the-art FID/IS for generation and top-1 accuracy for classification under ultra-low labeling.
  • Show: diffusion models can generate realistic images with <0.1% of the labels, and generative augmentation benefits classifiers.

Proposed Method

  • Stage 1: Train a semi-supervised classifier on labeled and unlabeled data and predict pseudo-labels for all data.
  • Stage 2: Train a conditional diffusion model on the real data, conditioned on the classifier's pseudo-labels, then sample pseudo images for each class.
  • Stage 3: Retrain the classifier on real data augmented with the pseudo images, each labeled with the class it was generated for, closing the loop.
  • Use Classifier-Free Guidance (CFG) during diffusion sampling, with a tuned guidance strength, to control semantics.
  • Adopt a U-ViT-based diffusion backbone and semi-supervised learners (MSN or Semi-ViT) as the classifier.
  • Evaluate with FID, FID_CLIP, sFID, IS, precision/recall, and ImageNet/CIFAR-10 benchmarks across resolutions.
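
The three stages above can be sketched end to end on toy data. This is only an illustration of the data flow, not the authors' implementation: a nearest-centroid self-training classifier stands in for MSN or Semi-ViT, a per-class Gaussian stands in for the conditional diffusion model, and 2-D points stand in for images. All function names (`make_data`, `train_semi_supervised`, `fit_conditional_generator`, `dpt`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class=200):
    """Toy stand-in for an image dataset: two separated 2-D Gaussian classes."""
    means = np.array([[-4.0, 0.0], [4.0, 0.0]])
    x = np.concatenate([rng.normal(m, 1.0, size=(n_per_class, 2)) for m in means])
    y = np.repeat(np.arange(2), n_per_class)
    return x, y

def fit_centroids(x, y, n_classes):
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, x):
    # Assign each point to its nearest class centroid.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def train_semi_supervised(x_lab, y_lab, x_unlab, n_classes, rounds=5):
    """Assumed stand-in for the semi-supervised learner: centroid self-training."""
    centroids = fit_centroids(x_lab, y_lab, n_classes)
    for _ in range(rounds):
        pseudo = predict(centroids, x_unlab)
        centroids = fit_centroids(np.concatenate([x_lab, x_unlab]),
                                  np.concatenate([y_lab, pseudo]), n_classes)
    return centroids

def fit_conditional_generator(x, pseudo_y, n_classes):
    """Assumed stand-in for the conditional diffusion model: one Gaussian per class."""
    return [(x[pseudo_y == c].mean(axis=0), x[pseudo_y == c].std(axis=0))
            for c in range(n_classes)]

def sample(generator, c, n):
    mean, std = generator[c]
    return rng.normal(mean, std, size=(n, 2))

def dpt(x_lab, y_lab, x_unlab, n_classes=2, n_pseudo_per_class=50):
    # Stage 1: train the classifier and predict pseudo-labels for all real data.
    clf = train_semi_supervised(x_lab, y_lab, x_unlab, n_classes)
    x_real = np.concatenate([x_lab, x_unlab])
    pseudo_labels = predict(clf, x_real)
    # Stage 2: train the conditional generator on pseudo-labels, then sample per class.
    gen = fit_conditional_generator(x_real, pseudo_labels, n_classes)
    x_pseudo = np.concatenate([sample(gen, c, n_pseudo_per_class)
                               for c in range(n_classes)])
    y_pseudo = np.repeat(np.arange(n_classes), n_pseudo_per_class)
    # Stage 3: retrain the classifier with pseudo samples added to the labeled set.
    clf_aug = train_semi_supervised(np.concatenate([x_lab, x_pseudo]),
                                    np.concatenate([y_lab, y_pseudo]),
                                    x_unlab, n_classes)
    return clf, clf_aug, x_pseudo, y_pseudo

x, y = make_data()
labeled = np.array([0, 200])              # one labeled example per class
unlabeled = np.ones(len(x), dtype=bool)
unlabeled[labeled] = False
_, clf_aug, x_pseudo, y_pseudo = dpt(x[labeled], y[labeled], x[unlabeled])
print("accuracy after stage 3:", (predict(clf_aug, x) == y).mean())
```

Each pseudo sample takes its label from the class it was conditioned on, which is why stage 3 can treat the generated samples as extra labeled data.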

Experimental Results

Research Questions

  • RQ1: Can diffusion models generate high-fidelity, semantically controllable images with extremely few labels (e.g., <0.1%)?
  • RQ2: Can generative augmentation from diffusion models improve semi-supervised classification performance when labels are scarce?
  • RQ3: Do diffusion models and strong semi-supervised learners benefit each other in a mutually reinforcing training loop?
  • RQ4: Is the proposed three-stage DPT pipeline robust across resolutions and label regimes (1, 2, 5 labels per class, 1% labels)?

Key Findings

Method (Model)                Label fraction (labels per class)  FID-50K  FID_CLIP  sFID  IS      Precision  Recall  # Params
DPT (ours, with MSN)          <0.1% (1)                          3.08     1.84      5.56  201.68  0.80       0.58    585M
DPT (ours, with MSN)          <0.2% (2)                          2.52     1.81      5.49  230.34  0.81       0.57    585M
DPT (ours, with MSN)          <0.4% (5)                          2.50     1.82      5.54  243.10  0.83       0.55    585M
DPT (ours, with U-ViT-Huge)   <0.1% (1)                          3.08     1.84      5.56  201.68  0.80       0.58    585M
  • DPT achieves state-of-the-art semi-supervised generation on CIFAR-10 and ImageNet across resolutions (128x128, 256x256, 512x512).
  • With <0.1% labels on ImageNet-256x256, DPT attains an FID of 3.08, outperforming several supervised diffusion models.
  • With one, two, or five labels per class on ImageNet classification, DPT attains top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) respectively, improving on competitive semi-supervised baselines.
  • DPT with 1% labels achieves an FID of 2.42 on 512x512 generation, and 1% label performance approaches fully supervised baselines on several metrics.
  • DPT demonstrates that diffusion-based generative augmentation remains viable for semi-supervised classification, achieving SOTA results on ImageNet with few labels (top-1 accuracies of 59.0/69.5/74.4 with one, two, or five labels per class).
  • Qualitative results show realistic, diverse, and semantically correct images even with very few labels.
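
The label fractions quoted above (<0.1%, <0.2%, <0.4% for 1, 2, and 5 labels per class) can be checked with a short calculation. The sketch below assumes the standard ImageNet-1k (ILSVRC-2012) training split of 1,000 classes and 1,281,167 images, which the summary does not state explicitly; `label_fraction` is a hypothetical helper.

```python
# Assumption: standard ImageNet-1k (ILSVRC-2012) training split.
NUM_CLASSES = 1000
NUM_TRAIN_IMAGES = 1_281_167

def label_fraction(labels_per_class, num_classes=NUM_CLASSES,
                   num_images=NUM_TRAIN_IMAGES):
    """Fraction of training images that carry a ground-truth label."""
    return labels_per_class * num_classes / num_images

for k in (1, 2, 5):
    print(f"{k} label(s) per class -> {label_fraction(k):.3%} of training images")
# prints 0.078%, 0.156%, and 0.390% respectively
```

Under that assumption, each setting falls just below the bound reported in the table.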


This review was generated by AI and reviewed by human editors.