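[Paper Review] Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels
The paper introduces Dual Pseudo Training (DPT), a three-stage strategy in which a semi-supervised classifier generates pseudo-labels to train a conditional diffusion model, and that model in turn supplies pseudo images to augment the classifier. With extremely few labels, DPT achieves state-of-the-art results in both semi-supervised generation and classification.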
In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models. DPT operates in three stages: training a classifier on partially labeled data to predict pseudo-labels; training a conditional generative model using these pseudo-labels to generate pseudo images; and retraining the classifier with a mix of real and pseudo images. Empirically, DPT consistently achieves SOTA performance of semi-supervised generation and classification across various settings. In particular, with one or two labels per class, DPT achieves a Fréchet Inception Distance (FID) score of 3.08 or 2.52 on ImageNet 256x256. Besides, DPT outperforms competitive semi-supervised baselines substantially on ImageNet classification tasks, achieving top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) with one, two, or five labels per class, respectively. Notably, our results demonstrate that diffusion can generate realistic images with only a few labels (e.g., <0.1%) and generative augmentation remains viable for semi-supervised classification. Our code is available at https://github.com/ML-GSAI/DPT.
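The three stages described in the abstract can be illustrated with a deliberately tiny toy, shown below. This is only a sketch under loud assumptions: "images" are 1-D numbers, the semi-supervised learner is a nearest-centroid classifier with one round of self-training (a stand-in for MSN or Semi-ViT), and the conditional generator is a per-class Gaussian (a stand-in for the conditional diffusion model). All function names are hypothetical and do not come from the authors' code.

```python
import random
from statistics import mean, pstdev


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))


def fit_centroids(xs, ys, num_classes):
    """Compute one centroid (mean) per class from labeled points."""
    return [mean(x for x, y in zip(xs, ys) if y == c) for c in range(num_classes)]


def train_semi_supervised(labeled_x, labeled_y, unlabeled_x, num_classes):
    """Toy semi-supervised learner: fit on labels, guess the rest, refit on both."""
    centroids = fit_centroids(labeled_x, labeled_y, num_classes)
    guessed = [predict(centroids, x) for x in unlabeled_x]
    return fit_centroids(labeled_x + unlabeled_x, labeled_y + guessed, num_classes)


def dual_pseudo_training(real_x, labeled_idx, labeled_y, num_classes, n_pseudo, seed=0):
    rng = random.Random(seed)
    labeled_x = [real_x[i] for i in labeled_idx]

    # Stage 1: train the classifier on partially labeled data,
    # then predict pseudo-labels for all real data.
    classifier = train_semi_supervised(labeled_x, labeled_y, real_x, num_classes)
    pseudo_labels = [predict(classifier, x) for x in real_x]

    # Stage 2: fit a class-conditional generator on (real data, pseudo-labels)
    # and sample pseudo images for each class. A Gaussian replaces diffusion here.
    pseudo_x, pseudo_y = [], []
    for c in range(num_classes):
        members = [x for x, y in zip(real_x, pseudo_labels) if y == c]
        mu, sigma = mean(members), pstdev(members)
        pseudo_x += [rng.gauss(mu, sigma) for _ in range(n_pseudo)]
        pseudo_y += [c] * n_pseudo

    # Stage 3: retrain the classifier on a mix of real and pseudo images.
    classifier = train_semi_supervised(
        labeled_x + pseudo_x, labeled_y + pseudo_y, real_x, num_classes
    )
    return classifier, pseudo_labels


# Two well-separated classes with a single label each.
data_rng = random.Random(1)
real_x = [data_rng.gauss(0.0, 0.5) for _ in range(50)]
real_x += [data_rng.gauss(10.0, 0.5) for _ in range(50)]
classifier, pseudo_labels = dual_pseudo_training(
    real_x, labeled_idx=[0, 50], labeled_y=[0, 1], num_classes=2, n_pseudo=20
)
print("class centroids after stage 3:", classifier)
```

The toy only mirrors the data flow between the stages; it says nothing about the accuracy or FID gains reported for the real models.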
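Research Motivation and Goals
- Improve semi-supervised generation and classification when labeled data is scarce.
- Propose a three-stage training pipeline (DPT) that combines a semi-supervised classifier with a conditional diffusion model.
- Demonstrate state-of-the-art performance of DPT under ultra-low labeling, measured by FID/IS for generation and top-1 accuracy for classification.
- Show that diffusion can generate realistic images with fewer than 0.1% of labels and that generative augmentation benefits the classifier.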
Proposed Method
- Stage 1: Train a semi-supervised classifier on labeled and unlabeled data and predict pseudo-labels for all data.
- Stage 2: Train a conditional diffusion model on all real images paired with their pseudo-labels, then sample pseudo images for each class.
- Stage 3: Retrain the classifier on real data augmented with the pseudo images, each labeled with the class it was generated for, effectively closing the loop.
- Use classifier-free guidance (CFG) in the diffusion model, with a tuned guidance strength, to control semantics.
- Adopt a U-ViT-based diffusion backbone and semi-supervised learners (MSN or Semi-ViT) as the classifier.
- Evaluate with FID, FID_CLIP, sFID, IS, and precision/recall on ImageNet and CIFAR-10 benchmarks across resolutions.
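The classifier-free guidance step in the list above can be sketched as follows. The review does not state which parameterization DPT uses, so this assumes the common form in which the guided noise prediction is `(1 + w) * eps_cond - w * eps_uncond`, with `w` as the guidance strength. The function name `cfg_noise` and the plain-list representation are hypothetical, chosen for illustration only.

```python
def cfg_noise(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional noise predictions.

    Assumed form: (1 + w) * eps_cond - w * eps_uncond.
    w = 0 returns the conditional prediction unchanged; larger w pushes
    the result further away from the unconditional prediction.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


# Toy predictions for a two-element "image".
eps_cond = [1.0, 2.0]    # network output given the class label
eps_uncond = [0.0, 1.0]  # network output with the label dropped

print(cfg_noise(eps_cond, eps_uncond, w=0.0))
print(cfg_noise(eps_cond, eps_uncond, w=1.0))
```

In a real sampler both predictions would come from the same network evaluated with and without the class condition at every denoising step, and `w` is the value that gets tuned.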
Experimental Results
Research Questions
- RQ1: Can diffusion models generate high-fidelity, semantically controllable images with extremely few labels (e.g., <0.1%)?
- RQ2: Can generative augmentation from diffusion models improve semi-supervised classification performance when labels are scarce?
- RQ3: Do diffusion models and strong semi-supervised learners benefit each other in a mutually reinforcing training loop?
- RQ4: Is the proposed three-stage DPT pipeline robust across resolutions and label regimes (1, 2, 5 labels per class, 1% labels)?
Main Results
| Method (Model) | Label fraction (labels per class) | FID-50K | FID_CLIP | sFID | IS | Precision | Recall | # Params |
|---|---|---|---|---|---|---|---|---|
| DPT (ours, with MSN) | <0.1% (1) | 3.08 | 1.84 | 5.56 | 201.68 | 0.80 | 0.58 | 585M |
| DPT (ours, with MSN) | <0.2% (2) | 2.52 | 1.81 | 5.49 | 230.34 | 0.81 | 0.57 | 585M |
| DPT (ours, with MSN) | <0.4% (5) | 2.50 | 1.82 | 5.54 | 243.10 | 0.83 | 0.55 | 585M |
| DPT (ours, with U-ViT-Huge) | <0.1% (1) | 3.08 | 1.84 | 5.56 | 201.68 | 0.80 | 0.58 | 585M |
- DPT achieves state-of-the-art semi-supervised generation on CIFAR-10 and ImageNet across resolutions (128x128, 256x256, 512x512).
- With <0.1% labels on ImageNet-256x256, DPT attains an FID of 3.08, outperforming several supervised diffusion models.
- With one, two, or five labels per class on ImageNet classification, DPT attains top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) respectively, improving on competitive semi-supervised baselines.
- DPT with 1% labels achieves an FID of 2.42 on 512x512 generation, and 1% label performance approaches fully supervised baselines on several metrics.
- DPT demonstrates that diffusion-based generative augmentation remains viable for semi-supervised classification when labels are scarce.
- Qualitative results show realistic, diverse, and semantically correct images even with very few labels.
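The label fractions quoted in the table can be checked with simple arithmetic. The sketch below assumes the standard ImageNet-1k training split of 1,281,167 images over 1,000 classes; these counts are not stated in the review and are supplied here as an assumption, as is the helper name `label_fraction_percent`.

```python
# Assumed dataset statistics (ImageNet-1k training split), not taken from the review.
IMAGENET_TRAIN_IMAGES = 1_281_167
IMAGENET_CLASSES = 1000


def label_fraction_percent(labels_per_class,
                           num_classes=IMAGENET_CLASSES,
                           num_images=IMAGENET_TRAIN_IMAGES):
    """Percentage of training images that carry a label."""
    return 100.0 * labels_per_class * num_classes / num_images


for k in (1, 2, 5):
    print(f"{k} label(s) per class: {label_fraction_percent(k):.3f}% of images labeled")
```

Under these assumptions, one, two, and five labels per class correspond to roughly 0.08%, 0.16%, and 0.39% of the training set, consistent with the "<0.1%", "<0.2%", and "<0.4%" entries in the table.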
This review was generated by AI and reviewed by a human editor.