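[Paper Explained] The GAN is dead; long live the GAN! A Modern GAN Baseline
The paper proposes a well-behaved regularized RpGAN loss with zero-centered gradient penalties (0-GP). This loss enables a minimalist GAN with a modern backbone (R3GAN) that achieves strong FID scores on FFHQ, ImageNet, CIFAR, and Stacked MNIST without relying on empirical tricks.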
There is a widely-spread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses issues of mode dropping and non-convergence that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, our new loss allows us to discard all ad-hoc tricks and replace outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline -- R3GAN. Despite being simple, our approach surpasses StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.
Research Motivation and Objectives
- Argue that GANs can be trained stably with a principled loss rather than ad-hoc tricks.
- Develop a well-behaved loss by regularizing RpGAN with 0-centered gradient penalties.
- Upgrade backbones to modern ConvNet/Transformer-inspired architectures while removing StyleGAN tricks.
- Demonstrate that the minimalist R3GAN baseline achieves superior or competitive FID across multiple datasets.
Proposed Method
- Formulate RpGAN and augment it with zero-centered gradient penalties R1 and R2 to ensure local convergence.
- Prove that RpGAN with R1/R2 has locally convergent training under reasonable assumptions.
- Replace outdated backbones with modern ResNet/ConvNeXt-inspired architectures while stripping non-essential StyleGAN components.
- Systematically evaluate configurations from a StyleGAN2 baseline to a modernized R3GAN on FFHQ-256, CIFAR-10, and ImageNet tasks.
- Conduct experiments on Stacked MNIST to measure mode recovery and the KL divergence between the generated distribution p_theta and the data distribution p_D.
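The loss described in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar critic `critic`, its parameters `(a, w, b)`, and the penalty weight `gamma` are hypothetical names, the input gradient is written out analytically where a real model would use automatic differentiation, and the sign convention (the critic should score real samples higher than fake ones) is the common practical one and may differ from the paper's notation.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def critic(x, a, w, b):
    """Toy scalar critic D(x) = a * tanh(w * x + b) (hypothetical, for illustration)."""
    return a * math.tanh(w * x + b)


def critic_grad_x(x, a, w, b):
    """Analytic dD/dx of the toy critic; a real network would use autodiff here."""
    return a * w * (1.0 - math.tanh(w * x + b) ** 2)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def discriminator_loss(real, fake, params, gamma):
    """Relativistic pairing loss plus zero-centered gradient penalties R1 and R2."""
    # Relativistic term: each real sample is compared against a paired fake sample.
    adv = mean(
        softplus(critic(f, *params) - critic(r, *params))
        for r, f in zip(real, fake)
    )
    # R1: squared input-gradient norm of the critic on real samples.
    r1 = mean(critic_grad_x(r, *params) ** 2 for r in real)
    # R2: the same penalty evaluated on generated samples.
    r2 = mean(critic_grad_x(f, *params) ** 2 for f in fake)
    return adv + 0.5 * gamma * (r1 + r2)


def generator_loss(real, fake, params):
    """The generator sees the same pairing with the roles of real and fake swapped."""
    return mean(
        softplus(critic(r, *params) - critic(f, *params))
        for r, f in zip(real, fake)
    )
```

Both penalties push the critic's input gradient toward zero, one on the data and one on the generated samples. In this sketch the penalties appear only in the discriminator loss, and the generator loss is the mirrored pairing term.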
Experimental Results
Research Questions
- RQ1: Can a regularized RpGAN loss with 0-GP provide stable convergence and good sample diversity without empirical tricks?
- RQ2: How far can we simplify GAN backbones while preserving or improving FID across standard benchmarks?
- RQ3: What is the impact of modern backbone redesign (ConvNeXt/ResNet-inspired) on GAN performance when paired with RpGAN+R1+R2?
- RQ4: How does the simplified baseline R3GAN perform in terms of mode coverage and recall on challenging datasets like Stacked MNIST?
- RQ5: How does R3GAN compare to diffusion models in terms of FID, NFE, and sample quality on FFHQ and ImageNet?
Key Findings
| Configuration | FID on FFHQ-256 (lower is better) |
|---|---|
| A (StyleGAN2) | 7.516 |
| B (Stripped StyleGAN2) | 12.46 |
| C (Well-behaved Loss) | 11.65 |
| D (ConvNeXt-ify pt. 1) | 9.95 |
| E (ConvNeXt-ify pt. 2) | 7.045 |
- RpGAN with both R1 and R2 trains stably, whereas RpGAN alone or with only R1 diverges.
- The well-behaved loss enables a modern backbone so the model surpasses StyleGAN2 on FFHQ-256 and outperforms several SOTA GANs and some diffusion models on multiple datasets.
- A modernized ResNet/ConvNeXt-style backbone with careful initialization and resampling lowers FFHQ-256 FID from 9.95 (Config D) to 7.05 (Config E), below the 7.52 of the StyleGAN2 baseline (Config A).
- On Stacked MNIST, the Config E model achieves full 1000-mode recovery and low D_KL, surpassing many prior GANs.
- On CIFAR-10 and ImageNet variants, Config E achieves competitive or superior FID with substantially fewer parameters than many diffusion models, while maintaining single-step generation.
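The Stacked MNIST finding above rests on two numbers: how many of the 1000 modes are covered, and a KL divergence. The sketch below shows one way such metrics are commonly computed. It rests on several assumptions: each generated image has already been labeled with three digits by a pretrained classifier (not shown), the data distribution p_D is uniform over the 1000 digit triples, and the divergence is taken as D_KL(p_theta || p_D). The function names are hypothetical.

```python
import math
from collections import Counter

NUM_MODES = 1000  # 10 digit classes in each of the 3 stacked channels


def mode_id(digits):
    """Map a (d0, d1, d2) triple of predicted digits to a mode index in 0..999."""
    d0, d1, d2 = digits
    return 100 * d0 + 10 * d1 + d2


def mode_coverage_and_kl(predicted_digits, num_modes=NUM_MODES):
    """Return (number of covered modes, D_KL(p_theta || uniform p_D))."""
    counts = Counter(mode_id(d) for d in predicted_digits)
    total = sum(counts.values())
    covered = len(counts)
    # Modes with zero generated samples contribute nothing to the sum.
    kl = sum(
        (c / total) * math.log((c / total) * num_modes)
        for c in counts.values()
    )
    return covered, kl
```

Under these assumptions, a generator that spreads its samples evenly over all digit triples gets a divergence close to zero, while a generator collapsed onto a single triple gets log(1000).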
This explainer was generated by AI and reviewed by a human editor.