QUICK REVIEW

[Paper Review] The GAN is dead; long live the GAN! A Modern GAN Baseline

Yi‐Wen Huang, Aaron Gokaslan|arXiv (Cornell University)|Jan 9, 2025
Advanced Neural Network Applications | Cited by 3
One-sentence summary

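The paper proposes a well-behaved regularized RpGAN loss with zero-centered gradient penalties (0-GP). This loss enables a minimalist GAN with a modernized backbone (R3GAN) that achieves strong FID scores on FFHQ, ImageNet, CIFAR, and Stacked MNIST without relying on empirical tricks.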

ABSTRACT

There is a widely-spread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses issues of mode dropping and non-convergence that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, our new loss allows us to discard all ad-hoc tricks and replace outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline -- R3GAN. Despite being simple, our approach surpasses StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.

Motivation and goals

  • Argue that GANs can be trained stably with a principled loss rather than ad-hoc tricks.
  • Develop a well-behaved loss by regularizing RpGAN with 0-centered gradient penalties.
  • Upgrade backbones to modern ConvNet/Transformer-inspired architectures while removing StyleGAN tricks.
  • Demonstrate that the minimalist R3GAN baseline achieves superior or competitive FID across multiple datasets.

Proposed method

  • Formulate RpGAN and augment it with zero-centered gradient penalties R1 and R2 to ensure local convergence.
  • Prove that RpGAN with R1/R2 has locally convergent training under reasonable assumptions.
  • Replace outdated backbones with modern ResNet/ConvNeXt-inspired architectures while stripping non-essential StyleGAN components.
  • Systematically evaluate configurations from a StyleGAN2 baseline to a modernized R3GAN on FFHQ-256, CIFAR-10, and ImageNet tasks.
  • Conduct experiments on StackedMNIST to measure mode recovery and KL divergence between p_theta and p_D.
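The loss described in the bullets above can be written out as a sketch. The notation here is an assumption based on how relativistic losses and zero-centered gradient penalties are commonly stated: the symbols f (a scalar link function), γ (the penalty weight), p_z (the latent prior), G_θ (the generator), and D_ψ (the discriminator) do not appear in this summary, and the exact form and sign conventions should be checked against the paper. R1 penalizes the discriminator's gradient norm on real data (p_D), and R2 applies the same penalty on generated data (p_θ).

```latex
\begin{aligned}
\mathcal{L}(\theta,\psi) &= \mathbb{E}_{z \sim p_z,\; x \sim p_{\mathcal{D}}}
  \big[ f\big( D_\psi(G_\theta(z)) - D_\psi(x) \big) \big] \\
R_1(\psi) &= \frac{\gamma}{2}\, \mathbb{E}_{x \sim p_{\mathcal{D}}}
  \big[ \lVert \nabla_x D_\psi(x) \rVert^2 \big] \\
R_2(\theta,\psi) &= \frac{\gamma}{2}\, \mathbb{E}_{x \sim p_\theta}
  \big[ \lVert \nabla_x D_\psi(x) \rVert^2 \big]
\end{aligned}
```

The relativistic term couples each generated sample to a real one through the difference of discriminator outputs, rather than scoring real and fake samples separately.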

Experimental results

Research questions

  • RQ1: Can a regularized RpGAN loss with 0-GP provide stable convergence and good sample diversity without empirical tricks?
  • RQ2: How far can we simplify GAN backbones while preserving or improving FID across standard benchmarks?
  • RQ3: What is the impact of modern backbone redesign (ConvNeXt/ResNet-inspired) on GAN performance when paired with RpGAN+R1+R2?
  • RQ4: How does the simplified baseline R3GAN perform in terms of mode coverage and recall on challenging datasets like StackedMNIST?
  • RQ5: How does R3GAN compare to diffusion models in terms of FID, NFE, and sample quality on FFHQ and ImageNet?

Key findings

Configuration | FID (FFHQ-256)
A (StyleGAN2) | 7.516
B (Stripped StyleGAN2) | 12.46
C (Well-behaved Loss) | 11.65
D (ConvNeXt-ify pt. 1) | 9.95
E (ConvNeXt-ify pt. 2) | 7.045
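
To make the roadmap easier to read, the relative FID change at each step can be computed directly from the table. This is a small arithmetic sketch using only the values listed above; the helper name `relative_change` is hypothetical. Lower FID is better, so a negative change is an improvement.

```python
# FID on FFHQ-256 for each configuration, copied from the table above.
fid = {"A": 7.516, "B": 12.46, "C": 11.65, "D": 9.95, "E": 7.045}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means lower FID)."""
    return (after - before) / before * 100.0


# Consecutive roadmap steps, plus the end-to-end comparison A -> E.
steps = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E")]
changes = {f"{a}->{b}": relative_change(fid[a], fid[b]) for a, b in steps}

for step, pct in changes.items():
    print(f"{step}: {pct:+.1f}%")
```

Stripping StyleGAN2 (A to B) raises FID sharply, and each later step recovers part of that loss until Config E ends slightly below the Config A baseline.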
  • RpGAN with both R1 and R2 trains stably, whereas RpGAN alone or with only R1 diverges.
  • The well-behaved loss enables a modern backbone so the model surpasses StyleGAN2 on FFHQ-256 and outperforms several SOTA GANs and some diffusion models on multiple datasets.
  • A modernized ResNet/ConvNeXt-style backbone with careful initialization and resampling improves FID on FFHQ-256: the final Config E reaches 7.045, down from 9.95 for Config D and below the 7.516 of the StyleGAN2 baseline (Config A).
  • On StackedMNIST, the Config E model achieves full 1000-mode recovery and low D_KL, surpassing many prior GANs.
  • On CIFAR-10 and ImageNet variants, Config E achieves competitive or superior FID with substantially fewer parameters than many diffusion models, while maintaining single-step generation.
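
The StackedMNIST metrics mentioned above (number of modes recovered and D_KL) can be illustrated with a minimal sketch. It assumes that each generated image has already been labeled with three digits, one per stacked channel, by some digit classifier that is not shown here. It also assumes that p_D is uniform over the 1000 digit combinations and that the divergence is taken as D_KL(p_theta || p_D). The function names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math
from collections import Counter
from itertools import product

NUM_MODES = 1000  # 10 digit classes in each of 3 stacked channels


def mode_id(digits):
    """Map a (d1, d2, d3) digit triple to a mode index in [0, 999]."""
    d1, d2, d3 = digits
    return 100 * d1 + 10 * d2 + d3


def mode_stats(predicted_digits):
    """Return (modes covered, KL from the empirical mode distribution to uniform)."""
    counts = Counter(mode_id(d) for d in predicted_digits)
    total = sum(counts.values())
    uniform = 1.0 / NUM_MODES
    # Modes that never appear contribute 0 to the sum (0 * log 0 is taken as 0).
    kl = sum((c / total) * math.log((c / total) / uniform) for c in counts.values())
    return len(counts), kl


# Toy inputs: one sample per mode (perfect coverage) versus total mode collapse.
perfect = list(product(range(10), repeat=3))
collapsed = [(0, 0, 0)] * 50

covered_perfect, kl_perfect = mode_stats(perfect)
covered_collapsed, kl_collapsed = mode_stats(collapsed)
```

Under these assumptions, a generator that covers all modes uniformly gives a divergence of zero, while a generator collapsed onto a single mode gives log(1000), the largest value this measure can take.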


This review was generated by AI and reviewed by a human editor.