QUICK REVIEW

[Paper Review] The GAN is dead; long live the GAN! A Modern GAN Baseline

Yi‐Wen Huang, Aaron Gokaslan|arXiv (Cornell University)|Jan 9, 2025

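Advanced Neural Network Applications | Cited by 3

One-Sentence Summary

The paper proposes a well-behaved regularized RpGAN loss with zero-centered gradient penalties (0-GP) that enables R3GAN, a minimalist GAN with a modern backbone, to reach strong FID scores on FFHQ, ImageNet, CIFAR, and Stacked MNIST without empirical tricks.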

ABSTRACT

There is a widely-spread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses issues of mode dropping and non-convergence that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, our new loss allows us to discard all ad-hoc tricks and replace outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline -- R3GAN. Despite being simple, our approach surpasses StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.

Motivation and Objectives

Argue that GANs can be trained stably with a principled loss rather than ad-hoc tricks.
Develop a well-behaved loss by regularizing RpGAN with 0-centered gradient penalties.
Upgrade backbones to modern ConvNet/Transformer-inspired architectures while removing StyleGAN tricks.
Demonstrate that the minimalist R3GAN baseline achieves superior or competitive FID across multiple datasets.

Proposed Method

Formulate RpGAN and augment it with zero-centered gradient penalties R1 and R2 to ensure local convergence.
Prove that RpGAN with R1/R2 has locally convergent training under reasonable assumptions.
Replace outdated backbones with modern ResNet/ConvNeXt-inspired architectures while stripping non-essential StyleGAN components.
Systematically evaluate configurations from a StyleGAN2 baseline to a modernized R3GAN on FFHQ-256, CIFAR-10, and ImageNet tasks.
Conduct experiments on StackedMNIST to measure mode recovery and KL divergence between p_theta and p_D.
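
The loss described in the list above can be written out as a short sketch. The notation follows the p_theta and p_D used in this review; the symbols G_theta (generator), D_psi (discriminator), p_z (latent prior), f (a scalar link function), and gamma (penalty weight) are assumptions introduced here for illustration and are not defined in this review.

```latex
\begin{aligned}
\mathcal{L}(\theta,\psi) &= \mathbb{E}_{z\sim p_z,\; x\sim p_D}\Big[ f\big(D_\psi(G_\theta(z)) - D_\psi(x)\big) \Big] \\
R_1(\psi) &= \frac{\gamma}{2}\,\mathbb{E}_{x\sim p_D}\Big[ \big\|\nabla_x D_\psi(x)\big\|^2 \Big] \\
R_2(\theta,\psi) &= \frac{\gamma}{2}\,\mathbb{E}_{x\sim p_\theta}\Big[ \big\|\nabla_x D_\psi(x)\big\|^2 \Big]
\end{aligned}
```

The relativistic term scores each generated sample against a paired real sample instead of scoring the two independently. R1 penalizes the discriminator's gradient norm on real data and R2 on generated data. Both are "zero-centered" because they push the gradient norm toward 0 rather than toward 1.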

Experimental Results

Research Questions

RQ1: Can a regularized RpGAN loss with 0-GP provide stable convergence and good sample diversity without empirical tricks?
RQ2: How far can we simplify GAN backbones while preserving or improving FID across standard benchmarks?
RQ3: What is the impact of modern backbone redesign (ConvNeXt/ResNet-inspired) on GAN performance when paired with RpGAN+R1+R2?
RQ4: How does the simplified baseline R3GAN perform in terms of mode coverage and recall on challenging datasets like StackedMNIST?
RQ5: How does R3GAN compare to diffusion models in terms of FID, NFE, and sample quality on FFHQ and ImageNet?

Key Findings

Configuration	FID FFHQ-256
A (StyleGAN2)	7.516
B (Stripped StyleGAN2)	12.46
C (Well-behaved Loss)	11.65
D (ConvNeXt-ify pt. 1)	9.95
E (ConvNeXt-ify pt. 2)	7.045

RpGAN with both R1 and R2 trains stably, whereas RpGAN alone or with only R1 diverges.
The well-behaved loss enables a modern backbone so the model surpasses StyleGAN2 on FFHQ-256 and outperforms several SOTA GANs and some diffusion models on multiple datasets.
A modernized ResNet/ConvNeXt-style backbone with careful initialization and resampling improves FFHQ-256 FID from 9.95 (Config D) to 7.05 (Config E), which is below the 7.52 of the StyleGAN2 baseline (Config A).
On StackedMNIST, the Config E model achieves full 1000-mode recovery and low D_KL, surpassing many prior GANs.
On CIFAR-10 and ImageNet variants, Config E achieves competitive or superior FID with substantially fewer parameters than many diffusion models, while maintaining single-step generation.
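
The StackedMNIST figures above (modes recovered and D_KL) can be illustrated with a minimal sketch. It assumes each generated image has already been classified into three digit labels, one per color channel, by some pretrained MNIST classifier that is not shown here. It also assumes a uniform reference distribution over the 1000 modes unless another one is passed in. The names `mode_id` and `mode_stats` are hypothetical helpers, not taken from the paper's code.

```python
import math
from collections import Counter

NUM_MODES = 1000  # 10 digit classes per channel, 3 stacked channels


def mode_id(digits):
    """Map per-channel digit labels (r, g, b) to one mode index in [0, 999]."""
    r, g, b = digits
    return 100 * r + 10 * g + b


def mode_stats(mode_ids, reference=None):
    """Return (number of distinct modes seen, KL(p_gen || p_ref)).

    mode_ids:  iterable of mode indices for the generated samples.
    reference: list of NUM_MODES reference probabilities; assumed
               uniform when omitted (an assumption of this sketch).
    """
    mode_ids = list(mode_ids)
    counts = Counter(mode_ids)
    total = len(mode_ids)
    if reference is None:
        reference = [1.0 / NUM_MODES] * NUM_MODES
    kl = 0.0
    # Modes with zero generated samples contribute 0 to KL(p_gen || p_ref).
    for mode, count in counts.items():
        p = count / total
        kl += p * math.log(p / reference[mode])
    return len(counts), kl
```

A generator that covers every mode equally gives 1000 modes and a KL of zero, while one that collapses to a single mode gives 1 mode and a KL of log(1000) under the uniform reference.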

This review was generated by AI and checked by a human editor.