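[Paper Explained] InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs
This paper proposes InfoGAN-CR, which adds a Contrastive Regularizer to InfoGAN to achieve self-supervised disentanglement in GANs, and ModelCentrality, an unsupervised model selection scheme. Together they reach state-of-the-art disentanglement without ground-truth labels.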
Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution. Recent advances have been dominated by Variational AutoEncoder (VAE)-based methods, while training disentangled generative adversarial networks (GANs) remains challenging. In this work, we show that the dominant challenges facing disentangled GANs can be mitigated through the use of self-supervision. We make two main contributions. First, we design a novel approach for training disentangled GANs with self-supervision. We propose a contrastive regularizer, which is inspired by a natural notion of disentanglement: latent traversal. This achieves higher disentanglement scores than state-of-the-art VAE- and GAN-based approaches. Second, we propose an unsupervised model selection scheme called ModelCentrality, which uses generated synthetic samples to compute the medoid (a multi-dimensional generalization of the median) of a collection of models. The current common practice of hyper-parameter tuning requires ground-truth samples, each labelled with known, perfectly disentangled latent codes. As real datasets are not equipped with such labels, we propose an unsupervised model selection scheme and show that it finds a model close to the best one, for both VAEs and GANs. Combining contrastive regularization with ModelCentrality, we improve upon the state-of-the-art disentanglement scores significantly, without accessing the supervised data.
Motivation and Objectives
- Address the challenges of disentangled GAN training and model selection without supervision.
- Introduce a self-supervised contrastive regularizer that promotes latent disentanglement, motivated by latent traversal.
- Propose ModelCentrality to select well-disentangled models without ground-truth labels.
- Demonstrate effectiveness on synthetic datasets (dSprites, 3DTeapots) and qualitative CelebA results.
- Show that the combined approach surpasses state-of-the-art supervised-tuning baselines.
Proposed Method
- Introduce InfoGAN-CR by adding a Contrastive Regularizer (CR) to the InfoGAN framework.
- Add a CR discriminator H that performs multi-way hypothesis testing on paired generated images with fixed latent factors.
- Train with a composite objective, L_Adv - lambda * L_Info - alpha * L_c, where the hyperparameters lambda and alpha weight the information term and the contrastive term.
- Define L_c to maximize a Jensen–Shannon divergence over latent-factor traversals, encouraging distinct latent-factor effects.
- Adopt progressive training to vary the coupling of latent traversals from easy to hard.
- Develop ModelCentrality as an unsupervised model selection method using a medoid-based score over a model similarity matrix built from cross-model FactorVAE evaluations on generated samples.
- Apply ModelCentrality to select models for both GANs and VAEs without ground-truth labels.
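The medoid-based selection step can be illustrated with a minimal Python sketch. It assumes the cross-model FactorVAE evaluations have already been collected into a square matrix; the function name `model_centrality`, the symmetrization by averaging the two evaluation directions, and the exclusion of the diagonal are illustrative assumptions, not details confirmed by this summary.

```python
import numpy as np


def model_centrality(cross_scores):
    """Return (centrality per model, index of the most central model).

    Assumption: cross_scores[i, j] holds a FactorVAE-style score obtained by
    evaluating model i against samples generated by model j.
    """
    a = np.asarray(cross_scores, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape[0] < 2:
        raise ValueError("expected a square matrix covering at least two models")
    n = a.shape[0]
    # Assumption: average both evaluation directions to get a symmetric similarity.
    sim = 0.5 * (a + a.T)
    # A model's similarity to itself should not count toward its centrality.
    np.fill_diagonal(sim, 0.0)
    centrality = sim.sum(axis=1) / (n - 1)
    # The medoid is the model that is, on average, most similar to all others.
    return centrality, int(np.argmax(centrality))


# Hypothetical scores for three trained models (made-up numbers).
cross = [[1.0, 0.8, 0.3],
         [0.7, 1.0, 0.2],
         [0.4, 0.3, 1.0]]
scores, best = model_centrality(cross)
print(best)  # selects model 0
```

No ground-truth labels enter this computation: every entry of the matrix comes from samples generated by the models themselves, which is what makes the selection unsupervised.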
Experimental Results
Research Questions
- RQ1: Can self-supervision via a Contrastive Regularizer improve disentanglement in GANs beyond InfoGAN?
- RQ2: Can an unsupervised model selection scheme (ModelCentrality) identify near-best disentangled models without ground-truth labels?
- RQ3: How does ModelCentrality compare to existing unsupervised and supervised model selection methods (e.g., UDR Lasso, UDR Spearman)?
- RQ4: Do the proposed methods generalize to both GANs and VAEs and perform well on standard disentanglement benchmarks?
Key Findings
- InfoGAN-CR achieves higher disentanglement scores than state-of-the-art VAE- and GAN-based approaches on benchmark tasks.
- On the dSprites dataset, InfoGAN-CR attains FactorVAE scores around 0.88–0.90 and improves multiple metrics compared with baselines.
- On the 3DTeapots dataset, InfoGAN-CR models reach top performance across several disentanglement metrics.
- ModelCentrality selects central models without supervision and outperforms UDR Lasso and UDR Spearman in identifying strong disentangled models.
- Qualitative latent traversals on CelebA demonstrate coherent and interpretable factor changes.
- In some settings, the ModelCentrality-selected model closely matches or exceeds the best supervised-ground-truth model on key metrics.
This explainer was generated by AI and reviewed by human editors.