[Paper Review] Isolating Sources of Disentanglement in Variational Autoencoders
The paper decomposes the ELBO to isolate a total-correlation term, introduces beta-TCVAE as a plug-in improvement over beta-VAE with no extra hyperparameters, and proposes MIG, a classifier-free disentanglement metric. It empirically links total correlation to disentanglement across datasets.
We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our $β$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the state-of-the-art $β$-VAE objective for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement, when the latent variables model is trained using our framework.
Motivation & Objective
- Motivate and quantify disentanglement in VAEs by decomposing the ELBO to identify the total correlation term.
- Propose a training method that weights decomposition terms without introducing new hyperparameters.
- Introduce beta-TCVAE as a plug-in replacement for beta-VAE that improves disentanglement without extra tuning.
- Propose a classifier-free, information-theoretic metric (MIG) to evaluate disentanglement across latent distributions.
Proposed method
- Derive an ELBO decomposition revealing index-code MI, total correlation, and dimension-wise KL terms.
- Propose minibatch-weighted sampling to estimate decomposition terms without extra hyperparameters.
- Define beta-TCVAE as the special case of the weighted decomposition with alpha = gamma = 1, so that beta scales only the total correlation penalty.
- Provide an alternative training approach to estimate TC without a discriminator.
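The decomposition and its minibatch estimate can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: it assumes a factorized Gaussian encoder and a standard normal prior, and it approximates $\log q(z_i)$ for each sampled code by $\log \frac{1}{NM} \sum_j q(z_i \mid n_j)$ over the minibatch, where $N$ is the dataset size and $M$ the batch size. The names `decompose_kl_mws`, `beta_tcvae_penalty`, and `dataset_size` are illustrative.

```python
import numpy as np


def log_normal(z, mu, logvar):
    """Elementwise log-density of N(z; mu, exp(logvar))."""
    return -0.5 * (np.log(2 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar))


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def decompose_kl_mws(z, mu, logvar, dataset_size):
    """Estimate (index-code MI, total correlation, dimension-wise KL).

    z, mu, logvar: (M, D) arrays for one minibatch, with z[i] ~ q(z | n_i).
    Assumes a factorized Gaussian encoder and a N(0, I) prior.
    """
    M, _ = z.shape
    log_qz_cond = log_normal(z, mu, logvar).sum(axis=1)  # log q(z_i | n_i)
    log_pz = log_normal(z, 0.0, 0.0).sum(axis=1)         # log p(z_i)

    # pair[i, j, d] = log q(z_i[d] | n_j) for every pair in the minibatch
    pair = log_normal(z[:, None, :], mu[None, :, :], logvar[None, :, :])
    log_nm = np.log(dataset_size * M)

    log_qz = logsumexp(pair.sum(axis=2), axis=1) - log_nm        # log q(z_i)
    log_prod_qzj = (logsumexp(pair, axis=1) - log_nm).sum(axis=1)  # sum_d log q(z_i[d])

    mi = np.mean(log_qz_cond - log_qz)      # index-code mutual information
    tc = np.mean(log_qz - log_prod_qzj)     # total correlation
    dwkl = np.mean(log_prod_qzj - log_pz)   # dimension-wise KL
    return mi, tc, dwkl


def beta_tcvae_penalty(z, mu, logvar, dataset_size, beta):
    """Regularizer with alpha = gamma = 1 and beta weighting only the TC term."""
    mi, tc, dwkl = decompose_kl_mws(z, mu, logvar, dataset_size)
    return mi + beta * tc + dwkl
```

By construction the three estimates sum to the minibatch average of $\log q(z \mid n) - \log p(z)$, so `beta=1` recovers a single-sample estimate of the usual VAE KL term, while `beta > 1` adds weight to the total correlation alone.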
Experimental results
Research questions
- RQ1: Does penalizing the total correlation term in the ELBO promote disentanglement in VAEs?
- RQ2: Can beta-TCVAE achieve better disentanglement than beta-VAE without adding training complexity?
- RQ3: Is there a robust, classifier-free metric to quantify disentanglement across latent distributions?
- RQ4: How does total correlation relate to disentanglement across datasets and sampling biases?
Key findings
- beta-TCVAE yields more interpretable disentangled representations than beta-VAE in several datasets.
- Under beta-TCVAE, lower total correlation goes with higher disentanglement scores, supporting the role of the TC penalty.
- MIG provides a classifier-free, axis-aligned, generalizable disentanglement measure applicable to various latent distributions.
- The proposed minibatch weighting allows training with TC weighting without additional hyperparameters.
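- FactorVAE optimizes a similar objective but estimates total correlation with the density-ratio trick, which requires training a discriminator; when that discriminator is hard to train, beta-TCVAE can outperform it, which points to the robustness of the discriminator-free estimate.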
- beta-TCVAE remains effective even under non-uniform or dependent factor sampling, improving interpretability over baselines.
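To make the MIG idea concrete: for each ground-truth factor, take the gap between the two latent dimensions with the highest mutual information with that factor, normalize by the factor's entropy, and average over factors. The sketch below is a simplified histogram-based approximation, not the paper's estimator: it assumes discrete factor labels with non-zero entropy, at least two latent dimensions, and one point code per sample (for example the encoder mean), which it discretizes into bins. The function names and the `n_bins` parameter are illustrative.

```python
import numpy as np


def entropy(labels):
    """Empirical entropy (nats) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def mutual_info(a, b):
    """Empirical I(a; b) = H(a) + H(b) - H(a, b) for two 1-D label arrays."""
    _, counts = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return entropy(a) + entropy(b) + np.sum(p * np.log(p))


def mig(latents, factors, n_bins=20):
    """Histogram-based mutual information gap.

    latents: (N, D) float array, one code per sample (assumed: encoder means).
    factors: (N, K) integer array of ground-truth factor labels.
    """
    n_latents = latents.shape[1]
    # Discretize each latent dimension into equal-width bins.
    binned = np.stack(
        [
            np.digitize(
                latents[:, j],
                np.histogram_bin_edges(latents[:, j], bins=n_bins)[1:-1],
            )
            for j in range(n_latents)
        ],
        axis=1,
    )
    gaps = []
    for k in range(factors.shape[1]):
        mi = np.array(
            [mutual_info(binned[:, j], factors[:, k]) for j in range(n_latents)]
        )
        top_two = np.sort(mi)[::-1][:2]  # best and second-best latent dimension
        gaps.append((top_two[0] - top_two[1]) / entropy(factors[:, k]))
    return float(np.mean(gaps))
```

On a toy grid where each latent dimension copies exactly one factor, this score should be close to 1; if every latent dimension carries the same mixture of factors, the gap collapses to 0. The score is sensitive to `n_bins` and sample size, so it is best read as a relative comparison between models evaluated the same way.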
This review was created by AI and reviewed by human editors.