QUICK REVIEW

[Paper Review] Isolating Sources of Disentanglement in Variational Autoencoders

Ricky T. Q. Chen, Xuechen Li | arXiv (Cornell University) | Feb 14, 2018

Generative Adversarial Networks and Image Synthesis | 46 references | 145 citations

TL;DR

The paper decomposes the ELBO to isolate a total-correlation term, introduces beta-TCVAE as a plug-in improvement over beta-VAE with no extra hyperparameters, and proposes MIG, a classifier-free disentanglement metric. It empirically links total correlation to disentanglement across datasets.

ABSTRACT

We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our $β$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the state-of-the-art $β$-VAE objective for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement, when the latent variables model is trained using our framework.

Motivation & Objective

Motivate and quantify disentanglement in VAEs by decomposing the ELBO to identify the total correlation term.
Propose a training method that weights decomposition terms without introducing new hyperparameters.
Introduce beta-TCVAE as a plug-in replacement for beta-VAE that encourages disentanglement by penalizing total correlation directly.
Propose a classifier-free, information-theoretic metric (MIG) to evaluate disentanglement across latent distributions.
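
For reference, the ELBO decomposition mentioned above can be written out as follows. This is a sketch in notation assumed here rather than quoted from the review: n indexes a training example, q(z|n) is the encoder, q(z) is the aggregate posterior obtained by averaging q(z|n) over p(n), p(z) is a factorized prior, and z_j is the j-th latent dimension.

```latex
\mathbb{E}_{p(n)}\Big[\mathrm{KL}\big(q(z \mid n)\,\|\,p(z)\big)\Big]
  = \underbrace{\mathrm{KL}\big(q(z,n)\,\|\,q(z)\,p(n)\big)}_{\text{index-code MI}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

The middle term is zero exactly when the aggregate posterior factorizes across dimensions, which is why it is the natural term to penalize when statistically independent latents are the goal.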

Proposed method

Derive an ELBO decomposition revealing index-code MI, total correlation, and dimension-wise KL terms.
Propose minibatch-weighted sampling to estimate decomposition terms without extra hyperparameters.
Define beta-TCVAE as a special case with alpha=gamma=1 and beta controlling TC penalty.
Estimate TC during training without an auxiliary discriminator, as an alternative to density-ratio-based approaches such as FactorVAE.
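
The minibatch-weighted sampling idea can be sketched as below, assuming a diagonal-Gaussian encoder. For a minibatch of size M drawn from a dataset of size N, log q(z) is approximated by a log-sum-exp of log q(z | x_j) over the batch minus log(N·M), and each marginal log q(z_d) is approximated the same way per dimension. The function name `mws_total_correlation`, its argument layout, and the pure-Python loops are illustrative assumptions, not the authors' code; a practical version would be vectorized in a tensor library.

```python
import math


def log_normal(z, mu, logvar):
    """Log density of a univariate Gaussian N(mu, exp(logvar)) evaluated at z."""
    return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu) ** 2 / math.exp(logvar))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mws_total_correlation(z, mu, logvar, dataset_size):
    """Estimate KL(q(z) || prod_d q(z_d)) from a single minibatch.

    z, mu, logvar: lists of M rows with D floats each, where z[i] was
    sampled from the encoder distribution q(z | x_i) = N(mu[i], exp(logvar[i])).
    dataset_size: N, the number of training examples.
    """
    batch_size, dim = len(z), len(z[0])
    log_nm = math.log(dataset_size * batch_size)
    total = 0.0
    for i in range(batch_size):
        # per_dim[j][d] = log q(z_i[d] | x_j) for every batch element j.
        per_dim = [
            [log_normal(z[i][d], mu[j][d], logvar[j][d]) for d in range(dim)]
            for j in range(batch_size)
        ]
        # Joint density: sum over dimensions first, then mix over the batch.
        log_qz = logsumexp([sum(row) for row in per_dim]) - log_nm
        # Product of marginals: mix over the batch separately in each dimension.
        log_qz_product = sum(
            logsumexp([per_dim[j][d] for j in range(batch_size)]) - log_nm
            for d in range(dim)
        )
        total += log_qz - log_qz_product
    return total / batch_size
```

With alpha = gamma = 1, the objective reduces to the standard ELBO minus (beta - 1) times this total-correlation term, so the estimate can be added to an existing VAE loss with that weight. The estimator is biased for small batches, which is easy to see in the degenerate case of a single-example batch.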

Experimental results

Research questions

RQ1: Does penalizing the total correlation term in the ELBO promote disentanglement in VAEs?
RQ2: Can beta-TCVAE achieve better disentanglement than beta-VAE without adding training complexity?
RQ3: Is there a robust, classifier-free metric to quantify disentanglement across latent distributions?
RQ4: How is total correlation related to disentanglement across datasets and sampling biases?

Key findings

beta-TCVAE yields more interpretable, disentangled representations than beta-VAE on several datasets.
Lower total correlation goes with higher disentanglement under beta-TCVAE, supporting the role of the TC penalty.
MIG provides a classifier-free, axis-aligned, generalizable disentanglement measure applicable to various latent distributions.
The proposed minibatch weighting allows training with TC weighting without additional hyperparameters.
beta-TCVAE can outperform FactorVAE, which has a similar objective, when FactorVAE's density-ratio trick is hard to train, which points to the robustness of beta-TCVAE.
beta-TCVAE remains effective even under non-uniform or dependent factor sampling, improving interpretability over baselines.
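
The MIG metric mentioned above can be illustrated with a small sketch. For each ground-truth factor, it takes the gap between the largest and second-largest mutual information that any single latent dimension shares with that factor, normalizes by the factor's entropy, and averages over factors. The sketch assumes the mutual-information matrix has already been estimated; the names `mutual_information_gap`, `mi`, and `factor_entropies` are hypothetical and chosen only for this example.

```python
def mutual_information_gap(mi, factor_entropies):
    """Compute MIG from precomputed mutual information values.

    mi[j][k]: estimated mutual information I(z_j; v_k) between latent
        dimension j and ground-truth factor k (same units as the entropies).
    factor_entropies[k]: entropy H(v_k) of factor k, assumed positive.
    Requires at least two latent dimensions.
    """
    gaps = []
    for k, entropy in enumerate(factor_entropies):
        # Sort this factor's column so the top two latents come first.
        column = sorted((row[k] for row in mi), reverse=True)
        gaps.append((column[0] - column[1]) / entropy)
    return sum(gaps) / len(gaps)


# Three latents, two factors: each factor is captured mostly by one latent.
example_mi = [
    [1.0, 0.0],
    [0.0, 0.5],
    [0.2, 0.1],
]
example_entropies = [2.0, 1.0]
score = mutual_information_gap(example_mi, example_entropies)
```

A score near 1 means each factor is captured by a single latent dimension and no other; a score near 0 means at least two latents carry similar information about each factor, so the representation is not axis-aligned.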

This review was created by AI and reviewed by human editors.