QUICK REVIEW

[Paper Review] Learning Disentangled Joint Continuous and Discrete Representations

Emilien Dupont|arXiv (Cornell University)|Mar 31, 2018

Generative Adversarial Networks and Image Synthesis|21 references|61 citations

TL;DR

JointVAE learns disentangled continuous and discrete latent factors in an unsupervised variational framework, outperforming continuous-only disentanglement when discrete factors are prominent.

ABSTRACT

We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discovered automatically from data. Experiments show that the framework disentangles continuous and discrete generative factors on various datasets and outperforms current disentangling methods when a discrete generative factor is prominent.

Motivation & Objective

Address the need to disentangle both continuous and discrete generative factors of variation in data.
Propose a variational autoencoder framework that jointly models continuous and discrete latents.
Enable unsupervised discovery of discrete factors alongside continuous factors across diverse datasets.

Proposed method

Introduce a joint latent distribution q(z, c|x) with continuous z and discrete c.
Extend the beta-VAE objective with separate KL terms for z and c, each controlled by its own capacity (Cz and Cc).
Relax discrete variables using Gumbel-Softmax (Concrete) for differentiable sampling.
Split and gradually increase latent capacities Cz and Cc to encourage learning in both latent channels.
Parametrize z with Gaussian q(z|x) and c with Gumbel-Softmax q(c|x) and concatenate for decoding.
Provide an encoder/decoder architecture compatible with CNN-based image data and use reparameterization tricks for both latent types.
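
The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation. It assumes a penalty of the form gamma * |KL - C| on each latent channel, a uniform prior over the discrete variable, and a linear capacity ramp. All function names, the temperature, gamma, and the capacity values are hypothetical choices made for the example.

```python
import numpy as np


def sample_gaussian(mean, log_var, rng):
    """Reparameterized sample z = mean + std * eps for the continuous latent."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps


def sample_gumbel_softmax(logits, temperature, rng, eps=1e-10):
    """Relaxed one-hot sample for the discrete latent (Gumbel-Softmax / Concrete)."""
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u + eps) + eps)
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def kl_gaussian(mean, log_var):
    """KL(N(mean, var) || N(0, I)) per example, summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mean ** 2 - 1.0 - log_var, axis=-1)


def kl_categorical_uniform(probs, eps=1e-12):
    """KL(q(c|x) || Uniform(K)) per example (uniform prior is an assumption)."""
    k = probs.shape[-1]
    return np.sum(probs * np.log(probs + eps), axis=-1) + np.log(k)


def capacity(step, max_capacity, anneal_steps):
    """Linear ramp from 0 to max_capacity (assumed schedule)."""
    return max_capacity * min(1.0, step / anneal_steps)


def joint_loss(recon_loss, kl_z, kl_c, c_z, c_c, gamma):
    """Reconstruction loss plus capacity-controlled KL penalties (to be minimized)."""
    return recon_loss + gamma * abs(kl_z - c_z) + gamma * abs(kl_c - c_c)


# Illustrative usage with a batch of 2, 10 continuous dims, and 10 classes.
rng = np.random.default_rng(0)
mean = np.zeros((2, 10))
log_var = np.zeros((2, 10))
logits = np.zeros((2, 10))

z = sample_gaussian(mean, log_var, rng)
c = sample_gumbel_softmax(logits, temperature=0.67, rng=rng)
latent = np.concatenate([z, c], axis=-1)  # concatenated input to the decoder

probs = np.full((2, 10), 0.1)  # softmax of all-zero logits
loss = joint_loss(
    recon_loss=1.0,
    kl_z=kl_gaussian(mean, log_var).mean(),
    kl_c=kl_categorical_uniform(probs).mean(),
    c_z=capacity(step=0, max_capacity=5.0, anneal_steps=25000),
    c_c=capacity(step=0, max_capacity=5.0, anneal_steps=25000),
    gamma=30.0,
)
```

Ramping each capacity separately is what keeps one channel from absorbing all the information: each KL term is pushed toward its own target rather than toward zero.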

Experimental results

Research questions

RQ1: Can a VAE-based framework learn disentangled continuous and discrete factors in an unsupervised manner?
RQ2: How should the information capacity be allocated and increased between continuous and discrete latent channels to avoid collapse into a single type?
RQ3: What is the empirical potential of JointVAE to disentangle mixed-factor datasets (MNIST, FashionMNIST, CelebA, Chairs) without supervision?

Key findings

JointVAE disentangles discrete digit type and continuous factors like angle, thickness, and width on MNIST.
On FashionMNIST, JointVAE discovers interpretable factors such as sleeve length and color, despite some classes remaining entangled.
On CelebA, the model discovers factors like azimuth, age, and background color, while preserving realistic samples.
On Chairs, JointVAE identifies rotation and style-related discrete factors along with continuous variations.
Quantitative evaluation on dSprites shows competitive disentanglement scores, with JointVAE capturing 4 continuous factors and 1 discrete factor.
The inference network can infer properties (e.g., azimuth) without supervision, and manipulating the latents enables image editing.

This review was created by AI and reviewed by human editors.