QUICK REVIEW

[Paper Review] Learning Disentangled Representations with Semi-Supervised Deep Generative Models

N. Siddharth, Brooks Paige|arXiv (Cornell University)|Jun 1, 2017
Explainable Artificial Intelligence (XAI) | Cited by 140
One-Sentence Summary

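A semi-supervised deep generative model with a partially specified graph structure learns disentangled representations by combining flexible neural encoders and decoders with an importance-sampling-based semi-supervised objective. It demonstrates disentanglement of factors such as digit identity versus handwriting style, face identity versus lighting, and digit counting in multi-digit scenes.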

ABSTRACT

Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure. We evaluate our framework's ability to learn disentangled representations, both by qualitative exploration of its generative capacity, and quantitative evaluation of its discriminative ability on a variety of models and datasets.

Motivation and Objectives

  • Motivate learning disentangled representations that separate interpretable factors of variation.
  • Develop a framework to support partially specified graphical models within variational autoencoders.
  • Enable semi-supervised learning by leveraging partial supervision to guide latent factorization.
  • Provide a general objective and inference method that accommodates arbitrary dependency structures among latent variables.

Proposed Method

  • Define a partially-specified graphical model where some latent variables are interpretable and optionally supervised, while others are learned through neural networks.
  • Generalize the VAE objective to accommodate arbitrary dependency structures between latent variables in both the generative model p_theta(x,y,z) and the recognition model q_phi(y,z|x).
  • Derive a semi-supervised objective using an importance-sampling estimator that handles arbitrary q_phi(y,z|x) and permits partial supervision on y.
  • Introduce a stochastic computation graph construction that supports end-to-end training with both supervised and unsupervised latent variables.
  • Demonstrate the approach with experiments on MNIST, SVHN, Yale B faces, and a multi-MNIST setting, including partial supervision scenarios.
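
The bounds behind the bullets above can be written out as a short sketch. The first two lines are the standard variational lower bounds for an unobserved and an observed y; the third is the identity relating the conditional q_phi(z|x,y) to the joint recognition model q_phi(y,z|x) named above. The notation follows this summary, and the paper's exact objective, including any weighting between supervised and unsupervised terms, is not reproduced here.

```latex
\begin{align}
% y unobserved: standard evidence lower bound over both latents
\log p_\theta(x) &\ge
  \mathbb{E}_{q_\phi(y,z\mid x)}\!\left[\log \frac{p_\theta(x,y,z)}{q_\phi(y,z\mid x)}\right] \\
% y observed: lower bound on the joint likelihood of x and y
\log p_\theta(x,y) &\ge
  \mathbb{E}_{q_\phi(z\mid x,y)}\!\left[\log \frac{p_\theta(x,y,z)}{q_\phi(z\mid x,y)}\right] \\
% conditional needed in the supervised bound, in terms of the recognition model
q_\phi(z\mid x,y) &= \frac{q_\phi(y,z\mid x)}{q_\phi(y\mid x)},
\qquad q_\phi(y\mid x) = \int q_\phi(y,z\mid x)\,dz
\end{align}
```

When the recognition model does not factorise as q_phi(y|x) q_phi(z|x,y), the normaliser q_phi(y|x) generally has no closed form, which is presumably why the supervised term is approximated by importance sampling.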

Experimental Results

Research Questions

  • RQ1: Can partially-specified probabilistic graphical models be effectively integrated into variational autoencoders to yield disentangled representations?
  • RQ2: How can semi-supervised learning be formulated and optimized when latent variables have arbitrary dependency structures beyond simple factorisations?
  • RQ3: To what extent does partial supervision guide the learning of interpretable latent factors such as digit identity, handwriting style, identity, and lighting?
  • RQ4: Can the framework handle models with stochastic dimensionality and compositional sub-models while preserving disentanglement and predictive performance?

Key Findings

  • The framework can learn disentangled representations by tying interpretable latent variables to partially supervised factors and leaving others to neural-network based learning.
  • The proposed importance-sampling based estimator (and its log-sum-exp variant) enables semi-supervised training for models with general latent dependencies.
  • Experiments show competitive classification accuracy on MNIST and SVHN with limited labeled data, comparable to prior semi-supervised VAEs under similar settings.
  • On the intrinsic-faces data, the model disentangles identity from lighting and supports both classification and regression on the respective latent factors, even with partial supervision.
  • In multi-MNIST, the model can count digits and decompose images into constituent digits, demonstrating the ability to handle stochastic dimensionality and compositional structure.
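
This summary does not spell out the importance-sampling estimator or its log-sum-exp variant, so the following is only a generic illustration of the two ideas, not the paper's estimator. It assumes a toy Gaussian model (z ~ N(0, 1), x | z ~ N(z, 1)) with a hand-picked proposal q(z | x) = N(x/2, 1). The function names `log_mean_exp`, `log_normal_pdf`, and `estimate_log_marginal` are hypothetical helpers introduced for this sketch.

```python
import numpy as np


def log_mean_exp(log_w):
    """Numerically stable log(mean(exp(log_w))) via the log-sum-exp trick."""
    log_w = np.asarray(log_w, dtype=float)
    m = np.max(log_w)  # subtract the max so the largest exponent is exp(0)
    return m + np.log(np.mean(np.exp(log_w - m)))


def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def estimate_log_marginal(x, num_samples=5000, seed=0):
    """Importance-sampling estimate of log p(x) in the assumed toy model.

    Model (assumption): z ~ N(0, 1), x | z ~ N(z, 1).
    Proposal (assumption): q(z | x) = N(x / 2, 1).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(x / 2.0, 1.0, size=num_samples)  # samples from the proposal
    # Log importance weights: log p(z) + log p(x | z) - log q(z | x)
    log_w = (
        log_normal_pdf(z, 0.0, 1.0)
        + log_normal_pdf(x, z, 1.0)
        - log_normal_pdf(z, x / 2.0, 1.0)
    )
    return log_mean_exp(log_w)


if __name__ == "__main__":
    x = 1.0
    # In this toy model the marginal is known in closed form: x ~ N(0, 2).
    exact = log_normal_pdf(x, 0.0, np.sqrt(2.0))
    print("estimate:", estimate_log_marginal(x))
    print("exact:   ", exact)
```

Working with log-weights and subtracting the maximum before exponentiating avoids underflow when the weights are very small, which is the usual motivation for a log-sum-exp formulation of an importance-weighted objective.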

This review was generated by AI and reviewed by human editors.