QUICK REVIEW

[Paper Review] Learning Disentangled Joint Continuous and Discrete Representations

Emilien Dupont|arXiv (Cornell University)|Mar 31, 2018

Generative Adversarial Networks and Image Synthesis | References: 21 | Cited by: 61

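One-Sentence Summary

JointVAE learns disentangled continuous and discrete latent factors in an unsupervised variational framework, and shows a clear advantage over continuous-only disentangling methods when a discrete factor is prominent.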

ABSTRACT

We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discovered automatically from data. Experiments show that the framework disentangles continuous and discrete generative factors on various datasets and outperforms current disentangling methods when a discrete generative factor is prominent.

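Motivation and Objectives

Motivate and address the need to disentangle continuous and discrete generative factors in data at the same time.
Propose a variational autoencoder framework that jointly models continuous and discrete latent variables.
Achieve unsupervised discovery of discrete and continuous factors across diverse datasets.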

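Proposed Method

Introduce a joint latent distribution q(z, c|x), where z is continuous and c is discrete.
Extend the beta-VAE objective so that the KL terms for z and c are separated, each with its own capacity, Cz and Cc.
Relax the discrete variables with the Gumbel-Softmax (Concrete) distribution so that sampling is differentiable.
Control the capacities Cz and Cc separately and increase them gradually during training, which encourages both latent channels to be used.
Parameterize z with a Gaussian q(z|x) and c with a Gumbel-Softmax q(c|x), then concatenate the two samples as input to the decoder.
Provide CNN-based encoder/decoder architectures suited to image data, and apply the reparameterization trick to both latent types.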
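
The sampling and objective steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weight `gamma`, the uniform categorical prior, the linear capacity schedule, and the use of the plain categorical KL for the relaxed variable are all assumptions introduced here. The summary itself only states that each KL term has its own capacity (Cz, Cc) and that both capacities are increased gradually.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gaussian(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def sample_gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    return softmax(noisy)


def kl_gaussian(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


def kl_categorical_uniform(probs):
    """KL(Cat(probs) || Uniform) for one categorical variable (assumed prior)."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0.0)


def capacity(step, max_capacity, anneal_steps):
    """Assumed schedule: grow linearly from 0 to max_capacity, then hold."""
    return max_capacity * min(step / anneal_steps, 1.0)


def joint_loss(recon_loss, mu, logvar, probs, step,
               gamma, cz_max, cc_max, anneal_steps):
    """Reconstruction loss plus one capacity-controlled KL penalty per channel."""
    cz = capacity(step, cz_max, anneal_steps)
    cc = capacity(step, cc_max, anneal_steps)
    penalty_z = abs(kl_gaussian(mu, logvar) - cz)
    penalty_c = abs(kl_categorical_uniform(probs) - cc)
    return recon_loss + gamma * (penalty_z + penalty_c)
```

In a full model, `mu`, `logvar`, and the class logits would come from the encoder, and the concatenated sample `z + c` would be passed to the decoder. Keeping a separate capacity per channel is what lets the schedule push information into both the continuous and the discrete latents.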

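Experimental Results

Research Questions

RQ1: Can a VAE-based framework learn disentangled continuous and discrete factors without supervision?
RQ2: How should information capacity be allocated between the continuous and discrete latent channels, and increased over training, to avoid collapse onto a single latent type?
RQ3: How well does JointVAE disentangle mixed-factor datasets (MNIST, FashionMNIST, CelebA, Chairs) without supervision?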

Key Findings

Model	Disentanglement score
β-VAE	0.73
FactorVAE	0.82
JointVAE	0.69

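On MNIST, JointVAE disentangles the discrete digit type from continuous factors such as angle, stroke thickness, and width.
On FashionMNIST, JointVAE discovers interpretable factors such as sleeve length and color, although some classes remain entangled.
On CelebA, the model discovers factors such as azimuth, age, and background color while still producing realistic samples.
On Chairs, JointVAE identifies discrete factors related to rotation and style, as well as continuous variations.
Quantitative evaluation on dSprites shows a competitive disentanglement score, with JointVAE capturing 4 continuous factors and 1 discrete factor.
The inference network can infer attributes such as azimuth without supervision, and manipulating the latent variables enables image editing.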
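
Editing by latent manipulation, as described in the last finding, usually amounts to changing one latent unit and decoding again. The sketch below shows that idea with a hypothetical `decode` callable. The helper name `traverse_latent` and the toy decoder are illustrative assumptions, not part of the paper or of any library.

```python
def traverse_latent(decode, z, c, dim, values):
    """Decode copies of (z, c) in which only the continuous unit z[dim] changes."""
    outputs = []
    for value in values:
        z_edit = list(z)          # copy, so the caller's z is left untouched
        z_edit[dim] = value
        outputs.append(decode(z_edit + list(c)))  # decoder sees [z, c] concatenated
    return outputs


def toy_decode(latent):
    """Hypothetical stand-in for a trained decoder; returns a 2-value 'image'."""
    return [2.0 * latent[0], latent[1] + latent[2]]


# Sweep the first continuous unit while holding the discrete code fixed.
frames = traverse_latent(toy_decode, z=[0.0, 1.0], c=[1.0, 0.0],
                         dim=0, values=[-1.0, 0.0, 1.0])
```

With a trained model, `z` and `c` would come from the inference network applied to an input image, and each decoded frame would show the same image with one attribute varied.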


This review was generated by AI and reviewed by human editors.