QUICK REVIEW

[Paper Digest] Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

Diane Bouchacourt, Ryota Tomioka|arXiv (Cornell University)|May 24, 2017

Generative Adversarial Networks and Image Synthesis | Cited by 137

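One-Sentence Summary

The ML-VAE learns disentangled representations from grouped data by modelling the content shared within a group separately from the style of each observation, which enables evidence accumulation across group members and test-time generalisation to unseen groups.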

ABSTRACT

We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.

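Motivation and Objectives

Anchor semantics in grouped data using only weak, group-level supervision.
Separate the latent factors into content shared across a group and style specific to each observation.
Retain amortised inference while handling non-i.i.d. grouped observations.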

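Proposed Method

Introduce a two-level latent structure: all samples in a group G share a content variable C_G, and each observation i in G has its own style variable S_i.
Define q(C_G, S_G|X_G;φ) as a factorised variational approximation composed of q(C_G|X_G;φ_c) and q(S_i|X_i;φ_s) for each observation i in G, where S_G collects the styles of all observations in G.
Use a group-level ELBO summed over the observations in a group: ELBO(G;θ,φ_s,φ_c) = sum_{i in G} E_{q(C_G|X_G;φ_c)} E_{q(S_i|X_i;φ_s)}[log p(X_i|C_G, S_i; θ)] - KL terms, where the KL terms penalise the divergence of the approximate content and style posteriors from their priors.
Accumulate evidence for C_G by setting q(C_G|X_G) to a product of Normal densities, one per observation's encoding (the Gaussian product rule).
Compute the ELBO of each group, average over groups, and maximise over group-structured mini-batches to learn θ, φ_c, and φ_s.
Provide test-time inference that infers content either from a single sample (strategy 1) or by accumulating evidence from multiple test samples per group (strategy 2).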
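
The evidence-accumulation and group-ELBO steps listed above can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: it assumes diagonal Gaussian posteriors, standard Normal priors on content and style, and a unit-variance Gaussian likelihood, none of which this digest specifies. The function names and the `decode` argument are hypothetical.

```python
import numpy as np

def accumulate_content(mus, logvars):
    """Product of diagonal Gaussians: precisions add, means are precision-weighted.

    mus, logvars: arrays of shape (n, d), one row per observation in the group.
    """
    precisions = np.exp(-logvars)
    group_var = 1.0 / precisions.sum(axis=0)
    group_mu = group_var * (precisions * mus).sum(axis=0)
    return group_mu, group_var

def kl_to_standard_normal(mu, var):
    """KL(N(mu, diag(var)) || N(0, I)), summed over the last axis."""
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var), axis=-1)

def group_elbo(x, content_mus, content_logvars, style_mus, style_logvars, decode, rng):
    """Single-sample Monte Carlo estimate of the ELBO of one group.

    x: (n, D) observations; decode: hypothetical callable (content, style) -> (n, D).
    """
    c_mu, c_var = accumulate_content(content_mus, content_logvars)
    s_var = np.exp(style_logvars)
    # Reparameterised samples: one content code per group, one style code per observation.
    c = c_mu + np.sqrt(c_var) * rng.standard_normal(c_mu.shape)
    s = style_mus + np.sqrt(s_var) * rng.standard_normal(style_mus.shape)
    c_tiled = np.broadcast_to(c, (x.shape[0], c.shape[0]))
    recon = decode(c_tiled, s)
    # Assumed unit-variance Gaussian log-likelihood, up to an additive constant.
    log_lik = -0.5 * np.sum((x - recon) ** 2, axis=1)
    kl_style = kl_to_standard_normal(style_mus, s_var)   # one term per observation
    kl_content = kl_to_standard_normal(c_mu, c_var)      # one term per group
    return log_lik.sum() - kl_style.sum() - kl_content

# Strategy 1 versus strategy 2 at test time: one encoding versus four from the same group.
mus = np.array([[0.9], [1.1], [1.0], [1.2]])
logvars = np.zeros_like(mus)  # every per-sample encoding has variance 1
_, var_single = accumulate_content(mus[:1], logvars[:1])
_, var_group = accumulate_content(mus, logvars)
```

With four unit-variance encodings the precisions sum to 4, so the accumulated content variance drops from 1 to 0.25. Working in precisions keeps the accumulation a simple sum, so it handles groups of any size without changing the encoder.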

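Experimental Results

Research Questions

RQ1: Can group-level supervision anchor semantic factors in a disentangled latent space?
RQ2: Does modelling content at the group level and style at the observation level yield better disentanglement than a VAE that assumes i.i.d. observations?
RQ3: Can amortised inference accommodate non-i.i.d. grouped observations without sacrificing test-time efficiency?
RQ4: Does accumulating evidence across group members improve the precision of the latent variables and downstream classification?
RQ5: Do the learned disentangled representations generalise to unseen groups at test time?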

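Key Findings

The ML-VAE learns a semantically meaningful disentanglement by separating content (shared within a group) from style (specific to each observation).
Accumulating evidence through the product of Normal densities reduces uncertainty about the content as the group size grows.
The model generalises to unseen groups at test time, as verified on datasets containing unseen identities.
The latent content C is informative about the class label whereas the style S is not, which enables effective downstream classification.
Manipulations of the latent space (swapping, interpolation, generation) demonstrate controllable disentanglement and coverage of the data manifold.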
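
The swapping and interpolation manipulations mentioned in the last finding can be illustrated with a short sketch. It assumes content and style codes have already been inferred, and `toy_decode` is a hypothetical stand-in for a trained decoder (it only concatenates the two codes so that the example runs without a model). The helper names are illustrative and do not come from the paper.

```python
import numpy as np

def swap_grid(contents, styles, decode):
    """Return grid[i][j] = decode(content of observation j, style of observation i)."""
    return np.array([[decode(c, s) for c in contents] for s in styles])

def interpolate(z_start, z_end, steps):
    """Linear interpolation between two latent codes, endpoints included."""
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - ts) * z_start + ts * z_end

# Toy stand-in for a trained decoder (assumption): concatenate content and style.
toy_decode = lambda c, s: np.concatenate([c, s])

contents = np.array([[1.0, 1.0], [2.0, 2.0]])  # content codes of two observations
styles = np.array([[0.1], [0.2]])              # style codes of the same observations

grid = swap_grid(contents, styles, toy_decode)         # shape (2, 2, 3)
path = interpolate(contents[0], contents[1], steps=5)  # shape (5, 2)
```

Each row of `grid` holds one style fixed while the content varies, and each column does the reverse. With a real decoder, this is the layout in which a disentangled model would show the same identity rendered in different styles.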

This digest was generated by AI and reviewed by human editors.