[Paper Review] Disentangling factors of variation in deep representations using adversarial training
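This paper proposes a conditional variational autoencoder, combined with adversarial training, that disentangles the specified factors of variation from the unspecified ones in deep representations. Only weak supervision (class labels) is needed, so the separation is close to unsupervised. The paper demonstrates single-image analogies and generalization to unseen identities on several datasets.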
We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from our ability to distinguish among different observations belonging to the same class. Examples of such observations include images of a set of labeled objects captured at different viewpoints, or recordings of a set of speakers dictating multiple phrases. In both instances, the intra-class diversity is the source of the unspecified factors of variation: each object is observed at multiple viewpoints, and each speaker dictates multiple phrases. Learning to disentangle the specified factors from the unspecified ones becomes easier when strong supervision is possible. Suppose that during training, we have access to pairs of images, where each pair shows two different objects captured from the same viewpoint. This source of alignment allows us to solve our task using existing methods. However, labels for the unspecified factors are usually unavailable in realistic scenarios where data acquisition is not strictly controlled. We address the problem of disentanglement in this more general setting by combining deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of generalizing to unseen classes and intra-class variabilities.
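To make the idea of two complementary codes concrete, here is a toy sketch, not the authors' model. It assumes an idealized encoder that already knows which coordinates of an input vector carry identity (the specified factors) and which carry everything else (the unspecified factors). The names `encode`, `decode`, `analogy`, and `SPLIT` are hypothetical. Under that assumption, a single-image analogy amounts to decoding the specified code of one observation together with the unspecified code of another.

```python
from typing import List, Tuple

Vector = List[float]

# Hypothetical toy setting: the first SPLIT coordinates of x encode identity
# (specified factors s); the remaining coordinates encode everything else
# (unspecified factors z). A learned encoder would have to discover this split.
SPLIT = 2


def encode(x: Vector) -> Tuple[Vector, Vector]:
    """Idealized encoder: map an observation x to the pair (s, z)."""
    return x[:SPLIT], x[SPLIT:]


def decode(s: Vector, z: Vector) -> Vector:
    """Idealized decoder: rebuild an observation from (s, z)."""
    return s + z


def analogy(x_identity: Vector, x_style: Vector) -> Vector:
    """Keep the identity of x_identity and the remaining variability of x_style."""
    s, _ = encode(x_identity)
    _, z = encode(x_style)
    return decode(s, z)


# Object A seen from viewpoint 1, object B seen from viewpoint 2.
a_view1 = [1.0, 0.0, 0.3, 0.7]
b_view2 = [0.0, 1.0, 0.9, 0.1]
print(analogy(a_view1, b_view2))  # [1.0, 0.0, 0.9, 0.1]: object A at viewpoint 2
```

In the paper both maps are deep convolutional networks and no such split is given in advance; the training procedure has to induce it from class labels alone.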
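Research Motivation and Goals
- Learn representations that separate label-related factors from the other factors of variation.
- Propose a conditional generative model that combines a VAE and a GAN to achieve disentanglement under weak supervision.
- Enable tasks such as single-image analogies and conditional generation without requiring labels for the nuisance (unspecified) factors.
- Show that the model generalizes to unseen identities and to intra-class variation on both synthetic and real datasets.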
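Proposed Method
- Define a conditional generative model with two sources of variation: a specified factor s and an unspecified latent variable z.
- Use an encoder that maps x to (s, z); a shared network trunk splits into two heads, one per code.
- Train a decoder p_theta(x|z,s) to reconstruct and sample x from z and s.
- Add a discriminative (GAN) regularizer that prevents information about s from leaking into z when codes are swapped between samples.
- Optimize an objective that combines the VAE evidence lower bound with a GAN-based loss to achieve disentanglement.
- Train by swapping specified and unspecified codes between samples, which encourages generated samples to stay consistent with the class identity carried by s.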
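The objective described above can be sketched as a weighted sum of three terms. This is a minimal illustration under stated assumptions, not the paper's exact loss: it assumes a diagonal Gaussian posterior over z (so the KL term against a standard normal prior has a closed form), a squared-error reconstruction term computed after swapping in the s of another sample from the same class, and a non-saturating generator loss on the discriminator's score for a sample decoded from z and the s of a different observation. The weights `beta` and `lam` and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def squared_error(x: Vector, x_hat: Vector) -> float:
    """Reconstruction term; equals a Gaussian negative log-likelihood up to constants."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def generator_adv_loss(d_score: float) -> float:
    """Non-saturating GAN loss -log D(x_swap); d_score is D's probability of 'real'."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(d_score, eps))


def total_loss(x1: Vector, x1_from_swapped_s: Vector,
               mu_z1: Vector, logvar_z1: Vector,
               d_score_swap: float, beta: float = 1.0, lam: float = 1.0) -> float:
    """Hypothetical combined objective for one training example.

    x1_from_swapped_s: Dec(z1, s1'), where s1' is the specified code of another
        sample of the same class as x1, so it should still reconstruct x1.
    d_score_swap: discriminator output on Dec(z1, s2), where s2 comes from an
        observation of a possibly different class.
    """
    rec = squared_error(x1, x1_from_swapped_s)
    kl = kl_to_standard_normal(mu_z1, logvar_z1)
    adv = generator_adv_loss(d_score_swap)
    return rec + beta * kl + lam * adv


# Example: slightly imperfect reconstruction, z posterior equal to the prior,
# and a discriminator that is unsure about the swapped sample.
loss = total_loss(x1=[1.0, 0.0], x1_from_swapped_s=[0.9, 0.1],
                  mu_z1=[0.0, 0.0], logvar_z1=[0.0, 0.0], d_score_swap=0.5)
print(round(loss, 4))  # 0.7131
```

In a full implementation the discriminator would be trained with its own opposing loss, and every term would be computed on network outputs over a batch; both parts are omitted here to keep the structure of the objective visible.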
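Experimental Results
Research Questions
- RQ1: Can a deep generative model disentangle the specified factors of variation from the unspecified ones under weak supervision?
- RQ2: Does swapping specified and unspecified codes between samples, together with a discriminator, enforce meaningful disentanglement without aligned data?
- RQ3: How well do the learned s and z components capture, respectively, class identity and intra-class variation across datasets?
- RQ4: Does the disentanglement generalize to identities and variations not seen during training?
- RQ5: How does adversarial regularization affect the quality of generated samples and the disentanglement of the representation?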
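Key Findings
- The model clearly separates the specified factors from the unspecified ones on multiple datasets.
- The specified component retains most of the identity information: classification from s comes close to the supervised baseline.
- The unspecified component is largely invariant to identity: classification from z is close to chance level.
- Single-image analogies and interpolations show coherent control over both factors in the generated samples.
- Quantitative results show competitive disentanglement and substantial generalization to unseen identities and intra-class variation.
- Adversarial regularization is essential; without it, the model collapses into a solution that ignores the specified component.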
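The classification tests behind these findings can be illustrated with a small probe: fit a simple classifier on one code, measure its accuracy on held-out samples, and compare against chance (1 / number of classes). The sketch below rests on loudly stated assumptions: it uses synthetic codes built to behave like ideal s and z rather than codes from a trained model, and a nearest-centroid classifier, which is not necessarily the classifier used in the paper. The helper names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def nearest_centroid_accuracy(train: List[Tuple[Vector, int]],
                              test: List[Tuple[Vector, int]]) -> float:
    """Fit one centroid per class on `train` and report accuracy on `test`."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for v, y in train:
        total = sums.setdefault(y, [0.0] * len(v))
        for i, vi in enumerate(v):
            total[i] += vi
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [t / counts[y] for t in s] for y, s in sums.items()}

    def predict(v: Vector) -> int:
        # Label of the closest centroid in squared Euclidean distance.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(v, centroids[y])))

    correct = sum(predict(v) == y for v, y in test)
    return correct / len(test)


def synthetic_codes(n_classes: int, per_class: int,
                    rng: random.Random) -> List[Tuple[Vector, Vector, int]]:
    """Hypothetical (s, z, label) triples: s is a noisy one-hot label, z ignores the label."""
    data = []
    for y in range(n_classes):
        for _ in range(per_class):
            s = [(1.0 if i == y else 0.0) + rng.uniform(-0.1, 0.1) for i in range(n_classes)]
            z = [rng.gauss(0.0, 1.0) for _ in range(4)]
            data.append((s, z, y))
    rng.shuffle(data)
    return data


rng = random.Random(0)
data = synthetic_codes(n_classes=10, per_class=60, rng=rng)
train, test = data[:400], data[400:]
acc_s = nearest_centroid_accuracy([(s, y) for s, _, y in train], [(s, y) for s, _, y in test])
acc_z = nearest_centroid_accuracy([(z, y) for _, z, y in train], [(z, y) for _, z, y in test])
print(f"accuracy from s: {acc_s:.2f}, from z: {acc_z:.2f}, chance: {1 / 10:.2f}")
```

Here the gap between the two accuracies is built into the synthetic data by construction; with a trained model, the same probe measures how much identity information each code actually carries.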
This review was generated by AI and checked by human editors.