QUICK REVIEW

[Paper Review] Disentangling by Factorising

Hyunjik Kim, Andriy Mnih|arXiv (Cornell University)|Feb 16, 2018

Digital Media Forensic Detection | References: 49 | Cited by: 423

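One-Sentence Summary

FactorVAE adds a total correlation (TC) penalty that encourages the latent code distribution of a VAE to be factorial, yielding better disentanglement than beta-VAE at comparable reconstruction quality. It also proposes a more robust disentanglement metric based on a majority-vote classifier and compares against InfoWGAN-GP.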

ABSTRACT

We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon $\beta$-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.

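Motivation and Objectives

Advance unsupervised learning of disentangled representations that align with independent factors of variation in the data.
Analyze the trade-off between disentanglement and reconstruction in beta-VAE and propose a method that improves it.
Introduce and validate a total correlation penalty that encourages independence across latent dimensions.
Propose a robust disentanglement metric that avoids the weaknesses of the earlier metric.
Compare FactorVAE with beta-VAE and InfoWGAN-GP on several datasets with known and unknown factors.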

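Proposed Method

Add a total correlation term to the VAE objective to encourage a factorial latent distribution: maximize the evidence lower bound while penalizing $KL(q(z)\,\|\,\bar{q}(z))$, where $\bar{q}(z) = \prod_j q(z_j)$ is the product of the marginals of $q(z)$.
Approximate the TC term with the density-ratio trick, using a discriminator trained to distinguish samples of $q(z)$ from samples of $\bar{q}(z)$.
Approximate sampling from $\bar{q}(z)$ with a permutation-based procedure (Alg. 1) that shuffles each latent dimension independently across a batch, avoiding a full pass over the data.
Train the VAE and the TC discriminator jointly, with the TC gradient signal scaled by a hyperparameter gamma.
Give pseudocode for FactorVAE (Alg. 2) and discuss stability: the divergence is estimated in the low-dimensional latent space rather than in data space.
Introduce a new disentanglement metric whose classifier has no hyperparameters to tune: fix one factor, encode a batch of samples, normalize each latent dimension by its empirical standard deviation over the data, and take the dimension with the smallest empirical variance as a vote for the fixed factor. The accuracy of a majority-vote classifier over these votes is the score.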
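
The permutation step and the density-ratio estimate from the list above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names `permute_dims` and `tc_estimate` are hypothetical, latent codes are represented as lists of lists, and the discriminator is assumed to output a single logit equal to log D(z) - log(1 - D(z)).

```python
import random


def permute_dims(z, rng):
    """Shuffle every latent dimension independently across the batch.

    z   : list of latent vectors (one per example), all of length d,
          treated as samples from q(z).
    rng : a random.Random instance (assumed here for reproducibility).
    Returns a new batch whose rows approximate samples from the product
    of marginals, because pairing across dimensions is broken.
    """
    batch, d = len(z), len(z[0])
    columns = []
    for j in range(d):
        column = [row[j] for row in z]  # all values of dimension j
        rng.shuffle(column)             # independent shuffle per dimension
        columns.append(column)
    return [[columns[j][i] for j in range(d)] for i in range(batch)]


def tc_estimate(logits):
    """Density-ratio estimate of TC: the mean discriminator logit on q(z) samples.

    logits : list of log D(z) - log(1 - D(z)) values (assumed single-logit
             discriminator output), one per sample z drawn from q(z).
    """
    return sum(logits) / len(logits)


# Toy batch in which the two dimensions are perfectly dependent.
rng = random.Random(0)
z = [[float(i), float(i)] for i in range(6)]
z_perm = permute_dims(z, rng)
```

Each column of `z_perm` holds the same values as the corresponding column of `z`, so the marginals are preserved while the dependence between dimensions is destroyed. The discriminator would be trained to tell batches like `z` from batches like `z_perm`.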

Figure 1: Architecture of FactorVAE, a Variational Autoencoder (VAE) that encourages the code distribution to be factorial. The top row is a VAE with convolutional encoder and decoder, and the bottom row is an MLP classifier, the discriminator, that distinguishes whether the input was drawn from the marginal code distribution $q(z)$ or the product of its marginals $\bar{q}(z)$.
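
Written out, the objective optimized by this architecture and the density-ratio approximation of its penalty take the following form. The symbols $N$ (number of data points), $x^{(i)}$ (the $i$-th data point), $p(z)$ (the prior), $\gamma$ (the TC weight) and $D(z)$ (the discriminator's estimated probability that $z$ came from $q(z)$) are notation assumed here for the sketch.

```latex
\begin{align}
\mathcal{L}_{\text{FactorVAE}}
  &= \frac{1}{N}\sum_{i=1}^{N}\Big[
       \mathbb{E}_{q(z \mid x^{(i)})}\big[\log p(x^{(i)} \mid z)\big]
       - KL\big(q(z \mid x^{(i)}) \,\|\, p(z)\big)\Big]
     - \gamma\, KL\big(q(z) \,\|\, \bar{q}(z)\big), \\
KL\big(q(z) \,\|\, \bar{q}(z)\big)
  &= \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{\bar{q}(z)}\right]
  \approx \mathbb{E}_{q(z)}\!\left[\log \frac{D(z)}{1 - D(z)}\right],
  \qquad \bar{q}(z) = \prod_{j} q(z_j).
\end{align}
```

The first bracket is the usual VAE lower bound. Only the last term is new, and setting $\gamma = 0$ recovers the plain VAE.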

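Experimental Results

Research Questions

RQ1: Does a total correlation penalty on the latent code improve disentanglement without sacrificing reconstruction quality?
RQ2: How does FactorVAE compare with beta-VAE and InfoWGAN-GP in disentanglement and reconstruction on datasets with known and unknown factors?
RQ3: What are the weaknesses of the disentanglement metric of Higgins et al., and can a more robust alternative be proposed?
RQ4: Does the discriminator-based TC estimate provide a stable and effective optimization signal for disentanglement?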

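Key Findings

On 2D Shapes and 3D Shapes, FactorVAE achieves higher disentanglement scores than beta-VAE at similar reconstruction quality.
The proposed TC penalty lowers the total correlation of the latent code, making the latent dimensions more independent.
The new disentanglement metric is conceptually simpler, has no hyperparameters to tune, and avoids the failure modes of the earlier metric.
On the shapes datasets tested, InfoWGAN-GP generally underperforms the VAE-based methods and is sensitive to the choice of architecture.
Across several datasets, including those with unknown factors, FactorVAE keeps reconstruction quality competitive or better while achieving better disentanglement.
The discriminator-based estimate tends to underestimate the true TC, but during training a lower TC still correlates with better disentanglement.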

Figure 2: Top: Metric in (Higgins et al., 2016). Bottom: Our new metric, where $s\in\mathbb{R}^{d}$ is the scale (empirical standard deviation) of latent representations of the full data (or large enough random subset).
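
The new metric in the bottom panel can be illustrated with a small sketch. It assumes the latent codes have already been computed by an encoder; the helper names `vote` and `majority_vote_accuracy` are hypothetical, and population variance from the standard library stands in for the empirical variance.

```python
from collections import Counter, defaultdict
from statistics import pvariance


def vote(codes, scale):
    """Index of the latent dimension with the smallest normalized variance.

    codes : latent vectors of inputs that all share one fixed factor value.
    scale : per-dimension standard deviation s over the full data.
    """
    d = len(scale)
    variances = [pvariance([c[j] / scale[j] for c in codes]) for j in range(d)]
    return min(range(d), key=variances.__getitem__)


def majority_vote_accuracy(votes):
    """Accuracy of a majority-vote classifier on (dimension, factor) pairs.

    Each latent dimension predicts the factor it was most often paired
    with; the score is the fraction of pairs predicted correctly.
    """
    table = defaultdict(Counter)
    for dim, factor in votes:
        table[dim][factor] += 1
    correct = sum(max(counts.values()) for counts in table.values())
    return correct / len(votes)


# Dimension 0 varies less in raw units, but after dividing by the scale
# it is dimension 1 that is nearly constant, so dimension 1 gets the vote.
codes = [[0.0, 0.0], [0.2, 1.0]]
scale = [0.1, 10.0]
chosen = vote(codes, scale)

# Four votes: dimension 0 always pairs with factor 0, dimension 1 is split.
score = majority_vote_accuracy([(0, 0), (0, 0), (1, 1), (1, 0)])
```

In this toy run `chosen` is 1 and `score` is 0.75. The normalization by `scale` is what keeps a dimension with a small overall range from winning every vote.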

This review was generated by AI and reviewed by a human editor.