QUICK REVIEW

[Paper Review] Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

Diane Bouchacourt, Ryota Tomioka|arXiv (Cornell University)|May 24, 2017

Generative Adversarial Networks and Image Synthesis | Cited by 137

One-Line Summary

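ML-VAE learns a disentangled representation from grouped data by modeling the content shared within a group separately from the style of each observation, which enables evidence accumulation across group members and generalization to unseen groups at test time.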

ABSTRACT

We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.

Motivation and Objectives

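Anchor the semantics of a grouping in the learned representation using only weak, group-level supervision.
Separate the latent factors into content shared by a group and style specific to each observation.
Handle non-i.i.d. grouped observations while retaining amortized inference.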

Proposed Method

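Introduce a two-level latent structure: content C_G, shared by all samples in group G, and style S_i, specific to each observation i in G.
Define the variational posterior q(C_G, S_G|X_G; φ) in factorized form, using q(C_G|X_G; φ_c) for the content and q(S_i|X_i; φ_s) for the style of each observation.
Use a per-group ELBO that sums over the observations in the group: ELBO(G; θ, φ_s, φ_c) = Σ_{i∈G} E_{q(C_G|X_G; φ_c)} E_{q(S_i|X_i; φ_s)}[log p(X_i|C_G, S_i; θ)] - KL terms.
Accumulate evidence for C_G by forming q(C_G|X_G) as the product of the normal densities produced by the individual encodings (the Gaussian product rule).
Train by computing the group ELBOs, averaging them across groups, and maximizing over θ, φ_c, and φ_s on mini-batches of groups.
Provide two test-time inference strategies: inference from a single sample (strategy 1), or evidence accumulation from multiple test samples per group (strategy 2).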
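
The evidence-accumulation and group-ELBO steps can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes diagonal Gaussian posteriors, standard normal priors on both content and style, and KL terms made up of one content KL per group plus one style KL per observation. The names `product_of_gaussians`, `kl_to_standard_normal`, `sample_diag_gaussian`, and `group_elbo`, as well as the toy encoders and likelihood at the bottom, are hypothetical stand-ins for neural networks.

```python
import math
import random


def product_of_gaussians(means, variances):
    """Combine diagonal Gaussians N(mu_i, var_i) into their normalised product.

    The product is Gaussian: its precision is the sum of the individual
    precisions, and its mean is the precision-weighted average of the means.
    """
    dim = len(means[0])
    group_mean, group_var = [], []
    for d in range(dim):
        precisions = [1.0 / v[d] for v in variances]
        total_precision = sum(precisions)
        weighted_sum = sum(p * m[d] for p, m in zip(precisions, means))
        group_var.append(1.0 / total_precision)
        group_mean.append(weighted_sum / total_precision)
    return group_mean, group_var


def kl_to_standard_normal(mean, var):
    """KL(N(mean, diag(var)) || N(0, I)); the N(0, I) prior is an assumption."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mean, var))


def sample_diag_gaussian(mean, var, rng):
    """Reparameterised sample: mean + sqrt(var) * eps with eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]


def group_elbo(xs, content_enc, style_enc, log_lik, rng):
    """One-sample Monte Carlo estimate of the ELBO for a single group.

    content_enc(x) and style_enc(x) are hypothetical encoders returning
    (mean, variance) lists; log_lik(x, c, s) is a hypothetical decoder
    log-likelihood log p(x | c, s).
    """
    c_means, c_vars = zip(*(content_enc(x) for x in xs))
    c_mean, c_var = product_of_gaussians(c_means, c_vars)
    c = sample_diag_gaussian(c_mean, c_var, rng)   # one content sample shared by the group
    elbo = -kl_to_standard_normal(c_mean, c_var)   # content KL counted once per group
    for x in xs:
        s_mean, s_var = style_enc(x)
        s = sample_diag_gaussian(s_mean, s_var, rng)
        elbo += log_lik(x, c, s) - kl_to_standard_normal(s_mean, s_var)
    return elbo


# Toy stand-ins (assumptions for illustration only, not the paper's networks).
content_enc = lambda x: ([x[0]], [1.0])
style_enc = lambda x: ([x[1]], [1.0])
log_lik = lambda x, c, s: -0.5 * ((x[0] - c[0]) ** 2 + (x[1] - s[0]) ** 2)

group = [[0.0, 0.3], [2.0, -0.1]]
elbo_estimate = group_elbo(group, content_enc, style_enc, log_lik, random.Random(0))
print(elbo_estimate)
```

In this sketch, calling `product_of_gaussians` with a single encoding returns that encoding unchanged, which corresponds to single-sample inference (strategy 1). Passing several encodings from the same group corresponds to evidence accumulation (strategy 2).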

Experimental Results

Research Questions

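RQ1: Can group-level supervision anchor semantic factors in a disentangled latent space?
RQ2: Does modeling content at the group level and style at the observation level improve disentanglement over an i.i.d. VAE?
RQ3: Can amortized inference be adapted to non-i.i.d. grouped observations without sacrificing test-time efficiency?
RQ4: Does accumulating evidence across group members improve the precision of the latent code and downstream classification?
RQ5: Do the learned disentangled representations generalize at test time to previously unseen groups?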

Key Findings

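ML-VAE learns a semantically meaningful disentangled representation by separating content (shared within a group) from style (specific to each observation).
Accumulating evidence through the product of normal densities reduces the uncertainty on the content as the group size grows.
The model generalizes at test time to unseen groups, as shown on datasets containing identities not seen during training.
The latent content C is informative about the class label while the style S is not, so downstream classification on C is effective.
Manipulations in the latent space (swapping, interpolation, generation) demonstrate controllable disentanglement and coverage of the manifold.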
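
The finding that content uncertainty shrinks with group size follows from a standard property of Gaussian products. The worked equation below is a sketch under stated assumptions, not a reproduction of the paper's derivation. The symbols μ_i and Σ_i (mean and covariance of the content encoding of observation i), μ_G and Σ_G (parameters of the group posterior), n (group size), and σ² (a common per-dimension variance) are notation introduced here for illustration.

```latex
% Group content posterior as a normalised product of Gaussians
q(C_G \mid X_G) \;\propto\; \prod_{i \in G} \mathcal{N}\!\left(C_G;\, \mu_i, \Sigma_i\right)
\quad\Longrightarrow\quad
\Sigma_G^{-1} = \sum_{i \in G} \Sigma_i^{-1},
\qquad
\mu_G = \Sigma_G \sum_{i \in G} \Sigma_i^{-1} \mu_i .

% Special case (assumption): every encoding has covariance \sigma^2 I and |G| = n
\Sigma_G = \frac{\sigma^2}{n}\, I,
\qquad
\mu_G = \frac{1}{n} \sum_{i \in G} \mu_i .
```

For example, with σ² = 1 and n = 4 observations in the group, the variance of each content dimension drops from 1 to 0.25. Precisions add, so every extra group member can only tighten the content posterior.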

This review was generated by AI and checked by a human editor.