QUICK REVIEW

[Paper Review] Controlling generative models with continuous factors of variations

Antoine Plumerault, Hervé Le Borgne|arXiv (Cornell University)|Jan 28, 2020

Generative Adversarial Networks and Image Synthesis | References: 43 | Cited by: 79

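One-line summary

This paper introduces a method for discovering interpretable latent-space directions that encode continuous image variations without labels, enabling precise control over generation in GANs and VAEs.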

ABSTRACT

Recent deep generative models are able to provide photo-realistic images as well as visual or textual content embeddings useful to address various tasks of computer vision and natural language processing. Their usefulness is nevertheless often limited by the lack of control over the generative process or the poor understanding of the learned representation. To overcome these major issues, very recent work has shown the interest of studying the semantics of the latent space of generative models. In this paper, we propose to advance on the interpretability of the latent space of generative models by introducing a new method to find meaningful directions in the latent space of any generative model along which we can move to control precisely specific properties of the generated image like the position or scale of the object in the image. Our method does not require human annotations and is particularly well suited for the search of directions encoding simple transformations of the generated image, such as translation, zoom or color variations. We demonstrate the effectiveness of our method qualitatively and quantitatively, both for GANs and variational auto-encoders.

Motivation and objectives

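Motivates the need for controllable generative models and interpretable latent representations.
Proposes a method that finds latent-space directions corresponding to continuous factors of variation without requiring labels or an encoder.
Shows that moving along these directions gives precise control over image properties such as object position and scale.
Provides a reconstruction loss for inverting the generator and an optimization strategy for estimating latent trajectories.
Examines how disentanglement affects controllability and sheds light on how latent-space structure differs across models.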

Proposed method

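Define a generator G: Z -> I and, for a continuous transformation T_t with parameter t, seek a latent code z_t such that G(z_t) ≈ T_t(I).
Minimize a reconstruction loss L(G(z), T_t(I)) over z, subject to the norm constraint ||z|| ≤ sqrt(d), where d is the dimension of the latent space.
To sharpen the inversion, propose the texture-preserving loss L(I1, I2) = ||F{I1 − I2} F{σ}||^2.
Decompose T_t into small steps and trace the latent trajectory by optimizing each z_n in turn, initializing it with the previous solution (Algorithm 1).
Model the factor of variation as t = f(z) = g(<z, u>) with ||u|| = 1, and train g_θ to predict the parameter change δt from latent projections, which gives a parametric model of the factor.
When t itself is unknown, train the model to predict δt rather than t, using Eq. (6) of the paper and the proposed g_θ to learn a mapping that captures the distribution of the factor.
Optionally, sample z using the trained g_θ and a chosen target distribution to shape the distribution of the generated outputs.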
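
The inversion and trajectory steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes F is the 2-D discrete Fourier transform and σ is a Gaussian kernel, and the function names, the kernel width `kernel_std`, the step size `lr`, the iteration count, and the choice to build each target by transforming the previous reconstruction are all hypothetical choices made for illustration. The caller supplies the generator, the gradient of the chosen loss with respect to z, and the one-step transformation.

```python
import numpy as np

def gaussian_kernel(shape, kernel_std):
    """Centered 2-D Gaussian kernel, normalized to sum to 1."""
    h, w = shape
    y = np.arange(h) - h // 2
    x = np.arange(w) - w // 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    k = np.exp(-(xx**2 + yy**2) / (2.0 * kernel_std**2))
    return k / k.sum()

def frequency_weighted_loss(img1, img2, kernel_std=2.0):
    """||F{img1 - img2} * F{kernel}||^2 for 2-D arrays.

    Multiplying by the kernel's spectrum down-weights high-frequency errors.
    """
    diff_f = np.fft.fft2(img1 - img2)
    kernel = gaussian_kernel(img1.shape, kernel_std)
    kernel_f = np.fft.fft2(np.fft.ifftshift(kernel))
    return float(np.sum(np.abs(diff_f * kernel_f) ** 2))

def project_to_ball(z):
    """Enforce ||z|| <= sqrt(d) by rescaling z when it lies outside the ball."""
    radius = np.sqrt(z.size)
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

def follow_trajectory(generate, grad_loss, transform, z0,
                      n_steps, lr=0.1, n_iters=50):
    """Warm-started sequential inversion: each z_n starts from z_{n-1}.

    generate(z) -> image, grad_loss(z, target) -> gradient w.r.t. z, and
    transform(image) -> image after one small step are caller-supplied.
    """
    z = np.asarray(z0, dtype=float).copy()
    path = [z.copy()]
    for _ in range(n_steps):
        # Assumption: the target is one small transformation step applied
        # to the reconstruction obtained at the previous step.
        target = transform(generate(z))
        for _ in range(n_iters):
            # Projected gradient descent keeps z inside the norm ball.
            z = project_to_ball(z - lr * grad_loss(z, target))
        path.append(z.copy())
    return np.stack(path)
```

Warm-starting each step from the previous solution keeps the optimization close to a code that already reconstructs the image well, which is the point of splitting a large transformation into small ones. In practice `grad_loss` would come from an automatic-differentiation framework rather than being written by hand.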

Experimental results

Research questions

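RQ1: Can continuous factors of variation be captured as latent-space directions without labels or an encoder?
RQ2: How accurately can object translation and scaling be controlled along latent directions?
RQ3: Which reconstruction losses invert the generator effectively while preserving texture?
RQ4: How does the disentanglement of the latent space affect the ability to control generation?
RQ5: Are the identified directions shared across object categories and models (for example, across BigGAN categories or across VAEs)?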

Key findings

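Latent-space directions corresponding to horizontal translation, vertical translation, and scale allow precise control of generated images in both the BigGAN and β-VAE settings.
The directions are mostly encoded in the first part of the BigGAN latent code; vertical position draws more on the higher-level blocks because it correlates with the background.
Disentangled representations (higher β in β-VAEs) improve controllability and reduce the standard deviation of the controlled factor.
The new reconstruction loss, which down-weights high-frequency components, yields sharper inversions and more realistic reconstructions than pixel-wise MSE.
The proposed trajectory-based optimization needs neither an encoder for inversion nor additional training of the generator, so it applies directly to existing generators.
Directions for a given factor of variation appear to be shared across multiple object categories, suggesting a category-independent latent structure.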
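
Under the model t = g(<z, u>) with ||u|| = 1, the control reported in these findings amounts to changing the projection of z onto u while leaving the orthogonal part of z untouched. The sketch below illustrates that arithmetic only. The names `move_along_direction` and `set_factor` are hypothetical, and `np.tanh` stands in for a trained monotonic g_θ purely as an assumption; the paper's actual g_θ is not reproduced here, and the sketch does not re-apply the norm constraint on z.

```python
import numpy as np

def move_along_direction(z, u, target_projection):
    """Shift z along the unit direction u so that <z', u> equals the target."""
    u = u / np.linalg.norm(u)  # guard against a direction that is not unit norm
    return z + (target_projection - z @ u) * u

def set_factor(z, u, target_t, g_inverse=np.arctanh):
    """Reach a target factor value t, assuming t = g(<z, u>) with invertible g.

    g_inverse is a stand-in (inverse of tanh); a real g_theta would be learned.
    """
    return move_along_direction(z, u, g_inverse(target_t))
```

Because only the component along u changes, every other direction of the latent code is preserved, which is why a well-disentangled direction alters one image property at a time.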

This review was generated by AI and checked by a human editor.