QUICK REVIEW

[Paper Review] Disentangling factors of variation in deep representations using adversarial training

Michaël Mathieu, Junbo Zhao|arXiv (Cornell University)|Nov 10, 2016

Generative Adversarial Networks and Image Synthesis | References: 21 | Citations: 252

One-Line Summary

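This paper proposes combining a conditional variational autoencoder with adversarial training so that the specified factors of variation in a deep representation are separated from the unspecified ones. The result is disentanglement obtained with weak supervision only: class labels are required, but no labels for the unspecified factors. The paper demonstrates single-image analogies and generalization to unseen identities on multiple datasets.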

ABSTRACT

We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from our ability to distinguish among different observations belonging to the same class. Examples of such observations include images of a set of labeled objects captured at different viewpoints, or recordings of a set of speakers dictating multiple phrases. In both instances, the intra-class diversity is the source of the unspecified factors of variation: each object is observed at multiple viewpoints, and each speaker dictates multiple phrases. Learning to disentangle the specified factors from the unspecified ones becomes easier when strong supervision is possible. Suppose that during training, we have access to pairs of images, where each pair shows two different objects captured from the same viewpoint. This source of alignment allows us to solve our task using existing methods. However, labels for the unspecified factors are usually unavailable in realistic scenarios where data acquisition is not strictly controlled. We address the problem of disentanglement in this more general setting by combining deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of generalizing to unseen classes and intra-class variabilities.
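The weak supervision described in the abstract can be illustrated with a minimal sketch, assuming each training step draws two observations of the same class (for example, two viewpoints of one object) plus one arbitrary observation. Only class labels are consulted; nothing about viewpoint, phrase, or other unspecified factors is needed. The helper names `group_by_label` and `sample_training_triple` are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def group_by_label(labels):
    """Map each class label to the indices of the observations carrying it."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return groups


def sample_training_triple(labels, rng):
    """Sample indices (i, j, k): i and j share a class, k may be from any class.

    Hypothetical helper: only the class labels are used, never labels for
    the unspecified factors.
    """
    groups = group_by_label(labels)
    # Only classes with at least two observations can supply a same-class pair.
    eligible = [c for c, idxs in groups.items() if len(idxs) >= 2]
    if not eligible:
        raise ValueError("need at least one class with two observations")
    cls = rng.choice(eligible)
    i, j = rng.sample(groups[cls], 2)
    k = rng.randrange(len(labels))
    return i, j, k
```

For example, with `labels = ["a", "a", "b", "b", "c"]`, every sampled `(i, j)` pair comes from class `"a"` or `"b"`, because class `"c"` has a single observation and cannot form a same-class pair.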

Motivation and Goals

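Motivates learning representations that separate the factors associated with labels from all other variation.
Proposes a conditional generative model that combines a VAE and a GAN to achieve disentanglement under weak supervision.
Enables tasks such as single-image analogies and conditional generation without requiring strongly labeled nuisance factors.
Shows generalization to unseen identities and intra-class variation across synthetic and real datasets.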

Proposed Method

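Introduces a conditional generative model with two sources: a specified factor s and an unspecified latent variable z.
Uses a shared encoder network that maps x to (s, z) by branching into two heads.
Trains a decoder p_theta(x|z,s) to reconstruct and sample x from z and s.
Adds a discriminative (GAN) regularizer that prevents information about s from leaking into z when codes are swapped between views.
Optimizes an objective that combines the VAE evidence lower bound (ELBO) with a GAN-based loss to enforce disentanglement.
Provides a training procedure that swaps specified and unspecified factors between samples, encouraging generated outputs to stay consistent with class identity.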
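A minimal sketch of how the swap-based objective could be assembled, assuming generic `encode`, `decode`, and `disc` callables (hypothetical names; the actual model uses deep convolutional networks). The reconstruction term uses the specified code of a same-class partner, the KL term is the standard Gaussian VAE regularizer on z, and the adversarial term uses the non-saturating generator loss -log D. The squared-error reconstruction, the weight `lam`, and the exact inputs given to the discriminator are assumptions and may differ from the paper.

```python
import math
import random


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def squared_error(a, b):
    """Sum of squared differences (a Gaussian log-likelihood up to constants)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def swap_loss(encode, decode, disc, x1, x1_same, x2, rng, lam=1.0):
    """Evaluate one illustrative swap-based loss (hypothetical, not the paper's code).

    x1 and x1_same share a class; x2 is any other observation.
    encode(x) -> (s, mu, logvar); decode(z, s) -> x;
    disc(x, x_ref) -> probability that x shares x_ref's class identity.
    """
    _, mu1, lv1 = encode(x1)
    s1_same, _, _ = encode(x1_same)
    s2, _, _ = encode(x2)
    z1 = reparameterize(mu1, lv1, rng)

    # Reconstruct x1 from its own z and the same-class partner's s, which
    # pushes s to encode only what the two observations share (the class).
    recon = squared_error(decode(z1, s1_same), x1)
    # VAE prior term on the unspecified code z.
    kl = gaussian_kl(mu1, lv1)
    # Adversarial term (non-saturating generator loss): a sample built from
    # x1's z and x2's s should be judged to share x2's class.
    x_gen = decode(z1, s2)
    adv = -math.log(max(disc(x_gen, x2), 1e-12))
    return recon + kl + lam * adv
```

The `decode(z1, s2)` call is also what a single-image analogy amounts to: keep the unspecified code of one image and borrow the specified code of another. Without the adversarial term, nothing penalizes a decoder that reads all information from z and ignores s.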

Experimental Results

Research Questions

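RQ1: Can a deep generative model disentangle the specified factors of variation from the unspecified ones under weak supervision?
RQ2: Without aligned data, can a discriminator enforce meaningful disentanglement when specified and unspecified factors are swapped between samples?
RQ3: To what extent do the learned s and z components capture class identity and intra-class variation across datasets?
RQ4: Does the disentanglement generalize to identities and variations not seen during training?
RQ5: How does adversarial regularization affect the quality of generated samples and the disentanglement of the representation?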

Key Findings

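Across multiple datasets, the model achieves a clear separation between the specified and unspecified factors.
The specified component is highly informative about identity and approaches a supervised baseline on classification tasks.
The unspecified component is nearly invariant to identity and performs close to a chance-level baseline in classification tests.
Single-image analogies and interpolations show consistent control of generated samples along both factors.
Quantitative results show competitive disentanglement, with notable generalization to unseen identities and intra-class variation.
Adversarial regularization is critical: without it, the model collapses to a solution that ignores the specified component.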
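The classification tests behind these findings can be approximated with a simple probe: fit a classifier on one code and measure identity accuracy, expecting s to score well and z to stay near chance (1 / number of classes). The sketch below swaps in a nearest-centroid classifier as an assumption; the paper's actual classifier and evaluation protocol may differ, and the function names are hypothetical.

```python
from collections import defaultdict


def class_centroids(codes, labels):
    """Average the code vectors of each class."""
    sums = {}
    counts = defaultdict(int)
    for code, label in zip(codes, labels):
        if label not in sums:
            sums[label] = [0.0] * len(code)
        sums[label] = [a + b for a, b in zip(sums[label], code)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def nearest_centroid_accuracy(train_codes, train_labels, test_codes, test_labels):
    """Identity accuracy of a nearest-centroid probe on one code (s or z)."""
    centroids = class_centroids(train_codes, train_labels)
    correct = 0
    for code, label in zip(test_codes, test_labels):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(code, centroids[c])),
        )
        correct += pred == label
    return correct / len(test_labels)
```

Run once on s codes and once on z codes: a probe on s that approaches a supervised classifier, paired with a probe on z that stays near chance, is the pattern the findings describe.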

This review was generated by AI and checked by a human editor.