QUICK REVIEW

[Paper Review] Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance

Zhixin Shu, Mihir Sahasrabudhe|arXiv (Cornell University)|Jun 18, 2018

Face recognition and analysis|References: 39|Citations: 89

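One-line summary

tldr: This work proposes Deforming Autoencoders, which disentangle shape (deformation) from appearance (texture) without supervision, enabling unsupervised alignment, shape/appearance interpolation, and intrinsic shading/albedo decomposition.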

ABSTRACT

In this work we introduce Deforming Autoencoders, a generative model for images that disentangles shape from appearance in an unsupervised manner. As in the deformable template paradigm, shape is represented as a deformation between a canonical coordinate system ('template') and an observed image, while appearance is modeled in 'canonical', template, coordinates, thus discarding variability due to deformations. We introduce novel techniques that allow this approach to be deployed in the setting of autoencoders and show that this method can be used for unsupervised group-wise image alignment. We show experiments with expression morphing in humans, hands, and digits, face manipulation, such as shape and appearance interpolation, as well as unsupervised landmark localization. A more powerful form of unsupervised disentangling becomes possible in template coordinates, allowing us to successfully decompose face images into shading and albedo, and further manipulate face images.

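Motivation and objectives

Disentangle shape and appearance without supervision, using the deformable template paradigm.
Model image generation as texture synthesis in a canonical space followed by a learned deformation into image coordinates.
Enable unsupervised alignment, shape/appearance interpolation, and intrinsic shading/albedo decomposition for relighting.
Explore class-aware deformable models and diffeomorphic deformation constraints to improve training.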

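Proposed method

Encode the image into a compact latent representation Z, split into ZT (appearance) and ZS (shape).
Use two decoders to synthesize the appearance T from ZT and the deformation field W from ZS, then reconstruct I by warping T with W in a Spatial Transformer layer.
Represent the deformation as an affine Spatial Transformer layer combined with a non-rigid field: a differential decoder predicts the spatial gradients ∇xW and ∇yW, which are spatially integrated to form W.
Apply ReLU (or HardTanh) to the deformation gradients to enforce locally consistent, non-inverting deformations and prevent folding.
For multi-class data, optionally include a class-aware latent component ZC that conditions the appearance and shape decoders.
Extend the model to an Intrinsic Deforming Autoencoder (Intrinsic-DAE) with separate decoders for shading S and albedo A, modeling the texture as T = S ∘ A and enforcing smooth shading through regularization on ∇S. An adversarial loss (PatchGAN) can optionally be added for greater realism.
Train with the reconstruction loss plus warping regularizers (smoothness and bias reduction), together with the shading and adversarial losses when those are enabled.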
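The gradient-based deformation described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the function names `integrate_warp` and `warp` are hypothetical, coordinates are assumed to be in pixel units, the affine layer is omitted, and plain arrays stand in for decoder outputs. It shows only why a ReLU followed by a cumulative sum yields a sampling grid that cannot fold.

```python
import numpy as np


def integrate_warp(grad_x, grad_y):
    """Turn unconstrained gradient maps into a non-folding sampling grid.

    grad_x, grad_y: (H, W) arrays standing in for the predicted spatial
    gradients of the deformation field (hypothetical decoder outputs).
    """
    # ReLU makes every increment non-negative, so coordinates never
    # decrease along a row (x) or down a column (y).
    dx = np.maximum(grad_x, 0.0)
    dy = np.maximum(grad_y, 0.0)
    # Spatial integration is a cumulative sum along the matching axis.
    wx = np.cumsum(dx, axis=1)
    wy = np.cumsum(dy, axis=0)
    return wx, wy


def warp(texture, wx, wy):
    """Bilinearly sample a (H, W) texture at pixel coordinates (wx, wy)."""
    h, w = texture.shape
    # Clamp coordinates to the image so border samples stay valid.
    x = np.clip(wx, 0, w - 1)
    y = np.clip(wy, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = x - x0
    fy = y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = texture[y0, x0] * (1 - fx) + texture[y0, x1] * fx
    bottom = texture[y1, x0] * (1 - fx) + texture[y1, x1] * fx
    return top * (1 - fy) + bottom * fy
```

Under these assumptions, gradient maps of all ones with a zero first column (for x) and a zero first row (for y) integrate to the identity grid, so `warp` returns the texture unchanged; any other input still produces coordinates that increase monotonically along each axis.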

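Experimental results

Research questions

RQ1: Can explicitly modeling a deformation field and a canonical texture space allow an unsupervised autoencoder to disentangle shape and appearance?
RQ2: Can incorporating diffeomorphically regularized deformations improve image alignment, interpolation quality, and landmark localization in the unsupervised setting?
RQ3: Can class information improve multi-modal appearance modeling in Deforming Autoencoders?
RQ4: When images are aligned in template space, can an intrinsic decomposition into shading and albedo be learned without supervision?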

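Key findings

DAE effectively disentangles shape and appearance by reconstructing an image as a canonical-space texture warped by a learned deformation.
The Class-aware Deforming Autoencoder improves multi-class appearance modeling and generates sharper images.
Intrinsic-DAE achieves unsupervised shading and albedo decomposition, making it possible to simulate relighting and illumination changes.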
Unsupervised alignment is feasible, achieving competitive landmark localization accuracy compared to self-supervised methods.
Learning the deformation field carries over to improved unsupervised landmark detection and image registration.
Adversarial training enhances visual sharpness in Intrinsic-DAE without compromising the disentanglement of deformation, shading, and albedo.
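The shading/albedo decomposition and the relighting it enables can be illustrated with a small sketch. This is an assumption-laden NumPy example, not the paper's code: the helper names are hypothetical, shading is assumed to be a single-channel (H, W) map and albedo an (H, W, 3) map, and the squared finite-difference penalty is only one plausible form of the ∇S regularizer mentioned in the method.

```python
import numpy as np


def compose_texture(shading, albedo):
    """Texture as an element-wise product, T = S * A.

    shading: (H, W) map, broadcast across the colour channels.
    albedo:  (H, W, 3) map.
    """
    return shading[..., None] * albedo


def shading_smoothness(shading):
    """Mean squared finite-difference gradient of the shading map.

    Assumed stand-in for a smoothness penalty on the shading gradient;
    the exact norm and weighting used in the paper are not given here.
    """
    gx = np.diff(shading, axis=1)
    gy = np.diff(shading, axis=0)
    return float((gx ** 2).mean() + (gy ** 2).mean())


def relight(albedo, new_shading):
    """Simulate a lighting change by pairing the albedo with new shading."""
    return compose_texture(new_shading, albedo)
```

A constant shading map gives a penalty of zero, while any spatial variation gives a positive value, which is why such a term pushes sharp detail into the albedo. Relighting in this sketch is simply recomposing the same albedo with a different shading map.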


This review was generated by AI and checked by a human editor.