QUICK REVIEW

[Paper Review] RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces

Ryota Natsume, Tatsuya Yatagawa|arXiv (Cornell University)|Apr 10, 2018

Face recognition and analysis | References: 15 | Citations: 43

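One-line summary

RSGAN combines region-separative VAEs (one for the face, one for the hair) with a GAN so that faces can be swapped and attributes edited directly in latent space, giving robust face swapping and flexible editing without per-pair fine-tuning.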

ABSTRACT

In this paper, we present an integrated system for automatically generating and editing face images through face swapping, attribute-based editing, and random face parts synthesis. The proposed system is based on a deep neural network that variationally learns the face and hair regions with large-scale face image datasets. Different from conventional variational methods, the proposed network represents the latent spaces individually for faces and hairs. We refer to the proposed network as region-separative generative adversarial network (RSGAN). The proposed network independently handles face and hair appearances in the latent spaces, and then, face swapping is achieved by replacing the latent-space representations of the faces, and reconstruct the entire face image with them. This approach in the latent space robustly performs face swapping even for images which the previous methods result in failure due to inappropriate fitting of the 3D morphable models. In addition, the proposed system can further edit face-swapped images with the same network by manipulating visual attributes or by composing them with randomly generated face or hair parts.

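Motivation and objectives

Motivate a unified system for automatic face swapping and appearance editing.
Propose a region-separative GAN that learns separate latent spaces for the face and hair regions.
Achieve face swapping by exchanging latent representations and reconstructing the full image.
Support attribute-based editing and random part generation within the same network.
Demonstrate robustness to variations in pose, lighting, and expression while avoiding per-pair fine-tuning.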

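Proposed method

Two VAEs (the separator networks) encode face and hair appearance into separate latent spaces (z_f, z_h).
A single GAN-based composer network reconstructs the full image from the corresponding latent codes.
Three reconstruction losses are used (face, hair, and full image), together with a background mask that emphasizes foreground detail.
KL divergence losses regularize the latent spaces, and adversarial losses from global and patch discriminators encourage realism.
A classifier network estimates visual attributes from the input image, enabling attribute-conditioned editing.
For face swapping, the latent codes from the two inputs are combined as x′ = G(z_xf, z_cf, z_xh, z_ch).
Gradient-domain stitching can optionally be applied to improve hair/background consistency (RSGAN-GD).
The dataset is built from CelebA by segmenting the face and hair regions and extracting training patches.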
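
The latent-space swap described above can be illustrated with a toy sketch. This is not the authors' implementation: the encoders and composer here are fixed random linear maps standing in for trained networks, the image is a flattened vector, and all names (`encode_face`, `encode_hair`, `compose`, `swap_face`, `IMG_DIM`, `LATENT_DIM`) are hypothetical. Attribute conditioning is left out, so the toy composer takes only two codes. The KL helper is the standard closed form for a diagonal Gaussian against a standard normal prior, which is the usual VAE regularizer and is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, LATENT_DIM = 64, 8  # hypothetical sizes for a flattened toy "image"

# Hypothetical stand-ins for trained networks: fixed random linear maps.
W_face = rng.standard_normal((LATENT_DIM, IMG_DIM))
W_hair = rng.standard_normal((LATENT_DIM, IMG_DIM))
W_comp = rng.standard_normal((IMG_DIM, 2 * LATENT_DIM))


def encode_face(x):
    """Toy face separator: maps an image to a face code z_f."""
    return W_face @ x


def encode_hair(x):
    """Toy hair separator: maps an image to a hair code z_h."""
    return W_hair @ x


def compose(z_f, z_h):
    """Toy composer: rebuilds an image from a face code and a hair code."""
    return W_comp @ np.concatenate([z_f, z_h])


def swap_face(face_img, hair_img):
    """Take the face code from one image and the hair code from another."""
    return compose(encode_face(face_img), encode_hair(hair_img))


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

Calling `swap_face(face_img=x, hair_img=y)` on two toy vectors returns a vector of length `IMG_DIM`. In a real system the three maps would be convolutional networks trained with the reconstruction, KL, and adversarial losses listed above.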

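Experimental results

Research questions

RQ1: Does separating the latent representations of face and hair improve the robustness and quality of face swapping under variations in pose, illumination, and expression?
RQ2: Can the same latent-space framework support attribute-based editing and random part generation without additional per-pair fine-tuning?
RQ3: How does region-separative modeling affect identity preservation and swap consistency compared with previous methods?
RQ4: What is the effect of using variational latent spaces rather than non-variational encoders for these tasks?

Key findings

RSGAN produces natural-looking face swaps across diverse poses and lighting conditions.
Visual attributes can be edited by manipulating the corresponding latent codes, so specific changes to the face or the hair can be made without affecting the other region.
Randomly sampling the face or hair latent space generates a new appearance for that region while preserving the other.
RSGAN shows competitive swap consistency and outperforms several baseline generative models on the reported metrics, although certain 3DMM-based methods may preserve identity fidelity better in some cases.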
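
The random part generation finding can be sketched as follows: one face code is held fixed while hair codes are drawn from the prior, and each pair would then be passed to the composer. This is a minimal illustration, assuming a standard normal prior on the hair latent space; the function name `sample_hair_variants` and the size `LATENT_DIM` are hypothetical, and the composer itself is omitted.

```python
import numpy as np

LATENT_DIM = 8  # hypothetical latent size


def sample_hair_variants(z_f, n, rng):
    """Pair one fixed face code with n hair codes drawn from N(0, I).

    Returns an (n, 2 * d) array whose rows are [z_f, z_h_i], i.e. the
    inputs a composer would receive for n hair variations of one face.
    """
    z_h = rng.standard_normal((n, z_f.shape[0]))  # random hair codes
    z_f_tiled = np.tile(z_f, (n, 1))              # same face code per row
    return np.concatenate([z_f_tiled, z_h], axis=1)
```

Swapping the roles (fixed hair code, sampled face codes) gives the complementary case of generating new faces under the same hairstyle.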


This review was generated by AI and checked by a human editor.