QUICK REVIEW

[Paper Review] AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars

Yue Wu, Yu Deng|arXiv (Cornell University)|Oct 12, 2022

Generative Adversarial Networks and Image Synthesis | Cited by 21

One-Line Summary

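AniFaceGAN unconditionally generates animatable 3D-aware face images. It combines a template radiance field with an expression-driven deformation field and trains them with a 3D-level imitation loss, achieving strong 3D consistency and controllable expressions from unstructured 2D images alone.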

ABSTRACT

Although 2D generative models have made great progress in face image generation and animation, they often suffer from undesirable artifacts such as 3D inconsistency when rendering images from different camera viewpoints. This prevents them from synthesizing video animations indistinguishable from real ones. Recently, 3D-aware GANs extend 2D GANs for explicit disentanglement of camera pose by leveraging 3D scene representations. These methods can well preserve the 3D consistency of the generated images across different views, yet they cannot achieve fine-grained control over other attributes, among which facial expression control is arguably the most useful and desirable for face animation. In this paper, we propose an animatable 3D-aware GAN for multiview consistent face animation generation. The key idea is to decompose the 3D representation of the 3D-aware GAN into a template field and a deformation field, where the former represents different identities with a canonical expression, and the latter characterizes expression variations of each identity. To achieve meaningful control over facial expressions via deformation, we propose a 3D-level imitative learning scheme between the generator and a parametric 3D face model during adversarial training of the 3D-aware GAN. This helps our method achieve high-quality animatable face image generation with strong visual 3D consistency, even though trained with only unstructured 2D images. Extensive experiments demonstrate our superior performance over prior works. Project page: https://yuewuhkust.github.io/AniFaceGAN

Motivation and Objectives

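Video avatars call for face generation that is both animatable and 3D-consistent.
Disentangle identity and expression using an explicit 3D prior.
Achieve expression-controllable, multi-view rendering from unstructured 2D images.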

Proposed Method

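Generate two 3D fields: a template radiance field with a neutral expression, and a deformation field that models expressions.
Use a GRAM-based template radiance field; color and occupancy are rendered from the identity code z_id, the expression code z_exp, and the viewing direction.
Introduce an inverse 3D deformation F that, conditioned on z_id and z_exp, maps points in the target (expression) space back to the template space.
Impose 3D-level imitation losses that align the generated geometry and deformation with a 3D Morphable Model (3DMM): dense geometry imitation, 3D landmark imitation, and 3DMM deformation imitation.
Apply adversarial training with a discriminator on real and generated images.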
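The pipeline above can be illustrated with a toy example: a target-space query point is first warped into template space by the inverse deformation F, the template field is then evaluated there, and three imitation terms compare generated quantities with 3DMM counterparts. Everything here is a hypothetical stand-in, not the paper's networks or loss definitions: the template field is a sphere, F is a hand-written jaw-like offset, the identity and expression codes are scalars instead of latent vectors, the 3DMM targets and generated landmarks are passed in directly, and the squared-error forms and weights of the losses are assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sq_dist(a: Vec3, b: Vec3) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def inverse_deformation(x: Vec3, z_id: float, z_exp: float) -> Vec3:
    """Hypothetical F: offset mapping a target-space point back to template space.

    Toy rule: points below y = 0 are shifted up by 0.1 * z_exp, undoing an
    "opened jaw". z_id is accepted to mirror F(x, z_id, z_exp) but unused here.
    """
    return (0.0, 0.1 * z_exp if x[1] < 0.0 else 0.0, 0.0)


def template_field(p: Vec3, z_id: float, view_dir: Vec3) -> Tuple[Vec3, float]:
    """Hypothetical neutral-expression template: (rgb color, occupancy) at point p.

    Toy rule: a sphere whose radius grows with z_id; color depends on view_dir.
    """
    radius = 1.0 + 0.1 * z_id
    occupancy = 1.0 if sq_dist(p, (0.0, 0.0, 0.0)) <= radius ** 2 else 0.0
    color = (0.5 + 0.5 * view_dir[0], 0.5, 0.5)
    return color, occupancy


def query(x: Vec3, z_id: float, z_exp: float, view_dir: Vec3) -> Tuple[Vec3, float]:
    """Warp x into template space with F, then evaluate the template field there."""
    offset = inverse_deformation(x, z_id, z_exp)
    p = (x[0] + offset[0], x[1] + offset[1], x[2] + offset[2])
    return template_field(p, z_id, view_dir)


def imitation_loss(points: Sequence[Vec3], z_id: float, z_exp: float,
                   dmm_occupancy: Sequence[float], dmm_offsets: Sequence[Vec3],
                   gen_landmarks: Sequence[Vec3], dmm_landmarks: Sequence[Vec3],
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of three toy imitation terms (assumed squared-error forms)."""
    view = (0.0, 0.0, 1.0)  # occupancy does not depend on view direction in this toy
    # Dense geometry imitation: generated occupancy vs. 3DMM inside/outside labels.
    l_geo = sum((query(x, z_id, z_exp, view)[1] - o) ** 2
                for x, o in zip(points, dmm_occupancy)) / len(points)
    # 3D landmark imitation: generated landmarks vs. 3DMM landmarks.
    l_lm = sum(sq_dist(a, b) for a, b in zip(gen_landmarks, dmm_landmarks)) / len(gen_landmarks)
    # 3DMM deformation imitation: F's offsets vs. 3DMM-derived offsets.
    l_def = sum(sq_dist(inverse_deformation(x, z_id, z_exp), d)
                for x, d in zip(points, dmm_offsets)) / len(points)
    w_geo, w_lm, w_def = weights
    return w_geo * l_geo + w_lm * l_lm + w_def * l_def
```

For example, the point (0.0, -1.05, 0.0) lies outside the unit template sphere when z_exp = 0.0, so its occupancy is 0.0; with z_exp = 1.0, F pulls it back inside and the occupancy becomes 1.0. This is the sense in which expression lives in the deformation while identity lives in the template.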

Experimental Results

Research Questions

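RQ1: When trained on unstructured 2D images, can an unconditional 3D-aware GAN produce high-quality, 3D-consistent face images across diverse poses and expressions?
RQ2: Does imposing a 3D-level imitation loss with a 3DMM prior enable clear, meaningful expression control while keeping identity disentangled?
RQ3: How does the proposed deformation-based expression model compare with 2D-imitation approaches in terms of 3D consistency and video-quality metrics?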

Key Findings

手法	FID	KID ×100	PSNR	SSIM
GRAM [12]	19.4	0.64	-	-
CONFIG [30]	52.6	3.38	-	-
DiscoFaceGAN [11]	17.9	0.79	-	-
DiscoFaceGRAM	23.9	1.19	-	-
Ours	19.9	0.86	-	-
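The table reports FID and KID (the latter scaled by 100; lower is better for both). FID is the Fréchet distance between Gaussians fitted to real and generated feature sets; KID instead is an unbiased squared maximum mean discrepancy (MMD) with the cubic polynomial kernel k(x, y) = (x·y/d + 1)^3, where d is the feature dimension. The sketch below computes the KID ×100 quantity, assuming feature vectors (normally Inception activations) are already extracted and passed in as plain lists. `kid_x100` is a hypothetical helper name, and standard KID implementations also average over many random subsets, which is omitted here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel used by KID: (x.y / d + 1)^3, d = feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid_x100(real: List[Vector], fake: List[Vector]) -> float:
    """Unbiased MMD^2 between two feature sets, multiplied by 100 as in the table.

    Assumes each set holds at least two feature vectors of equal dimension.
    """
    m, n = len(real), len(fake)
    # Within-set terms skip the diagonal (i == j) so the estimate is unbiased.
    k_rr = sum(poly_kernel(real[i], real[j]) for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j]) for i in range(n) for j in range(n) if i != j)
    # The cross term uses every (real, fake) pair.
    k_rf = sum(poly_kernel(r, f) for r in real for f in fake)
    mmd2 = k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)
    return 100.0 * mmd2


print(kid_x100([[0.0], [0.0]], [[1.0], [1.0]]))  # 700.0
```

Because the estimator is unbiased, it can come out slightly negative when the two sets are drawn from the same distribution.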

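Achieves strong 3D consistency and high-quality animatable face images across poses and expressions.
The 3D-level imitation losses (dense geometry, 3D landmarks, 3DMM deformation) improve fidelity and expression controllability over the baselines.
Disentangling identity and expression through an explicit deformation yields better expression and pose consistency than 2D-only baselines.
In multi-view evaluation, PSNR/SSIM of NeuS-based reconstructions indicate that the proposed method has superior 3D consistency.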
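Quantitatively, it is competitive on FID/KID (FID 19.9 vs. 19.4 for GRAM and 17.9 for DiscoFaceGAN), while showing clear gains in expression/pose disentanglement and 3D consistency.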
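The NeuS-based evaluation rests on a simple idea: if generated multi-view images are 3D-consistent, a surface reconstructed from them should re-render close to the originals, giving high PSNR/SSIM. The sketch below shows only the PSNR half, assuming images are flattened lists of pixel values in [0, 1] and that the re-rendered views are already available. `psnr` and `mean_psnr` are hypothetical helper names; SSIM and the NeuS reconstruction itself are not shown, and the paper's exact protocol may differ.

```python
import math
from typing import List, Sequence, Tuple


def psnr(rendered: Sequence[float], reference: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_psnr(view_pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average PSNR over (re-rendered view, generated view) pairs, one per camera pose."""
    return sum(psnr(r, g) for r, g in view_pairs) / len(view_pairs)
```

For instance, a per-pixel error of 0.1 gives an MSE of 0.01 and therefore a PSNR of about 20 dB; higher values mean the reconstruction agrees more closely with the generated views.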


This review was generated by AI and checked by a human editor.