QUICK REVIEW

[Paper Review] 3D GAN Inversion for Controllable Portrait Image Animation

Connor Z. Lin, David B. Lindell|arXiv (Cornell University)|Mar 25, 2022

Generative Adversarial Networks and Image Synthesis | Cited by 25

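One-line summary

This paper presents a method for animating and editing portrait images by inverting them into a pretrained 3D GAN (EG3D) and controlling expressions with a 3DMM. This enables multi-view-consistent pose, expression, and attribute editing, as well as video reenactment.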

ABSTRACT

Millions of images of human faces are captured every single day; but these photographs portray the likeness of an individual with a fixed pose, expression, and appearance. Portrait image animation enables the post-capture adjustment of these attributes from a single image while maintaining a photorealistic reconstruction of the subject's likeness or identity. Still, current methods for portrait image animation are typically based on 2D warping operations or manipulations of a 2D generative adversarial network (GAN) and lack explicit mechanisms to enforce multi-view consistency. Thus these methods may significantly alter the identity of the subject, especially when the viewpoint relative to the camera is changed. In this work, we leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency. We propose a supervision strategy to flexibly manipulate expressions with 3D morphable models, and we show that the proposed method also supports editing appearance attributes, such as age or hairstyle, by interpolating within the latent space of the GAN. The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer while also supporting attribute editing.

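Motivation and objectives

Motivate portrait image animation that allows pose and expression editing across views while preserving the subject's identity.
Leverage a 3D-aware GAN (EG3D) under 3DMM-based supervision to edit expressions controllably.
Edit appearance attributes (e.g., age, hairstyle, gender) through latent-space manipulation.
Provide pipelines for both still-image animation and video-based portrait reenactment.
Handle occlusions and inpainting through GAN inversion and targeted fine-tuning.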

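Proposed method

Use DECA to estimate 3DMM expression parameters from the target image and transfer them.
Perform 3D GAN inversion by optimizing the latent code w with a mask-based loss so that it reconstructs the expression-edited regions.
After inversion, fine-tune the GAN generator so that non-face regions match more closely, while keeping the mouth region inpainted.
Condition the EG3D model on the target pose parameters to render the edited portrait in the target pose.
Train StyleFlow to map latent codes to attribute-edited codes for the 3D GAN, integrating attribute edits such as age, hairstyle, and gender into the animation pipeline.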
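
The mask-based inversion step can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not the authors' implementation: a random linear map `A @ w` stands in for the EG3D generator, the mask is assumed to mark the pixels the loss should reconstruct, and the names `invert_with_mask` and `masked_mse` are hypothetical. The point it shows is that pixels excluded from the loss are filled in by whatever the generator produces for the fitted latent code.

```python
import numpy as np

def masked_mse(A, w, target, mask):
    """Mean squared error over the pixels where mask == 1."""
    diff = mask * (A @ w - target)
    return float(np.sum(diff ** 2) / mask.sum())

def invert_with_mask(A, target, mask, steps=500, lr=0.1):
    """Gradient descent on the latent w for the toy generator G(w) = A @ w."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        residual = mask * (A @ w - target)          # excluded pixels contribute nothing
        grad = 2.0 * (A.T @ residual) / mask.sum()  # gradient of masked_mse w.r.t. w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 8))   # hypothetical linear "generator", not EG3D
w_true = rng.normal(size=8)
target = A @ w_true

mask = np.ones(64)
mask[:16] = 0.0                # assume the first 16 "pixels" are excluded from the loss
target[:16] = 5.0              # corrupt the excluded region of the target

w_hat = invert_with_mask(A, target, mask)
print("masked loss:", masked_mse(A, w_hat, target, mask))
```

In this toy setting the fitted code reproduces the uncorrupted values in the excluded region, which is the sense in which the generator "inpaints" it. A real pipeline would use the actual generator, perceptual losses, and an automatic-differentiation framework instead of a hand-written gradient.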

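Experimental results

Research questions

RQ1 Can explicit 3DMM-based expression and pose editing, combined with 3D GAN inversion, achieve multi-view-consistent portrait animation with strong identity preservation?
RQ2 Does embedding the expression-edited image into the 3D GAN latent space enable realistic inpainting and pose rendering across viewpoints?
RQ3 Can semantic attribute editing (age, hairstyle, gender) be integrated into the animation pipeline through latent-space manipulation?
RQ4 How does the 3D GAN-based approach compare with 2D GAN and 3DMM-based baselines in image quality, identity preservation, and pose consistency?
RQ5 Can the approach be extended to temporally consistent video-based portrait reenactment?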

Key findings

Method	FID ↓	ID ↑	APD ↓	AED ↓
PIRenderer (w/o eyes, w/o pose)	53.916	-	0.250	0.437
PIRenderer (w/o pose)	53.959	-	0.247	0.386
PIRenderer (w/o eyes)	63.844	0.694	0.039	0.424
PIRenderer	64.379	0.700	0.040	0.373
2D GAN (w/o pose)	17.812	-	0.246	0.434
3D GAN (w/o pose)	16.504	-	0.246	0.433
3D GAN	31.176	0.733	0.030	0.433
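
The rows above can be compared column by column with a few lines of code. The sketch below simply transcribes the table (with `None` for the "-" entries) and picks the best row per metric following the arrows in the header; the names `RESULTS`, `LOWER_IS_BETTER`, and `best_method` are illustrative and not from the paper.

```python
# Values transcribed from the results table; None marks a "-" entry.
RESULTS = {
    "PIRenderer (w/o eyes, w/o pose)": {"FID": 53.916, "ID": None, "APD": 0.250, "AED": 0.437},
    "PIRenderer (w/o pose)": {"FID": 53.959, "ID": None, "APD": 0.247, "AED": 0.386},
    "PIRenderer (w/o eyes)": {"FID": 63.844, "ID": 0.694, "APD": 0.039, "AED": 0.424},
    "PIRenderer": {"FID": 64.379, "ID": 0.700, "APD": 0.040, "AED": 0.373},
    "2D GAN (w/o pose)": {"FID": 17.812, "ID": None, "APD": 0.246, "AED": 0.434},
    "3D GAN (w/o pose)": {"FID": 16.504, "ID": None, "APD": 0.246, "AED": 0.433},
    "3D GAN": {"FID": 31.176, "ID": 0.733, "APD": 0.030, "AED": 0.433},
}

# Direction of each metric, following the arrows in the table header.
LOWER_IS_BETTER = {"FID": True, "ID": False, "APD": True, "AED": True}

def best_method(metric):
    """Return the row with the best reported value for the given metric."""
    scored = {name: row[metric] for name, row in RESULTS.items()
              if row[metric] is not None}
    pick = min if LOWER_IS_BETTER[metric] else max
    return pick(scored, key=scored.get)

for metric in LOWER_IS_BETTER:
    print(metric, "->", best_method(metric))
```

Note that rows with and without pose transfer are not necessarily comparable on every metric, so a per-column winner is only a starting point for reading the table.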

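The 3D GAN inversion pipeline preserves identity and pose better than the 2D GAN baseline and PIRenderer; against the full PIRenderer row in the table, ID is 0.733 vs. 0.700 and APD is 0.030 vs. 0.040.
It provides explicit pose control and multi-view consistency while retaining the subject's identity.
Attribute editing (age, hair, gender) via latent-space manipulation can be integrated into the animation pipeline.
Quantitatively, the 3D GAN variants lead on FID (16.504 for the w/o-pose variant; 31.176 for the full model vs. 64.379 for PIRenderer), ID, and APD, while PIRenderer retains the lowest AED (0.373 vs. 0.433).
The method supports video-based reenactment, smoothing the pose estimates to reduce jitter while maintaining realistic inpainting of occluded regions.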
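
The review does not say how the pose estimates are smoothed. As one plausible illustration, the sketch below applies an exponential moving average to per-frame pose vectors; the choice of filter, the function name `smooth_poses`, and the parameter `alpha` are all assumptions rather than details from the paper.

```python
def smooth_poses(poses, alpha=0.5):
    """Exponential moving average over per-frame pose vectors.

    poses: list of equal-length lists of floats (e.g. per-frame angles).
    alpha: weight of the current frame; smaller values smooth more.
    Assumption: this filter is illustrative, not the paper's method.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for pose in poses:
        if prev is None:
            cur = list(pose)  # first frame is passed through unchanged
        else:
            cur = [alpha * p + (1.0 - alpha) * q for p, q in zip(pose, prev)]
        smoothed.append(cur)
        prev = cur
    return smoothed

def total_jitter(poses):
    """Sum of absolute frame-to-frame changes across all pose components."""
    return sum(abs(a - b)
               for prev, cur in zip(poses, poses[1:])
               for a, b in zip(cur, prev))

raw = [[0.0], [1.0], [0.0], [1.0]]
print(smooth_poses(raw), total_jitter(raw), total_jitter(smooth_poses(raw)))
```

A simple average like this ignores angle wraparound and adds lag, so a real system would likely need a rotation-aware filter and a tuned `alpha`.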


This review was generated by AI and checked by a human editor.