QUICK REVIEW

[Paper Review] 3D GAN Inversion for Controllable Portrait Image Animation

Connor Z. Lin, David B. Lindell | arXiv (Cornell University) | Mar 25, 2022

Generative Adversarial Networks and Image Synthesis | 25 citations

TL;DR

The paper animates and edits portrait images by inverting them into the latent space of a pre-trained 3D GAN (EG3D) and controlling expressions with a 3D morphable model (3DMM). This enables multi-view-consistent pose, expression, and attribute edits, as well as video re-enactment.

ABSTRACT

Millions of images of human faces are captured every single day; but these photographs portray the likeness of an individual with a fixed pose, expression, and appearance. Portrait image animation enables the post-capture adjustment of these attributes from a single image while maintaining a photorealistic reconstruction of the subject's likeness or identity. Still, current methods for portrait image animation are typically based on 2D warping operations or manipulations of a 2D generative adversarial network (GAN) and lack explicit mechanisms to enforce multi-view consistency. Thus these methods may significantly alter the identity of the subject, especially when the viewpoint relative to the camera is changed. In this work, we leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency. We propose a supervision strategy to flexibly manipulate expressions with 3D morphable models, and we show that the proposed method also supports editing appearance attributes, such as age or hairstyle, by interpolating within the latent space of the GAN. The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer while also supporting attribute editing.

Motivation & Objective

Animate a portrait from a single image while preserving the subject's identity under pose and expression edits across views.
Leverage a 3D-aware GAN (EG3D) with 3DMM-based supervision to controllably edit expressions.
Enable appearance attribute editing (e.g., age, hairstyle, gender) through latent-space manipulation.
Provide a pipeline for static image animation as well as video-driven portrait re-enactment.
Address occlusions and in-painting through GAN inversion and targeted finetuning.

Proposed method

Use DECA to estimate and transfer 3DMM expressions from target to source image.
Perform 3D GAN inversion by optimizing a latent code w to reconstruct the expression-edited region, with mask-based losses.
Fine-tune the GAN generator after inversion to better match the non-face regions while preserving the in-painted mouth region.
Render the edited portrait at target poses by conditioning the EG3D model on target pose parameters.
Incorporate attribute editing by training StyleFlow to map latent codes to attribute-modified codes for the 3D GAN, enabling edits to age, hair, and gender.
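
The inversion step above can be illustrated with a toy sketch. It is not the authors' implementation: the linear `generate` function stands in for EG3D, the images are flat lists of four pixels, and all names (`generate`, `masked_l2`, `invert`) are hypothetical. The mask here simply excludes one pixel that the optimization should ignore; which regions the paper's masks cover is not detailed in this review.

```python
def generate(weights, w):
    """Toy linear 'generator' (assumption): maps latent code w to a flat image."""
    return [sum(a * x for a, x in zip(row, w)) for row in weights]


def masked_l2(image, target, mask):
    """Mean squared error over the pixels where mask == 1."""
    kept = sum(mask)
    return sum(m * (p - t) ** 2 for p, t, m in zip(image, target, mask)) / kept


def invert(weights, target, mask, steps=500, lr=0.1):
    """Gradient descent on the latent code w to minimise the masked L2 loss."""
    dim = len(weights[0])
    kept = sum(mask)
    w = [0.0] * dim
    for _ in range(steps):
        image = generate(weights, w)
        # Masked residual: pixels with mask == 0 contribute no gradient.
        residual = [m * (p - t) for p, t, m in zip(image, target, mask)]
        # d(loss)/d(w_j) = (2 / kept) * sum_i residual_i * weights[i][j]
        grad = [
            2.0 / kept * sum(r * row[j] for r, row in zip(residual, weights))
            for j in range(dim)
        ]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


# Four-pixel image, two-dimensional latent code.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
target = generate(weights, [0.5, -0.25])
target[3] = 9.0          # corrupt one pixel ...
mask = [1, 1, 1, 0]      # ... and mask it out of the loss
w_hat = invert(weights, target, mask)
```

Because the corrupted pixel is masked out, the recovered `w_hat` matches the latent code that produced the uncorrupted pixels. With a real generator the gradient would come from automatic differentiation rather than the closed form used here.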

Experimental results

Research questions

RQ1: Can explicit 3DMM-based expression and pose editing, combined with 3D GAN inversion, achieve multi-view consistent portrait animation with high identity preservation?
RQ2: Does embedding expression-edited images into a 3D GAN latent space enable realistic in-painting and pose-rendering across views?
RQ3: Can semantic attribute edits (age, hairstyle, gender) be integrated into the animation pipeline through latent-space manipulation?
RQ4: How does the 3D GAN-based approach compare to 2D GAN and 3DMM-based baselines in terms of image quality, identity preservation, and pose consistency?
RQ5: Is the method extendable to video-based portrait re-enactment with temporal consistency?

Key findings

Method	FID ↓	ID ↑	APD ↓	AED ↓
PIRenderer (w/o eyes, w/o pose)	53.916	-	0.250	0.437
PIRenderer (w/o pose)	53.959	-	0.247	0.386
PIRenderer (w/o eyes)	63.844	0.694	0.039	0.424
PIRenderer	64.379	0.700	0.040	0.373
2D GAN (w/o pose)	17.812	-	0.246	0.434
3D GAN (w/o pose)	16.504	-	0.246	0.433
3D GAN	31.176	0.733	0.030	0.433
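
This review does not define the metric columns. As a rough sketch, assuming ID is a cosine similarity between face-recognition embeddings and that APD and AED are average distances between estimated 3DMM pose and expression parameters, the scores could be computed as below. The function names and the choice of a mean absolute difference are assumptions, not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed form of ID)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_distance(predicted, target):
    """Mean over samples of the mean absolute parameter difference
    (assumed form of APD for pose and AED for expression parameters)."""
    per_sample = [
        sum(abs(p - t) for p, t in zip(pred, tgt)) / len(pred)
        for pred, tgt in zip(predicted, target)
    ]
    return sum(per_sample) / len(per_sample)
```

Under this reading, a higher ID means the generated face stays closer to the source identity, and lower APD and AED mean the generated pose and expression stay closer to the target.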

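With pose transfer enabled, the 3D GAN inversion pipeline scores higher on identity preservation (ID 0.733 vs. 0.700) and lower on pose error (APD 0.030 vs. 0.040) than PIRenderer. The 2D GAN baseline appears in the table only without pose transfer, so it has no ID score to compare.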
The approach enables explicit pose control with multi-view consistency while preserving subject identity.
Attribute editing (age, hair, gender) is feasible via latent-space manipulation and integrated into the animation pipeline.
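Against PIRenderer with pose transfer, the 3D GAN roughly halves FID (31.176 vs. 64.379); without pose transfer it narrowly beats the 2D GAN (16.504 vs. 17.812). Expression error is the exception: PIRenderer's AED (0.373) is lower than the 3D GAN's (0.433).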
The method supports video-based re-enactment with smoothed pose estimates to reduce jitter and maintains realistic occlusion in-painting.
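
The review does not say how the pose estimates are smoothed. One simple possibility, shown here purely as an assumption, is an exponential moving average over the per-frame pose parameters. The `smooth_poses` helper and the `alpha` parameter are hypothetical, and the sketch ignores angle wrap-around.

```python
def smooth_poses(poses, alpha=0.5):
    """Exponential moving average over per-frame pose vectors.

    poses: list of equal-length lists, e.g. [yaw, pitch, roll] per frame.
    alpha: weight on the current frame; lower values smooth more strongly.
    """
    smoothed = [list(poses[0])]
    for pose in poses[1:]:
        previous = smoothed[-1]
        smoothed.append(
            [alpha * p + (1.0 - alpha) * q for p, q in zip(pose, previous)]
        )
    return smoothed


# A single-frame spike in yaw is damped and spread over later frames.
frames = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
smoothed_frames = smooth_poses(frames, alpha=0.5)
```

A filter like this trades responsiveness for stability: a smaller `alpha` removes more jitter but makes the rendered head lag behind fast motion in the driving video.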

This review was created by AI and reviewed by human editors.