QUICK REVIEW

[Paper Review] Generative Face Completion

Yijun Li, Sifei Liu | arXiv (Cornell University) | Apr 19, 2017

Generative Adversarial Networks and Image Synthesis | 28 references | 93 citations

TL;DR

This paper presents a deep generative model for face completion that uses an autoencoder generator with two adversarial discriminators (local and global) plus a semantic parsing loss to produce semantically coherent and photorealistic missing regions in faces.

ABSTRACT

In this paper, we propose an effective face completion algorithm using a deep generative model. Different from well-studied background completion, the face completion task is more challenging as it often requires generating semantically new pixels for the missing key components (e.g., eyes and mouths) that contain large appearance variations. Unlike existing nonparametric algorithms that search for patches to synthesize, our algorithm directly generates contents for missing regions based on a neural network. The model is trained with a combination of a reconstruction loss, two adversarial losses and a semantic parsing loss, which ensures pixel faithfulness and local-global content consistency. With extensive experimental results, we demonstrate qualitatively and quantitatively that our model is able to deal with a large area of missing pixels in arbitrary shapes and generate realistic face completion results.
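The task setup in the abstract (a face image with an arbitrarily shaped missing region that the network fills directly) can be illustrated with a minimal pure-Python sketch. The helper names `corrupt` and `composite`, the constant fill value, and the final paste-back step are illustrative assumptions, not details confirmed by this review; the real model works on image tensors inside a neural network.

```python
from typing import List

Image = List[List[float]]  # H x W grayscale image, values in [0, 1]
Mask = List[List[int]]     # H x W binary mask, 1 = missing pixel (any shape)


def corrupt(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Hypothetical input construction: keep known pixels, set missing ones to `fill`."""
    return [
        [fill if m else p for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Assumed paste-back step: known pixels from the original, masked pixels from the generator."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


if __name__ == "__main__":
    face = [[0.1, 0.2], [0.3, 0.4]]
    hole = [[0, 1], [1, 0]]           # an irregular (diagonal) missing region
    net_input = corrupt(face, hole)   # [[0.1, 0.0], [0.0, 0.4]]
    fake_output = [[0.9, 0.8], [0.7, 0.6]]  # stand-in for a generator prediction
    print(composite(face, fake_output, hole))
```

Because the mask is an arbitrary binary array rather than a fixed rectangle, the same two helpers cover the "arbitrary shapes" case the abstract mentions.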

Motivation & Objective

Move face completion beyond patch-based background filling by generating semantically valid content for missing facial regions.
Develop a deep autoencoder-based generator, conditioned on the surrounding context, that fills large, irregular masks in faces.
Regularize generation with both local and global adversarial losses to ensure realism and global consistency.
Incorporate a semantic parsing network to enforce face-structure consistency with surrounding context.
Demonstrate effectiveness on CelebA with qualitative and quantitative assessments across varying mask sizes and shapes.

Proposed method

Encoder-decoder (autoencoder) generator whose encoder builds on the VGG-19 architecture, extended with extra layers.
Two discriminators: a local discriminator focuses on realism within the masked region; a global discriminator enforces image-wide realism.
A fixed semantic parsing network provides a semantic regularization loss to align generated content with facial parts.
An explicit reconstruction loss ($L_r$) complements the adversarial losses to stabilize training.
An overall loss $L = L_r + \lambda_1 L_{a_1} + \lambda_2 L_{a_2} + \lambda_3 L_p$ balances pixel fidelity ($L_r$), local realism ($L_{a_1}$), global realism ($L_{a_2}$), and parsing consistency ($L_p$).
A curriculum training strategy gradually introduces adversarial and parsing losses to stabilize learning.
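The weighted loss and the curriculum described above can be sketched as follows. This is a minimal illustration, assuming the four loss values have already been computed for a batch; the `LossTerms` and `total_loss` names, the three-stage schedule in `ACTIVE_TERMS`, and the weight values in the example are hypothetical choices, not the authors' settings.

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    """Values of the four loss terms for one batch, computed elsewhere."""
    rec: float         # L_r: pixel reconstruction loss
    adv_local: float   # L_a1: local adversarial loss on the completed region
    adv_global: float  # L_a2: global adversarial loss on the whole image
    parsing: float     # L_p: semantic parsing loss from the fixed parsing network


# Hypothetical three-stage curriculum (assumption for illustration):
# each stage lists which of (L_a1, L_a2, L_p) are switched on.
ACTIVE_TERMS = {
    1: (False, False, False),  # reconstruction only
    2: (True, False, False),   # + local adversarial loss
    3: (True, True, True),     # + global adversarial and parsing losses
}


def total_loss(terms: LossTerms, lam1: float, lam2: float, lam3: float, stage: int) -> float:
    """L = L_r + lam1*L_a1 + lam2*L_a2 + lam3*L_p, dropping terms not yet active."""
    use_a1, use_a2, use_p = ACTIVE_TERMS[stage]
    loss = terms.rec
    if use_a1:
        loss += lam1 * terms.adv_local
    if use_a2:
        loss += lam2 * terms.adv_global
    if use_p:
        loss += lam3 * terms.parsing
    return loss


if __name__ == "__main__":
    batch = LossTerms(rec=1.0, adv_local=2.0, adv_global=4.0, parsing=8.0)
    for s in (1, 2, 3):
        print(s, total_loss(batch, lam1=0.5, lam2=0.25, lam3=0.125, stage=s))
```

In a real training loop the terms would be framework tensors so gradients flow through the sum; gating terms by stage is one simple way to realize "gradually introduces", with the reconstruction loss always present as the stabilizing anchor.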

Experimental results

Research questions

RQ1: Can a deep generative model synthesize semantically valid and photorealistic missing facial regions without external patch databases?
RQ2: Does adding local and global adversarial losses along with semantic parsing improve completion realism and facial coherence?
RQ3: How does the model perform across large, irregular masks and under face pose/alignment variations?
RQ4: To what extent does semantic regularization preserve identity and facial structure during completion?
RQ5: What is the impact of different mask sizes on completion quality and identity retention?

Key findings

Qualitative results show realistic, semantically plausible face completions for large and irregular masks.
Quantitative results on CelebA indicate improvements over baselines in SSIM and PSNR across six mask configurations.
Identity-distance metrics suggest the method preserves identity better than simple reconstruction or random-noise filling, though gaps remain for large masks.
A dual-discriminator setup (local and global) combined with semantic parsing yields more coherent details and facial feature alignment.
The method generalizes to varied mask sizes (smaller masks perform best) and to different occlusion patterns.
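For reference, PSNR (one of the metrics cited above) has a standard definition, 10 * log10(MAX^2 / MSE), and can be computed as below. The identity-distance helper is an assumption: it treats identity distance as the Euclidean distance between face-recognition embeddings of the original and completed images, with the embeddings taken as given inputs because this review does not say which embedding network was used. SSIM is omitted for brevity.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], completed: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length flattened images."""
    if len(reference) != len(completed) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - c) ** 2 for r, c in zip(reference, completed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no error at all
    return 10.0 * math.log10(max_value ** 2 / mse)


def identity_distance(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Hypothetical identity metric: Euclidean distance between two face embeddings."""
    return math.dist(emb_a, emb_b)  # smaller distance = identity better preserved


if __name__ == "__main__":
    original = [0, 0, 0, 0]
    completed = [1, 1, 1, 1]  # every pixel off by one intensity level
    print(round(psnr(original, completed), 2))
    print(identity_distance([0.0, 0.0], [3.0, 4.0]))
```

Higher PSNR means the completion is closer to the ground-truth pixels, while a lower identity distance means the completed face is still recognized as the same person; the two can disagree, which is why the review reports both.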


This review was created by AI and reviewed by human editors.