QUICK REVIEW

[Paper Review] Closed-Form Factorization of Latent Semantics in GANs

Yujun Shen, Bolei Zhou|arXiv (Cornell University)|Jul 13, 2020

Generative Adversarial Networks and Image Synthesis|27 references|63 citations

TL;DR

Introduces SeFa, a closed-form, unsupervised method to discover latent semantic directions in GANs by factorizing the first-layer transformation weights, enabling versatile image editing without training or data sampling.

ABSTRACT

A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images. In order to identify such latent dimensions for image editing, previous methods typically annotate a collection of synthesized samples and train linear classifiers in the latent space. However, they require a clear definition of the target attribute as well as the corresponding manual annotations, limiting their applications in practice. In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner. In particular, we take a closer look into the generation mechanism of GANs and further propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights. With a lightning-fast implementation, our approach is capable of not only finding semantically meaningful dimensions comparably to the state-of-the-art supervised methods, but also resulting in far more versatile concepts across multiple GAN models trained on a wide range of datasets.

Motivation & Objective

Reveal latent semantic directions learned by GANs without supervision or data sampling.
Analyze the first projection step of GAN generators to identify influential latent factors.
Demonstrate generalization of discovered semantics across multiple GAN architectures and datasets.

Proposed method

Model the GAN generator as a sequence of layer-wise projections and focus on the first affine step G1(z)=Az+b.
Formulate an unsupervised optimization that maximizes ||An||_2^2 over unit-norm vectors n (n^T n = 1), so that the selected directions n induce the largest change after the first projection.
Show that the optimal directions are the eigenvectors of A^T A with the largest eigenvalues (SeFa).
Extend to k directions by taking the top-k eigenvectors of A^T A.
Apply SeFa to various GAN architectures (PGGAN, StyleGAN, StyleGAN2, BigGAN) by using weights from targeted layers or concatenated layers in StyleGAN families.
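
The factorization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the matrix A is a random stand-in for a pre-trained first-layer weight, the function names sefa_directions and edit_latent are hypothetical, and the linear edit z + alpha * n is assumed as the usual way to move along a discovered direction (this summary does not spell it out).

```python
import numpy as np


def sefa_directions(A, k):
    """Top-k unit vectors n maximizing ||A n||_2^2.

    These are the eigenvectors of A^T A with the k largest eigenvalues.
    Returns (directions, eigenvalues), with directions of shape (k, latent_dim).
    """
    gram = A.T @ A                            # symmetric, positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return eigvecs[:, order].T, eigvals[order]


def edit_latent(z, direction, alpha):
    """Assumed linear edit: shift the latent code along one direction."""
    return z + alpha * direction


# Stand-in for a real generator weight (assumption: 16-dim latent, 64-dim output).
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 16))

directions, eigvals = sefa_directions(A, k=3)
z = rng.standard_normal(16)
z_edited = edit_latent(z, directions[0], alpha=3.0)
```

With a real model, A would be read from the generator's checkpoint; which layer (or concatenation of layers) to use depends on the architecture, as the last item above notes. The symmetric solver eigh is used because A^T A is symmetric, which keeps the returned directions real and orthonormal.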

Experimental results

Research questions

RQ1: Can latent semantic directions be discovered without labeled data or attribute predictors?
RQ2: What is the minimal, model-weight-based mechanism that exposes meaningful latent semantics in GANs?
RQ3: Do the discovered directions generalize across different GAN architectures and datasets?
RQ4: How do unsupervised SeFa directions compare to supervised methods in editing quality and diversity?

Key findings

SeFa identifies diverse, human-interpretable latent directions by decomposing the first linear transform in the generator.
The discovered directions form a hierarchical and layer-dependent structure consistent with prior observations in StyleGAN-based models.
SeFa achieves editing capabilities comparable to supervised methods for several attributes while being completely data- and label-free.
SeFa reveals a broader set of semantics than some supervised methods, enabling manipulation of attributes not easily covered by binary predictors.
Qualitative and user studies show SeFa-directed edits preserve identity and other attributes better than some sampling-based baselines in certain cases.

This review was created by AI and reviewed by human editors.