QUICK REVIEW

[Paper Review] Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis

Ceyuan Yang, Yujun Shen|arXiv (Cornell University)|Nov 21, 2019

Generative Adversarial Networks and Image Synthesis | References: 60 | Cited by: 41

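One-Sentence Summary

This paper analyzes how the layer-wise latent codes of StyleGAN and BigGAN give rise to a hierarchical, human-understandable semantic structure in scene synthesis, identifies layout, objects, attributes, and color scheme as emergent variation factors, and shows how to manipulate them.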

ABSTRACT

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there lacks enough understanding on what generative models have learned inside the deep generative representations and how photo-realistic images are able to be composed of the layer-wise stochasticity introduced in recent GANs. In this work, we show that highly-structured semantic hierarchy emerges as variation factors from synthesizing scenes from the generative representations in state-of-the-art GAN models, like StyleGAN and BigGAN. By probing the layer-wise representations with a broad set of semantics at different abstraction levels, we are able to quantify the causality between the activations and semantics occurring in the output image. Such a quantification identifies the human-understandable variation factors learned by GANs to compose scenes. The qualitative and quantitative results further suggest that the generative representations learned by the GANs with layer-wise latent codes are specialized to synthesize different hierarchical semantics: the early layers tend to determine the spatial layout and configuration, the middle layers control the categorical objects, and the later layers finally render the scene attributes as well as color scheme. Identifying such a set of manipulatable latent variation factors facilitates semantic scene manipulation.

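Motivation and Objectives

Study the semantic factors (layout, objects, attributes, color) that GANs learn for scene synthesis across multiple levels of abstraction.
Quantify the causal relationship between layer-wise generator activations and output semantics in state-of-the-art GANs.
Identify manipulable latent variation factors and map them to generator layers to enable semantic scene editing.
Show that the semantic hierarchy emerges without external supervision and enables diverse scene manipulation.
Demonstrate that the approach generalizes across GAN architectures (StyleGAN, BigGAN, PGGAN).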

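Proposed Method

Treat the GAN latent code as a layer-wise generative representation that is fed into multiple generator layers (layer-wise stochasticity).
Define four abstraction levels (layout, objects, attributes, color) and score the semantics of synthesized images with off-the-shelf classifiers.
Probe the latent space by training a linear SVM decision boundary for each semantic concept, treating each concept as a binary classification task.
Verify manipulable variation factors by moving latent codes along the boundary normal and re-scoring the resulting change in semantics (Δs_i).
Perform independent, joint, and jittering manipulations to edit scenes across layers and semantics.
Apply the method to StyleGAN, BigGAN, and PGGAN on indoor and outdoor scenes, using LSUN and Places data and FID as described in the paper, and quantify layer-wise specialization (layout at the bottom layers, color at the top).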
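
The probing and re-scoring steps above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the generator and the off-the-shelf classifier are replaced by a hypothetical synthetic scorer (`score_fn`, driven by a hidden direction `true_dir`), the linear SVM is fitted with a plain subgradient loop in NumPy rather than a library solver, and the re-score is computed as a simple mean score change, which may differ from the exact Δs_i definition in the paper. All names and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_svm(codes, labels, lr=0.1, reg=1e-3, epochs=200):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    codes: (n, d) latent codes; labels: (n,) values in {-1, +1}.
    Returns the unit normal of the decision boundary and the rescaled bias.
    """
    n, d = codes.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = labels * (codes @ w + b)
        active = margins < 1.0  # samples that violate the margin
        grad_w = reg * w - (labels[active, None] * codes[active]).sum(axis=0) / n
        grad_b = -labels[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    norm = np.linalg.norm(w)
    return w / norm, b / norm


def rescore(codes, direction, score_fn, step=2.0):
    """Mean change in semantic score after moving codes along a direction."""
    before = score_fn(codes)
    after = score_fn(codes + step * direction)
    return float(np.mean(after - before))


rng = np.random.default_rng(0)
dim = 16

# Hidden direction that controls the toy "semantic" (assumption for the demo).
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)


def score_fn(codes):
    # Hypothetical stand-in for classifier(generator(code)).
    return 1.0 / (1.0 + np.exp(-(codes @ true_dir)))


# Sample latent codes and label them with the scorer (binary task).
codes = rng.normal(size=(2000, dim))
labels = np.where(score_fn(codes) > 0.5, 1.0, -1.0)

normal, bias = fit_linear_svm(codes, labels)
cosine = float(normal @ true_dir)

# An unrelated direction, made orthogonal to the hidden one.
other = rng.normal(size=dim)
other -= (other @ true_dir) * true_dir
other /= np.linalg.norm(other)

delta_probe = rescore(codes, normal, score_fn)
delta_other = rescore(codes, other, score_fn)
print("cosine(normal, true_dir):", round(cosine, 3))
print("score change along boundary normal:", round(delta_probe, 3))
print("score change along unrelated direction:", round(delta_other, 3))
```

In this toy setup the fitted normal should align closely with the hidden direction, and moving along it changes the score far more than moving along an unrelated direction. That contrast is the sense in which re-scoring separates manipulable factors from irrelevant ones.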

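Experimental Results

Research Questions

RQ1: Which semantic factors emerge in GANs for scene synthesis across multiple levels of abstraction?
RQ2: How are these semantic factors distributed across the generator layers of StyleGAN, BigGAN, and PGGAN?
RQ3: Can the emergent variation factors be quantitatively identified and manipulated through layer-wise latent codes?
RQ4: Does the hierarchical latent representation generalize across GAN architectures and scene categories?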

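Key Findings

A hierarchical semantic structure emerges in GAN representations: early layers control layout, middle layers control objects, and later layers render attributes and color scheme.
Moving layer-wise latent codes along semantic boundaries enables controllable scene editing and yields diverse, semantically coherent results.
Middle layers encode category-specific objects, which enables category transfer (for example, bedroom to living room) while preserving layout and higher-level attributes.
The re-scoring technique ranks variation factors by measuring the change in semantic score when latent codes are moved in the direction that crosses the boundary.
Experiments show a consistent layer-to-semantics mapping across StyleGAN, BigGAN, and PGGAN, with layer relevance validated quantitatively through classifiers and a user study.
Table 1 reports Fréchet Inception Distance (FID) values for several scene categories (e.g., bedroom 2.65; living room 5.16; kitchen 5.06; dining room 4.03; bridge 6.42; church 4.82; tower 5.99; mixed 3.74).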
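
The layer-wise editing behind these findings can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the latent code is assumed to be copied to every generator layer, the `LAYER_GROUPS` split of a 14-layer generator is a hypothetical choice rather than indices taken from the paper, the boundary normals are random placeholders, and no generator is invoked.

```python
import numpy as np

# Hypothetical assignment of layers to abstraction levels (assumption).
LAYER_GROUPS = {
    "layout": range(0, 2),
    "objects": range(2, 6),
    "attributes": range(6, 12),
    "color": range(12, 14),
}
NUM_LAYERS = 14


def edit_layers(layer_codes, normal, layers, step):
    """Shift the codes of the selected layers along a boundary normal.

    layer_codes: (num_layers, d) array with one latent code per layer.
    layers: iterable of layer indices to edit; other layers are untouched.
    """
    edited = layer_codes.copy()
    idx = list(layers)
    edited[idx] += step * normal
    return edited


rng = np.random.default_rng(0)
dim = 8

# One latent code broadcast to all layers, as in a layer-wise generator.
w = rng.normal(size=dim)
layer_codes = np.tile(w, (NUM_LAYERS, 1))

# Placeholder unit normals for two semantic boundaries (assumption).
object_normal = rng.normal(size=dim)
object_normal /= np.linalg.norm(object_normal)
color_normal = rng.normal(size=dim)
color_normal /= np.linalg.norm(color_normal)

# Independent manipulation: change only the object-level layers.
edited = edit_layers(layer_codes, object_normal, LAYER_GROUPS["objects"], 2.0)

# Joint manipulation: additionally change the color-level layers.
joint = edit_layers(edited, color_normal, LAYER_GROUPS["color"], 1.0)

changed = np.any(joint != layer_codes, axis=1)
print("layers changed by the joint edit:", np.flatnonzero(changed).tolist())
```

Restricting the shift to one group of layers is what lets an edit target a single abstraction level, for example swapping objects while the layout layers keep their original codes.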

This review was generated by AI and reviewed by a human editor.