QUICK REVIEW

[Paper Review] GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations

Martin Engelcke, Adam R. Kosiorek|arXiv (Cornell University)|Jul 30, 2019

Generative Adversarial Networks and Image Synthesis | References: 51 | Cited by: 73

One-Sentence Summary

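GENESIS is an object-centric generative model of rendered 3D scenes that decomposes scenes into objects and generates coherent novel scenes using an autoregressive prior over object components. It matches or outperforms existing methods such as MONet on scene generation and decomposition, while enabling relational reasoning between components.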

ABSTRACT

Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most state-of-the-art generative models do not explicitly capture the compositional nature of visual scenes. Two recent exceptions, MONet and IODINE, decompose scenes into objects in an unsupervised fashion. Their underlying generative processes, however, do not account for component interactions. Hence, neither of them allows for principled sampling of novel scenes. Here we present GENESIS, the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes by capturing relationships between scene components. GENESIS parameterises a spatial GMM over images which is decoded from a set of object-centric latent variables that are either inferred sequentially in an amortised fashion or sampled from an autoregressive prior. We train GENESIS on several publicly available datasets and evaluate its performance on scene generation, decomposition, and semi-supervised learning.

Motivation and Goals

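Learn compact, composable representations of visual scenes to improve perception and planning in robotics and reinforcement learning.
Develop an unsupervised model with explicit object-centric latent variables that can both decompose and generate scenes.
Capture interactions between scene components with an autoregressive prior, enabling coherent sampling of novel scenes.
Make inference scalable and parallelisable by inferring the components in a low-dimensional latent space.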

Proposed Method

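Model the image with a spatial Gaussian mixture model (GMM), where each mixture component represents one scene element.
Place an autoregressive prior, implemented with an RNN, over the per-component mask latents to capture spatial relationships between components.
Factorise the latents so that each component latent z^c_k is conditioned on the corresponding mask latent z^m_k; the image likelihood is the mixture Σ_k π_k p_θ(x | z^c_k).
Train with amortised inference q_φ(z^m, z^c | x), whose factorisation mirrors the generative model; compute the mixing probabilities π_k with a stick-breaking process (SBP) or, as an alternative, with softmax normalisation.
Apply GECO (Generalised ELBO with Constrained Optimisation) to balance reconstruction quality against KL regularisation.
Provide two variants: GENESIS (separate mask and component latents z^m, z^c) and GENESIS-S (a single latent per component).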
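The mixture rendering, the stick-breaking step, and the GECO balancing listed above can be sketched in NumPy. This is a minimal, hypothetical illustration, not the authors' code: the function names (stick_breaking, mixture_log_likelihood, geco_update), the fixed standard deviation sigma, the independent-channel Gaussian, and the GECO hyperparameters alpha and step are all assumptions, and the encoders, decoders, and RNN-based autoregressive prior are omitted.

```python
import numpy as np


def stick_breaking(alphas: np.ndarray) -> np.ndarray:
    """Turn K-1 per-pixel fractions in (0, 1) into K mixing weights that sum to 1.

    alphas: shape (K-1, H, W), requires K >= 2. Returns pi with shape (K, H, W):
    pi_1 = a_1, pi_k = a_k * prod_{j<k} (1 - a_j), pi_K = prod_{j<K} (1 - a_j).
    """
    remaining = np.ones_like(alphas[0])  # fraction of each pixel not yet assigned
    pis = []
    for a in alphas:
        pis.append(remaining * a)
        remaining = remaining * (1.0 - a)
    pis.append(remaining)  # the last component takes whatever is left
    return np.stack(pis)


def mixture_log_likelihood(x, pis, mus, sigma):
    """Spatial GMM: sum over pixels of log sum_k pi_k N(x | mu_k, sigma^2).

    x: (C, H, W) image; pis: (K, H, W); mus: (K, C, H, W) per-component means.
    Channels are treated as independent given the component (an assumption).
    """
    # Gaussian log-density per component and channel, summed over channels -> (K, H, W)
    log_normal = -0.5 * (((x[None] - mus) / sigma) ** 2 + np.log(2 * np.pi * sigma**2))
    log_comp = np.log(pis + 1e-12) + log_normal.sum(axis=1)
    # Numerically stable log-sum-exp over the component axis
    m = log_comp.max(axis=0)
    per_pixel = m + np.log(np.exp(log_comp - m).sum(axis=0))
    return per_pixel.sum()


def geco_update(lagrange, constraint_ema, recon_error, kappa, alpha=0.99, step=1e-2):
    """One GECO-style step that pushes the reconstruction error towards a target kappa.

    Returns the updated multiplier and the moving average of the constraint.
    The loss to minimise would then be KL + lagrange * (recon_error - kappa).
    """
    constraint = recon_error - kappa
    constraint_ema = alpha * constraint_ema + (1 - alpha) * constraint
    lagrange = lagrange * np.exp(step * constraint_ema)  # grows while error > kappa
    return lagrange, constraint_ema


# Usage: K = 3 components on a 4x4 image; the weights sum to 1 at every pixel.
rng = np.random.default_rng(0)
pis = stick_breaking(rng.uniform(0.1, 0.9, size=(2, 4, 4)))
print(pis.sum(axis=0))
```

Here recon_error could be taken as the negative of mixture_log_likelihood; the multiplier then grows while reconstructions are worse than the target kappa and shrinks once they are better, which is how GECO trades reconstruction quality against the KL term without a hand-tuned fixed weight.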

Experimental Results

Research Questions

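RQ1: Can an object-centric generative model decompose complex scenes without supervision and generate coherent novel scenes?
RQ2: Does an autoregressive prior over scene components improve the coherence of generated scenes and enable relational reasoning between components?
RQ3: How well do the object-centric representations learned by GENESIS transfer to downstream tasks, such as inferring scene state (e.g. stability) or predicting the viewpoint?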

Key Findings

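GENESIS generates scenes component by component in a coherent order that respects the spatial layout: ground and sky first, then objects, and finally the background walls.
On GQN, GENESIS produces better component-wise samples than MONet, yielding semantically consistent scenes.
On ShapeStacks, GENESIS is competitive with or better than MONet on unsupervised segmentation metrics (adjusted Rand index, ARI 0.73±0.03; segmentation covering, SC 0.64±0.08; mean segmentation covering, mSC 0.60±0.09).
The representations learned by GENESIS improve downstream performance on ShapeStacks, such as predicting tower stability and height, outperforming several baselines (e.g. BD-VAE, DC-VAE).
Fréchet Inception Distances (FID; lower is better) show that the GENESIS variants achieve the best or competitive sample quality on Multi-dSprites and GQN (e.g. Multi-dSprites: 24.9/28.2; GQN: 80.5/70.2).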
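The segmentation scores above compare predicted component masks with ground-truth object masks. As an illustration, the adjusted Rand index (ARI) can be computed from the contingency table of two per-pixel labelings. This is a generic pure-Python sketch of the standard formula; the function name adjusted_rand_index is hypothetical, and details of the paper's evaluation protocol (for example, which pixels are included) are not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Chance-corrected agreement between two flat labelings of the same pixels."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must cover the same pixels")
    n = len(true_labels)
    if n < 2:
        return 1.0  # too few pixels to form a single pair
    # Pairs of pixels grouped together in both labelings (contingency table cells)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    # Pairs grouped together in each labeling separately (row and column sums)
    sum_a = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_ij - expected) / (max_index - expected)


# Usage: identical partitions with swapped label names still score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because ARI depends only on which pixels are grouped together, it is invariant to the arbitrary order in which a model emits its components, and its chance correction keeps the score near 0 for random segmentations.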


This review was generated by AI and checked by human editors.