QUICK REVIEW

[Paper Review] Visual Object Networks: Image Generation with Disentangled 3D Representation

Jun-Yan Zhu, Zhoutong Zhang|arXiv (Cornell University)|Dec 6, 2018

3D Shape Modeling and Analysis | References: 7 | Cited by: 58

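One-Sentence Summary

VON decomposes image synthesis into disentangled shape, viewpoint, and texture factors. It generates realistic 2D images from a 3D prior and supports 3D-aware editing and viewpoint changes without paired 2D-3D data.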

ABSTRACT

Recent progress in deep generative models has led to tremendous breakthroughs in image generation. However, while existing models can synthesize photorealistic images, they lack an understanding of our underlying 3D world. We present a new generative model, Visual Object Networks (VON), synthesizing natural images of objects with a disentangled 3D representation. Inspired by classic graphics rendering pipelines, we unravel our image formation process into three conditionally independent factors---shape, viewpoint, and texture---and present an end-to-end adversarial learning framework that jointly models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes. It then renders the object's 2.5D sketches (i.e., silhouette and depth map) from its shape under a sampled viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches to generate natural images. The VON not only generates images that are more realistic than state-of-the-art 2D image synthesis methods, but also enables many 3D operations such as changing the viewpoint of a generated image, editing of shape and texture, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.

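Motivation and Objectives

Bridge 3D structure and 2D image synthesis through a disentangled generative model.
Develop a 3D shape prior and a differentiable 2.5D projection that connects 3D voxels to 2D sketches.
Train a texture generation network that produces photorealistic images from 2.5D sketches.
Enable 3D operations including viewpoint changes, shape/texture editing, and texture transfer.
Demonstrate that a disentangled 3D representation outperforms 2D-only GAN baselines in realism.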

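Proposed Method

Learn a category-specific 3D shape prior that generates voxel grids, using a 3D-GAN trained with the Wasserstein objective and gradient penalty (WGAN-GP).
Compute differentiable 2.5D sketches (silhouette and depth) with a projection module applied from a sampled viewpoint.
Train the texture network on unpaired image data with a cycle-consistent adversarial loss, so that it renders realistic images from 2.5D sketches.
Use encoders to recover texture and 2.5D sketches from real images, applying cycle-consistency and KL losses to encourage a one-to-many mapping.
Train end to end, linking shape, viewpoint, and texture through differentiable components to produce the final image.
Evaluate against 2D GANs using Fréchet Inception Distance (FID) and a human preference study.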
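
To make the projection step more concrete, here is a minimal NumPy sketch of how a voxel occupancy grid can be turned into a silhouette and a depth map using only smooth operations (products and sums), which is what makes such a projection differentiable in an autodiff framework. It is an illustration under simplifying assumptions, not the paper's implementation: rays are orthographic and run along the first grid axis, the viewpoint is fixed, and the function name `project_voxels` and the `near`/`far` parameters are hypothetical.

```python
import numpy as np

def project_voxels(voxels, near=0.0, far=1.0):
    """Soft projection of an occupancy grid to a silhouette and a depth map.

    voxels: array of shape (D, H, W) with occupancy probabilities in [0, 1].
    Assumption: one orthographic ray per (H, W) pixel, travelling along axis 0.
    """
    v = np.clip(np.asarray(voxels, dtype=np.float64), 0.0, 1.0)
    num_slices = v.shape[0]

    # free[i]: probability that a ray passes through slices 0..i unabsorbed.
    free = np.cumprod(1.0 - v, axis=0)
    # reach[i]: probability that a ray arrives at slice i at all.
    reach = np.concatenate([np.ones_like(v[:1]), free[:-1]], axis=0)
    # stop[i]: probability that the ray is absorbed exactly at slice i.
    stop = reach * v

    # A pixel belongs to the silhouette if its ray is absorbed anywhere.
    silhouette = 1.0 - free[-1]

    # Expected depth; rays that hit nothing are assigned the far plane.
    slice_depth = np.linspace(near, far, num_slices).reshape(-1, 1, 1)
    depth = (stop * slice_depth).sum(axis=0) + free[-1] * far
    return silhouette, depth
```

For example, a 4x2x2 grid with a single fully occupied voxel at slice index 2 yields a silhouette value of 1 and a depth of 2/3 at that pixel, while every other pixel gets silhouette 0 and the far-plane depth of 1.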

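Experimental Results

Research Questions

RQ1: Can a disentangled 3D representation surpass 2D GANs in the realism of generated images?
RQ2: Does a 3D-aware workflow that supports viewpoint and texture/shape editing outperform 2D-only synthesis on data that approximates real images?
RQ3: Can unpaired 2D and 3D data be used effectively to train a joint generative model?
RQ4: Which 3D operations become possible when shape, viewpoint, and texture can be controlled independently?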

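Key Findings

VON achieves a lower Fréchet Inception Distance (FID) than the DCGAN, LSGAN, and WGAN-GP baselines on the car and chair datasets.
In most human preference comparisons, raters preferred VON-generated images over those from the baseline 2D GANs.
VON produces high-quality 3D shapes and supports 3D-aware operations including viewpoint changes, shape/texture editing, and texture transfer.
Using the 3D shape prior improves sample realism over the earlier 3D-GAN approach.
A distance function (DF) representation of shape yields FID comparable to or better than the voxel representation.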
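
For readers unfamiliar with the metric behind these comparisons, FID is the Fréchet distance between two Gaussians fitted to image features: the squared difference of the means plus Tr(C_a + C_b - 2 (C_a C_b)^{1/2}), where lower is better. The sketch below computes that quantity with NumPy, assuming the features have already been extracted (standard FID uses Inception-v3 activations, which are not computed here). The function name `frechet_distance` is hypothetical, and the evaluation code used for the paper may differ in detail.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    Assumption: features are precomputed by some fixed network.
    """
    a = np.asarray(feats_a, dtype=np.float64)
    b = np.asarray(feats_b, dtype=np.float64)

    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))

    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; small numerical noise is clipped away.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * trace_sqrt)
```

Two identical feature sets give a distance of approximately zero, and shifting one set by a constant vector raises the distance by that vector's squared length, since the covariances are unchanged.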


This review was generated by AI and checked by human editors.