QUICK REVIEW

[Paper Review] GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis

Katja Schwarz, Yiyi Liao|arXiv (Cornell University)|Jul 5, 2020

Generative Adversarial Networks and Image Synthesis | References: 81 | Cited by: 322

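One-Sentence Summary

GRAF learns a conditional neural radiance field that synthesizes high-resolution, 3D-consistent images from unposed 2D images; it is trained with a multi-scale patch-based discriminator and offers control over shape, appearance, and viewpoint.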

ABSTRACT

While 2D generative adversarial networks have enabled high-resolution image synthesis, they largely lack an understanding of the 3D world and the image formation process. Thus, they do not provide precise control over camera viewpoint or object pose. To address this problem, several recent approaches leverage intermediate voxel-based representations in combination with differentiable rendering. However, existing methods either produce low image resolution or fall short in disentangling camera and scene properties, e.g., the object identity may vary with the viewpoint. In this paper, we propose a generative model for radiance fields which have recently proven successful for novel view synthesis of a single scene. In contrast to voxel-based representations, radiance fields are not confined to a coarse discretization of the 3D space, yet allow for disentangling camera and scene properties while degrading gracefully in the presence of reconstruction ambiguity. By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone. We systematically analyze our approach on several challenging synthetic and real-world datasets. Our experiments reveal that radiance fields are a powerful representation for generative image synthesis, leading to 3D consistent models that render with high fidelity.

Motivation and Objectives

Address the lack of 3D understanding in 2D GANs and enable explicit control over camera viewpoint and object pose.
Develop a generative radiance field model that can be trained from unposed 2D images to synthesize novel 3D-consistent scenes.
Disentangle shape, appearance, and viewpoint to allow independent manipulation of these factors.
Achieve high-resolution image synthesis by introducing a multi-scale patch-based discriminator.
Evaluate on synthetic and real datasets to demonstrate 3D consistency and image fidelity.

Proposed Method

Represent scenes as conditional radiance fields g_theta that map 3D location x, viewing direction d, shape code z_s, and appearance code z_a to color c and density sigma.
Use positional encoding for x and d and separate heads for density (sigma) and color (c); the density head depends only on x and z_s, while the color head additionally conditions on d and z_a for view-dependent appearance.
Render 2D images via differentiable volume rendering with alpha compositing along rays.
Train with a GAN objective using a patch-based discriminator: instead of full images, the generator renders and the discriminator compares K x K patches sampled at random locations and scales.
Condition the radiance field on latent codes z_s (shape) and z_a (appearance) to enable disentanglement and controllable manipulation of geometry and texture.
Sample random camera poses xi and random 2D patch patterns nu during training to promote view diversity and resolution-agnostic supervision.
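
The rendering step listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `positional_encoding` and `composite_ray` are hypothetical, the frequency schedule (powers of two times pi) follows the common NeRF-style formulation and is assumed here, and the sigma and color values would in practice come from the conditional radiance field g_theta rather than being supplied by hand.

```python
import numpy as np


def positional_encoding(p, num_freqs):
    """Encode each coordinate with sin/cos at frequencies 2^0*pi .. 2^(L-1)*pi.

    Assumed NeRF-style schedule; the feature ordering is one possible layout.
    """
    p = np.asarray(p, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # shape (L,)
    angles = p[..., None] * freqs                      # shape (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)              # shape (..., D * 2L)


def composite_ray(sigma, color, t):
    """Alpha-composite N samples along a single ray.

    sigma: (N,) densities, color: (N, 3) RGB values, t: (N,) increasing depths.
    Returns the rendered pixel color and the per-sample weights.
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    color = np.asarray(color, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    # Distance to the next sample; the last interval is treated as very long.
    delta = np.diff(t, append=t[-1] + 1e10)
    # Opacity contributed by each interval.
    alpha = 1.0 - np.exp(-sigma * delta)
    # Fraction of light that reaches each sample without being absorbed earlier.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return weights @ color, weights
```

Calling `composite_ray` with an empty first sample and a very dense second sample returns the second sample's color, since nothing behind an opaque sample contributes to the pixel. Because every operation is differentiable, the same computation written in an autodiff framework lets the GAN loss propagate back into the field's parameters.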

Experimental Results

Research Questions

RQ1: Can a generative radiance field learned from unposed 2D images produce high-fidelity, 3D-consistent images at high resolutions?
RQ2: Does disentangling shape and appearance via latent codes lead to controllable 3D-aware generation and reliable view-consistency?
RQ3: Is a multi-scale patch-based discriminator essential for stable, high-resolution 3D-aware image synthesis?
RQ4: How does GRAF compare to voxel-based 3D-aware methods and to 2D GANs in terms of image fidelity and 3D consistency?

Key Findings

GRAF achieves high-fidelity, high-resolution 3D-aware image synthesis from unposed images, with improved 3D consistency over voxel-based baselines.
The conditional radiance field successfully disentangles shape (z_s) from appearance (z_a), enabling independent manipulation of geometry and texture during inference.
A multi-scale patch-based discriminator is crucial for stable GAN training and high-quality outputs across datasets and resolutions.
Experiments show favorable FID/KID and 3D reconstruction metrics compared to baselines such as PlatonicGAN and HoloGAN, particularly on datasets with substantial viewpoint variation.
The approach generalizes to higher resolutions, with evidence that learned radiance fields render from arbitrary viewpoints while maintaining multi-view consistency.
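
To make the multi-scale patch idea concrete, the following NumPy sketch samples a K x K grid of pixel coordinates at a random scale and position and then reads the corresponding patch from an image. It is a sketch under stated assumptions: the helper names `sample_patch_coords` and `gather_patch`, the scale range, and the uniform sampling of scale and center are all illustrative choices, and nearest-neighbor lookup is swapped in here as a simple stand-in for interpolated sampling.

```python
import numpy as np


def sample_patch_coords(K, height, width, rng, min_scale=0.25, max_scale=1.0):
    """Return a (K, K, 2) grid of (x, y) pixel coordinates and its scale.

    The scale s is the fraction of the image extent the patch spans
    (assumed range; s = 1 covers the whole image with K x K samples).
    """
    s = rng.uniform(min_scale, max_scale)
    # K evenly spaced offsets in normalized [-s, s] coordinates.
    offsets = np.linspace(-1.0, 1.0, K) * s
    # Pick a center so the whole patch stays inside the normalized range [-1, 1].
    cx, cy = rng.uniform(-(1.0 - s), 1.0 - s, size=2)
    xs = (offsets + cx + 1.0) * 0.5 * (width - 1)
    ys = (offsets + cy + 1.0) * 0.5 * (height - 1)
    grid_x, grid_y = np.meshgrid(xs, ys)
    return np.stack([grid_x, grid_y], axis=-1), s


def gather_patch(image, coords):
    """Read a (K, K, C) patch from an (H, W, C) image by nearest-neighbor lookup."""
    h, w = image.shape[:2]
    ix = np.clip(np.rint(coords[..., 0]).astype(int), 0, w - 1)
    iy = np.clip(np.rint(coords[..., 1]).astype(int), 0, h - 1)
    return image[iy, ix]
```

With a fixed K, a small scale yields a patch of fine local detail and a large scale yields a coarse view of the whole image, so one discriminator of fixed input size sees content at many scales. On the generator side, the same coordinates would select which K x K rays to render, which keeps the per-step rendering cost independent of the target image resolution.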


This review was generated by AI and reviewed by human editors.