QUICK REVIEW

[Paper Review] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Ben Mildenhall, Pratul P. Srinivasan|arXiv (Cornell University)|Mar 19, 2020

Advanced Vision and Imaging|References: 51|Cited by: 523

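One-Sentence Summary

The paper introduces NeRF, which uses an MLP to represent a scene as a continuous 5D neural radiance field and synthesizes novel views through differentiable volume rendering, achieving state-of-the-art results from a sparse set of input images.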

ABSTRACT

We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(θ, ϕ)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.

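Motivation and Objectives

Propose a continuous 5D scene representation (x, y, z, θ, φ) that can model complex geometry and view-dependent appearance.
Use a fully-connected neural network to map each 5D coordinate to a volume density and an emitted radiance.
Develop a differentiable rendering pipeline based on volume rendering so that the neural radiance field can be optimized from RGB images.
Use positional encoding to capture high-frequency detail and hierarchical sampling to improve efficiency when rendering high-resolution views.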
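
As a rough illustration of the positional encoding named in the last objective, the sketch below maps each input coordinate to sines and cosines at frequencies that double from one band to the next. The function name `positional_encoding`, the parameter `num_freqs`, and the ordering of the output channels are assumptions made for this example; the band count of 10 for positions is the value the NeRF paper reports, and the sample points are made up.

```python
import numpy as np

def positional_encoding(p, num_freqs):
    """Map each coordinate of p to sin/cos pairs at frequencies 2^k * pi, k < num_freqs."""
    p = np.asarray(p, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # shape (L,)
    angles = p[..., None] * freqs                         # shape (..., D, L)
    # Assumed channel order: per coordinate, per band, [sin, cos].
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)  # (..., D, L, 2)
    return enc.reshape(*p.shape[:-1], -1)                 # (..., D * 2L)

# Two hypothetical 3D sample positions.
points = np.array([[0.0, 0.0, 0.0], [0.25, -0.5, 1.0]])
encoded = positional_encoding(points, num_freqs=10)
print(encoded.shape)  # (2, 60)
```

Each 3D point becomes a 60-dimensional vector (3 coordinates, 10 bands, one sine and one cosine per band), and it is this higher-dimensional input that the MLP would receive in place of the raw coordinates.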

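Proposed Method

Represent the scene as a 5D function F_Θ(x, y, z, θ, φ) → (c, σ), where c is the RGB color and σ is the volume density.
Use an MLP with no convolutional layers to process (x, y, z) into the density σ and a 256-dimensional feature vector, which is then combined with the viewing direction (θ, φ) to output the view-dependent color.
Render an image by casting camera rays, sampling 3D points along each ray, querying the MLP at those points, and applying differentiable volume rendering based on numerical quadrature.
Draw the samples along each ray by stratified sampling and estimate the ray color as $\hat{C}(r) = \sum_i T_i \left(1 - \exp(-\sigma_i \delta_i)\right) c_i$, where $T_i = \exp\left(-\sum_{j<i} \sigma_j \delta_j\right)$ is the accumulated transmittance and $\delta_i$ is the distance between adjacent samples; the estimate is differentiable with respect to the network outputs.
Introduce a positional encoding γ(p) that maps the inputs to a higher-dimensional space so that the network can capture high-frequency content.
Use two-stage hierarchical sampling with a coarse network and a fine network to allocate samples to the regions most likely to be visible, which improves efficiency.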
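
The quadrature rule in the method list can be sketched in a few lines of NumPy. The example below draws stratified samples along one ray and composites per-sample densities and colors into a single pixel color. The helper names `stratified_samples` and `composite_ray`, the near and far bounds, and the very large final interval `far_delta` (used so that the last sample absorbs any remaining transmittance) are assumptions for illustration; the densities and colors are hand-made arrays standing in for the MLP output.

```python
import numpy as np

def stratified_samples(t_near, t_far, n_samples, rng):
    """Draw one uniform sample from each of n_samples equal bins on [t_near, t_far]."""
    edges = np.linspace(t_near, t_far, n_samples + 1)
    return edges[:-1] + rng.uniform(size=n_samples) * (edges[1:] - edges[:-1])

def composite_ray(sigmas, colors, t_vals, far_delta=1e10):
    """Estimate C(r) = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i for one ray."""
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    # delta_i: distance to the next sample; the last interval is assumed very long.
    deltas = np.concatenate([t_vals[1:] - t_vals[:-1], [far_delta]])
    optical_depth = sigmas * deltas
    alphas = 1.0 - np.exp(-optical_depth)
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): fraction of light surviving up to sample i.
    transmittance = np.exp(-np.concatenate([[0.0], np.cumsum(optical_depth)[:-1]]))
    weights = transmittance * alphas
    return weights @ colors, weights

rng = np.random.default_rng(0)
t_vals = stratified_samples(2.0, 6.0, 64, rng)
# Stand-in for the MLP: empty space, then a dense red slab beyond t = 4.
sigmas = np.where(t_vals > 4.0, 50.0, 0.0)
colors = np.tile([1.0, 0.0, 0.0], (64, 1))
pixel, weights = composite_ray(sigmas, colors, t_vals)
```

Every step here is a smooth array operation, so the same computation written in an automatic-differentiation framework would let the gradient of a pixel loss flow back to the densities and colors. The returned per-sample weights also indicate where along the ray the color comes from.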

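Experimental Results

Research Questions

RQ1: Can a continuous 5D neural radiance field model complex geometry and view-dependent appearance from a sparse set of RGB views?
RQ2: Does differentiable volume rendering of a neural radiance field produce high-fidelity, photorealistic novel views that improve on prior neural rendering methods?
RQ3: Do positional encoding and hierarchical sampling enable stable optimization and capture of high-frequency detail in NeRF?
RQ4: How does NeRF compare with existing neural or voxel-based view synthesis methods on synthetic and real-world data?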

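Key Findings

NeRF outperforms prior work on novel view synthesis on both synthetic and real-world datasets.
Combining positional encoding with hierarchical sampling is essential for reconstructing high-frequency geometry and appearance.
NeRF needs only RGB images with known camera poses for optimization, avoiding explicit 3D geometry supervision.
Compared with the baselines, NeRF produces higher-fidelity renderings with better multi-view consistency and fewer artifacts.
The method renders high-resolution, photorealistic views from a relatively sparse set of input views.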
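
Hierarchical sampling, which the findings above single out together with positional encoding, amounts to drawing extra samples where a coarse pass found most of a ray's contribution. A minimal sketch, assuming the coarse pass yields one nonnegative weight per depth bin: normalize the weights into a piecewise-constant PDF and draw from it by inverse transform sampling. The name `sample_fine_depths`, the `eps` smoothing term, and the toy weights are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sample_fine_depths(bin_edges, coarse_weights, n_samples, rng, eps=1e-5):
    """Draw depths from the piecewise-constant PDF defined by per-bin coarse weights."""
    bin_edges = np.asarray(bin_edges, dtype=np.float64)
    # eps keeps every bin's probability positive so the CDF is strictly increasing.
    weights = np.asarray(coarse_weights, dtype=np.float64) + eps
    pdf = weights / weights.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])         # one value per bin edge
    u = rng.uniform(size=n_samples)
    # Find the bin whose CDF interval contains each uniform draw.
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(weights) - 1)
    frac = np.clip((u - cdf[idx]) / (cdf[idx + 1] - cdf[idx]), 0.0, 1.0)
    depths = bin_edges[idx] + frac * (bin_edges[idx + 1] - bin_edges[idx])
    return np.sort(depths)

rng = np.random.default_rng(0)
edges = np.linspace(2.0, 6.0, 9)           # 8 hypothetical depth bins on [2, 6]
coarse_weights = np.zeros(8)
coarse_weights[3] = 1.0                     # coarse pass found a surface in bin 3
fine_depths = sample_fine_depths(edges, coarse_weights, 128, rng)
```

In this toy run nearly all new samples fall inside the bin with the large coarse weight, so a fine network would spend its evaluations near the likely surface rather than in empty space.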

This review was generated by AI and reviewed by a human editor.