QUICK REVIEW

[Paper Review] IBRNet: Learning Multi-View Image-Based Rendering

Qianqian Wang, Zhicheng Wang|arXiv (Cornell University)|Feb 25, 2021

Advanced Vision and Imaging|References: 60|Cited by: 27

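One-Sentence Summary

IBRNet learns a generic view interpolation function that renders high-resolution novel views from a set of nearby source views without per-scene optimization, and it can be fine-tuned per scene to be competitive with single-scene neural rendering methods.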

ABSTRACT

We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multi-view posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. Project page: https://ibrnet.github.io/

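Motivation and Objectives

Advance novel view synthesis from a sparse set of nearby views without scene-specific optimization.
Develop a lightweight, generalizable network (IBRNet) that predicts color and density at continuous 5D locations from multiple views.
Introduce long-range context along each ray through a ray transformer to improve density estimation and rendering accuracy.
Train end to end using classic volume rendering, with multi-view posed images as the only supervision.
Show that a pretrained IBRNet generalizes to unseen scenes and can be fine-tuned per scene to approach the performance of single-scene neural rendering.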

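Proposed Method

Use a modular pipeline: select a small set of nearby source views as a working set and extract dense features from each image.
For each 3D point along a ray, aggregate the multi-view features, compute a density feature with PointNet-like pooling, and predict density with the ray transformer.
Obtain the color at each sample by blending source-view colors with blending weights that account for viewing direction, then render via volume rendering.
IBRNet operates on continuous 5D locations (3D position, 2D viewing direction) and is differentiable, which enables end-to-end training with multi-view supervision.
Adopt NeRF-style hierarchical sampling (coarse and fine) to render high-quality novel views.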
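The final rendering step listed above can be illustrated with the standard volume rendering quadrature used by NeRF-style methods. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `composite_ray` and its argument layout are hypothetical, and the per-sample densities and colors are taken as given rather than predicted by a network.

```python
import math


def composite_ray(densities, colors, deltas):
    """Accumulate per-sample densities and colors along one ray into a pixel.

    densities: non-negative volume densities, one per sample, ordered near to far
    colors:    RGB triples, one per sample
    deltas:    spacing between consecutive samples along the ray
    Returns (pixel_rgb, weights), where weights[k] is the contribution of sample k.
    """
    transmittance = 1.0  # fraction of light not yet absorbed before this sample
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha  # light surviving past this segment
    return pixel, weights
```

Every operation here is differentiable in the densities and colors, which is what allows training from posed images alone. In a coarse-to-fine scheme, the returned weights are typically reused as a distribution from which the fine samples are drawn.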

Figure 1: System Overview. 1) To render a novel target view (shown here as the image labeled with a ‘?’), we first identify a set of neighboring source views (e.g., the views labeled A and B) and extract their image features. 2) Then, for each ray in the target view, we compute colors and densitie
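Drawing features from source views for a point on a target ray requires knowing where that point lands in each source image. The sketch below shows the standard pinhole projection that such a lookup would rely on. It is an illustration under assumptions: the helper names `sample_points_on_ray` and `project_point`, the world-to-camera convention, and the `(fx, fy, cx, cy)` intrinsics tuple are hypothetical and not taken from the paper.

```python
def sample_points_on_ray(origin, direction, t_values):
    """Return 3D points origin + t * direction for each depth t along a ray."""
    return [[origin[i] + t * direction[i] for i in range(3)] for t in t_values]


def project_point(point_world, rotation, translation, intrinsics):
    """Project a 3D world point into a pinhole source camera.

    rotation:    3x3 world-to-camera rotation as nested lists
    translation: length-3 world-to-camera translation
    intrinsics:  (fx, fy, cx, cy) focal lengths and principal point in pixels
    Returns (u, v, depth), or None if the point lies behind the camera.
    """
    cam = [
        sum(rotation[row][col] * point_world[col] for col in range(3))
        + translation[row]
        for row in range(3)
    ]
    depth = cam[2]
    if depth <= 0.0:
        return None  # not visible from this source view
    fx, fy, cx, cy = intrinsics
    u = fx * cam[0] / depth + cx
    v = fy * cam[1] / depth + cy
    return u, v, depth
```

The resulting `(u, v)` would then index the source image and its feature map, usually with bilinear interpolation, once per source view and per sample along the ray.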

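Experimental Results

Research Questions

RQ1: Can a generic, scene-agnostic view interpolation function synthesize high-quality novel views from sparse source views?
RQ2: Does adding a ray transformer that propagates context along each ray improve density estimation and rendering quality?
RQ3: How does per-scene fine-tuning of the pretrained model affect performance relative to single-scene neural rendering methods such as NeRF?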

Key Findings

PSNR and SSIM: higher is better. LPIPS: lower is better. A "-" marks a result that is not reported.

Setting	Method	Diffuse Synthetic 360° PSNR	Diffuse Synthetic 360° SSIM	Diffuse Synthetic 360° LPIPS	Realistic Synthetic 360° PSNR	Realistic Synthetic 360° SSIM	Realistic Synthetic 360° LPIPS	Real Forward-Facing PSNR	Real Forward-Facing SSIM	Real Forward-Facing LPIPS
No per-scene optimization	LLFF	34.38	0.985	0.048	24.88	0.911	0.114	24.13	0.798	0.212
No per-scene optimization	Ours (no ft)	37.17	0.990	0.017	25.49	0.916	0.100	25.13	0.817	0.205
Per-scene optimization	SRN	33.20	0.963	0.073	22.26	0.846	0.170	22.84	0.668	0.378
Per-scene optimization	NV	29.62	0.929	0.099	26.05	0.893	0.160	-	-	-
Per-scene optimization	NeRF	40.15	0.991	0.023	31.01	0.947	0.081	26.50	0.811	0.250
Per-scene optimization	Ours_ft	42.93	0.997	0.009	28.14	0.942	0.072	26.73	0.851	0.175

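The pretrained IBRNet generalizes to unseen scenes and outperforms LLFF on every evaluated dataset and metric.
Per-scene fine-tuning makes IBRNet competitive with NeRF: it leads on all three metrics on Diffuse Synthetic 360° and Real Forward-Facing (e.g., 26.73 vs. 26.50 PSNR on the latter), while on Realistic Synthetic 360° it trails NeRF in PSNR (28.14 vs. 31.01) and SSIM but attains a lower LPIPS.
Ablations indicate that the ray transformer is necessary; the viewing-direction input also improves quality but is not the only contributing factor.
In the generalization setting without per-scene optimization, IBRNet achieves higher PSNR/SSIM and lower LPIPS than the baselines it is compared against.
Inference cost grows with the number of source views, yet because interpolation is local and view-based, IBRNet needs markedly fewer FLOPs per pixel than NeRF.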
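For readers comparing the PSNR columns, the sketch below shows how PSNR is conventionally computed from mean squared error. It is a generic illustration of the metric, not the paper's evaluation code; the function name `psnr` and the flat-list image representation are assumptions made for brevity.

```python
import math


def psnr(prediction, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    prediction, target: flat sequences of pixel intensities in [0, max_value]
    """
    if len(prediction) != len(target) or not prediction:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error, which helps put gaps such as 42.93 vs. 40.15 into perspective.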

Figure 2: IBRNet for volume density and color prediction at a continuous 5D location $(\mathbf{x},\mathbf{d})$. We first input the 2D image features $\{\mathbf{f}_{i}\}_{i=1}^{N}$ extracted from all source views to a PointNet-like MLP to aggregate local and global information, resulting in multi-vi
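The PointNet-like aggregation mentioned in the caption has to be invariant to the order of the $N$ source views. One common way to achieve this is to pool per-view features with an element-wise mean and variance, as in the sketch below. This is a simplified illustration under assumptions: the function name `pool_mean_var` and the optional per-view weights are hypothetical, and the learned MLP layers that would surround this pooling are omitted.

```python
def pool_mean_var(features, weights=None):
    """Order-invariant pooling of per-view feature vectors.

    features: list of N feature vectors (equal length), one per source view
    weights:  optional non-negative per-view weights; uniform if omitted
    Returns (mean, variance), each a vector with the feature dimension.
    """
    num_views = len(features)
    if weights is None:
        weights = [1.0] * num_views
    total = sum(weights)
    normalized = [w / total for w in weights]
    dim = len(features[0])
    mean = [
        sum(normalized[i] * features[i][d] for i in range(num_views))
        for d in range(dim)
    ]
    variance = [
        sum(normalized[i] * (features[i][d] - mean[d]) ** 2 for i in range(num_views))
        for d in range(dim)
    ]
    return mean, variance
```

Intuitively, low variance across views means the source images agree about the appearance at that 3D point, which is evidence that the point lies on a surface; this is the kind of cue a density predictor can exploit.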

This review was generated by AI and reviewed by a human editor.