QUICK REVIEW

[Paper Review] SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

Guangcong Wang, Zhaoxi Chen|arXiv (Cornell University)|Mar 28, 2023

Advanced Vision and Imaging | Cited by 11

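One-Sentence Summary

SparseNeRF improves few-shot NeRF by distilling local depth ranking and spatial continuity from coarse depth maps. It achieves state-of-the-art results on LLFF, DTU, and the new NVS-RGBD dataset without adding inference time.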

ABSTRACT

Neural Radiance Field (NeRF) significantly degrades when only a limited number of views are available. To complement the lack of 3D information, depth-based models, such as DSNeRF and MonoSDF, explicitly assume the availability of accurate depth maps of multiple views. They linearly scale the accurate depth maps as supervision to guide the predicted depth of few-shot NeRFs. However, accurate depth maps are difficult and expensive to capture due to wide-range depth distances in the wild. In this work, we present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations. The inaccurate depth observations are either from pre-trained depth models or coarse depth maps of consumer-level depth sensors. Since coarse depth maps are not strictly scaled to the ground-truth depth maps, we propose a simple yet effective constraint, a local depth ranking method, on NeRFs such that the expected depth ranking of the NeRF is consistent with that of the coarse depth maps in local patches. To preserve the spatial continuity of the estimated depth of NeRF, we further propose a spatial continuity constraint to encourage the consistency of the expected depth continuity of NeRF with coarse depth maps. Surprisingly, with simple depth ranking constraints, SparseNeRF outperforms all state-of-the-art few-shot NeRF methods (including depth-based models) on standard LLFF and DTU datasets. Moreover, we collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro. Extensive experiments on NVS-RGBD dataset also validate the superiority and generalizability of SparseNeRF. Code and dataset are available at https://sparsenerf.github.io/.

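Motivation and Objectives

Enable robust few-shot novel view synthesis when dense multi-view data is unavailable.
Exploit coarse depth priors from pre-trained depth models or consumer-level sensors instead of accurate depth.
Introduce depth ranking and spatial continuity distillation to regularize NeRF training.
Show that these priors improve geometry and rendering quality on standard benchmarks and on a new dataset.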

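Proposed Method

A base NeRF backbone (Mip-NeRF), trained with a color reconstruction loss.
Depth prior distillation from a pre-trained depth model (such as DPT) or from coarse sensor depth.
Local depth ranking distillation: a ranking loss (Eq. (3)) forces the NeRF's depth ordering within local patches to agree with the ordering in the coarse depth map.
Spatial continuity distillation: the NeRF's depth continuity is constrained to mimic the local depth continuity of the coarse depth map (Eq. (4)).
Full objective: L = L_nerf + lambda * R_rank + gamma * R_conti, with preset margins and weights.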
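
The two regularizers and the combined objective above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: this review does not reproduce Eq. (3) or Eq. (4), so the hinge forms, the function names (rank_loss, continuity_loss, total_loss), the explicit neighbor list, and the default values of margin, coarse_tol, lam, and gamma are all illustrative.

```python
from itertools import combinations


def rank_loss(nerf_depth, coarse_depth, margin=1e-4):
    """Hinge penalty on pixel pairs in one patch whose NeRF depth order
    contradicts the order given by the coarse depth map (assumed form)."""
    total = 0.0
    for i, j in combinations(range(len(coarse_depth)), 2):
        if coarse_depth[i] == coarse_depth[j]:
            continue  # a tie carries no ordering information
        # `near` is the pixel that the coarse map places closer to the camera.
        near, far = (i, j) if coarse_depth[i] < coarse_depth[j] else (j, i)
        total += max(nerf_depth[near] - nerf_depth[far] + margin, 0.0)
    return total


def continuity_loss(nerf_depth, coarse_depth, neighbors,
                    coarse_tol=0.05, margin=1e-4):
    """Penalize NeRF depth jumps between neighboring pixels that the coarse
    map treats as continuous, i.e. coarse difference below coarse_tol
    (assumed form; coarse_tol is a hypothetical threshold)."""
    total = 0.0
    for i, j in neighbors:
        if abs(coarse_depth[i] - coarse_depth[j]) < coarse_tol:
            total += max(abs(nerf_depth[i] - nerf_depth[j]) - margin, 0.0)
    return total


def total_loss(l_nerf, r_rank, r_conti, lam=0.1, gamma=0.1):
    """L = L_nerf + lambda * R_rank + gamma * R_conti (weights are placeholders)."""
    return l_nerf + lam * r_rank + gamma * r_conti


# Toy patch of three pixels, flattened to 1-D lists.
coarse = [1.0, 1.01, 3.0]          # coarse prior, arbitrary scale
predicted = [0.5, 0.4, 0.9]        # NeRF expected depth
pairs = [(0, 1), (1, 2)]           # hypothetical adjacent-pixel pairs

r_rank = rank_loss(predicted, coarse)
r_conti = continuity_loss(predicted, coarse, pairs)
loss = total_loss(l_nerf=0.02, r_rank=r_rank, r_conti=r_conti)
```

Because rank_loss only checks which pixel the coarse map places closer, multiplying the coarse depth by any positive constant leaves it unchanged. That property is what lets an unscaled prior, such as a monocular depth prediction, act as supervision without being aligned to metric depth.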

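Experimental Results

Research Questions

RQ1: Can robust depth priors from coarse depth maps improve few-shot NeRF without relying on accurate depth supervision?
RQ2: Is local depth ranking alone enough to outperform depth scaling when guiding NeRF depth training?
RQ3: Does spatial continuity distillation improve geometric consistency across views?
RQ4: How do these priors perform on LLFF, DTU, and the new NVS-RGBD dataset, and with different pre-trained depth models?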

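Key Findings

SparseNeRF achieves state-of-the-art results among few-shot NeRF methods on LLFF and DTU, leading on PSNR and SSIM on both datasets and on LPIPS on LLFF.
On LLFF with three views, SparseNeRF obtains PSNR 19.86, SSIM 0.624, and LPIPS 0.328 (versus RegNeRF's 19.08/0.587/0.336).
On DTU with three views, SparseNeRF obtains PSNR 19.55, SSIM 0.769, and LPIPS 0.201 (versus RegNeRF's 18.89/0.745/0.190); RegNeRF keeps a slightly lower, and therefore better, LPIPS in this setting.
On the new NVS-RGBD dataset, SparseNeRF outperforms RegNeRF, DSNeRF, and MonoSDF on the Kinect and ZED 2 sensors (higher PSNR, SSIM above 0.80, and lower LPIPS and depth error).
Depth ranking distillation and spatial continuity distillation both improve geometry and 3D consistency over the baseline; in the ablation study, removing either the ranking or the continuity term lowers performance.
Using different pre-trained depth models (MiDaS, DPT Hybrid/Large) consistently improves results over the baseline, with the DPT variants performing best.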
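
For readers who want to interpret the PSNR figures above, the metric can be sketched as follows. This is the standard definition, PSNR = 10 * log10(MAX^2 / MSE), written as a minimal example; the function name psnr and the assumption that pixel values lie in [0, 1] are illustrative and not taken from the paper's evaluation code.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    flat sequences of pixel intensities."""
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy two-pixel "images" with intensities in [0, 1].
score = psnr([0.5, 0.5], [0.6, 0.4])
```

Since PSNR is logarithmic in the mean squared error, a gain of 0.78 dB (19.86 versus 19.08 on LLFF) corresponds, for a single image, to an MSE roughly 16% lower. Reported scores are averaged across images, so this conversion is only a rough guide.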

This review was generated by AI and reviewed by a human editor.