QUICK REVIEW

[Paper Review] NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion

Jiatao Gu, Alex Trevithick|arXiv (Cornell University)|Feb 20, 2023

Advanced Vision and Imaging | Cited by 37

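One-sentence summary

NerfDiff jointly trains a camera-space triplane NeRF with a 3D-aware conditional diffusion model, then fine-tunes the NeRF at test time with NeRF-guided distillation, so that novel views synthesized from a single image stay consistent across viewpoints.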

ABSTRACT

Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets, including ShapeNet, ABO, and Clevr3D.

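Research motivation and goals

Address single-image novel view synthesis while keeping the output semantically and physically consistent with the input under occlusion.
Propose a fast, camera-aligned triplane NeRF representation conditioned on a single input image.
Integrate a 3D-aware conditional diffusion model (CDM) to resolve the uncertainty in occluded regions.
Introduce NeRF-guided distillation (NGD), which at test time jointly refines the NeRF renderings and guides the multi-view diffusion process.
Demonstrate state-of-the-art performance on the ShapeNet, ABO, and Clevr3D datasets.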

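Proposed method

Propose a camera-space triplane NeRF conditioned on a single input image, with a UNet encoder producing the image-aligned triplanes.
Build a 3D-aware conditional diffusion model (CDM) that refines the NeRF rendering of the target view.
Train the NeRF and the CDM jointly on multi-view data, so that at test time the NeRF can be initialized from the input image.
At inference, fine-tune the NeRF by generating virtual views with the CDM and distilling the CDM's knowledge back into the NeRF through NeRF-guided distillation (NGD).
Use an alternating optimization scheme in which NeRF distillation and diffusion sampling reinforce each other to improve 3D consistency across views.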
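
To make the triplane representation in the first item above more concrete, here is a minimal NumPy sketch of a generic triplane feature lookup: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are combined. The function names `bilinear_sample` and `triplane_features`, the dictionary keys, the normalized coordinate range [-1, 1], and aggregation by summation are all assumptions made for illustration. The sketch does not reproduce the paper's camera-space parameterization, its UNet encoder, or the decoder that maps features to density and color.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at normalized coords u, v in [-1, 1].

    u indexes the width axis and v the height axis. H and W must be at least 2.
    """
    H, W, _ = plane.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Top-left corner of the interpolation cell, clamped so x0 + 1 and y0 + 1 stay in range.
    x0 = int(np.clip(np.floor(x), 0, W - 2))
    y0 = int(np.clip(np.floor(y), 0, H - 2))
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * plane[y0, x0] + fx * plane[y0, x0 + 1]
    bottom = (1.0 - fx) * plane[y0 + 1, x0] + fx * plane[y0 + 1, x0 + 1]
    return (1.0 - fy) * top + fy * bottom

def triplane_features(planes, point):
    """Look up features for one 3D point from three planes keyed 'xy', 'xz', 'yz'.

    Summation is an assumed aggregation; concatenation is another common choice.
    """
    x, y, z = point
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return f_xy + f_xz + f_yz
```

In a full pipeline, the returned feature vector would typically be fed to a small MLP that predicts density and color for volume rendering, and the planes themselves would come from an image encoder rather than being fixed arrays.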

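Experimental results

Research questions

RQ1: How can a single image be used to generate novel views that are both high-fidelity and consistent across viewpoints?
RQ2: When a NeRF is conditioned on a single image, can a 3D-aware diffusion model provide a reliable, view-consistent prior that resolves occlusion uncertainty?
RQ3: Does fine-tuning at inference time with NeRF-guided distillation give better 3D consistency and perceptual quality than existing single-image NeRF or geometry-free methods?
RQ4: What is the trade-off between speed and accuracy for the triplane NeRF + CDM framework on standard benchmarks?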

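Key findings

On ShapeNet Cars/Chairs and ABO, NerfDiff achieves state-of-the-art PSNR and SSIM relative to geometry-free methods and single-view NeRF baselines.
Adding the 3D-aware CDM markedly improves perceptual quality (LPIPS) and FID, especially in occluded regions.
NGD fine-tuning produces sharper renderings and better FID/LPIPS than naive CDM distillation or SDS-based methods.
Larger CDM/NeRF models (NerfDiff-L) improve perceptual quality, and NGD yields significant gains in FID and LPIPS.
Ablation studies show that using 50 virtual views during fine-tuning strikes a good balance between efficiency and performance.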
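
The findings above are reported in PSNR, SSIM, LPIPS, and FID. As a reminder of what the first of these measures, the following is a small sketch of the standard PSNR definition, 10 · log10(MAX² / MSE). The function name `psnr` and the default `max_val=1.0` (images scaled to [0, 1]) are assumptions for illustration; this summary does not describe the paper's evaluation code.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape.

    max_val is the largest possible pixel value (assumed 1.0 for images in [0, 1]).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Higher PSNR means lower pixel-wise error. Because it averages over pixels, PSNR tends to favor smooth outputs, which is why perceptual metrics such as LPIPS and FID are reported alongside it.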


This review was generated by AI and checked by a human editor.