QUICK REVIEW

[Paper Review] Novel View Synthesis with Diffusion Models

Daniel Watson, William Chan|arXiv (Cornell University)|Oct 6, 2022

Advanced Vision and Imaging | Cited by 63

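One-Sentence Summary

3DiM is a geometry-free diffusion model that uses stochastic conditioning and a weight-sharing X-UNet architecture to synthesize 3D-consistent novel views from a single image, with no test-time optimization.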

ABSTRACT

We present 3DiM, a diffusion model for 3D novel view synthesis, which is able to translate a single input view into consistent and sharp completions across many views. The core component of 3DiM is a pose-conditional image-to-image diffusion model, which takes a source view and its pose as inputs, and generates a novel view for a target pose as output. 3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning. The output views are generated autoregressively, and during the generation of each novel view, one selects a random conditioning view from the set of available views at each denoising step. We demonstrate that stochastic conditioning significantly improves the 3D consistency of a naive sampler for an image-to-image diffusion model, which involves conditioning on a single fixed view. We compare 3DiM to prior work on the SRN ShapeNet dataset, demonstrating that 3DiM's generated completions from a single view achieve much higher fidelity, while being approximately 3D consistent. We also introduce a new evaluation methodology, 3D consistency scoring, to measure the 3D consistency of a generated object by training a neural field on the model's output views. 3DiM is geometry free, does not rely on hyper-networks or test-time optimization for novel view synthesis, and allows a single model to easily scale to a large number of scenes.

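Motivation and Objectives

Motivate novel view synthesis when only a few input views are available, and highlight the ambiguity inherent in generating unseen views.
Develop a geometry-free, end-to-end diffusion model that generates multiple 3D-consistent views from one or a few input views.
Introduce mechanisms that promote 3D consistency without relying on explicit 3D representations or test-time optimization.
Provide a new evaluation scheme for geometry-free view synthesis that quantifies 3D consistency by training a neural field on the generated views.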

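Proposed Method

Propose 3DiM, a pose-conditional image-to-image diffusion model that learns to generate a view at a target pose, conditioned on a source view and its pose.
Introduce stochastic conditioning: views are generated autoregressively, and at each denoising step a conditioning view is drawn at random from the set of available views, which promotes 3D consistency.
Develop X-UNet, a UNet variant that shares weights across the input frames and fuses the conditioning and target views through cross-attention.
Train on pairs of views of the same scene, with no explicit 3D representation or test-time optimization.
Compare against prior geometry-aware and geometry-free methods on the SRN ShapeNet benchmark, using standard metrics (PSNR, SSIM, FID) and the newly proposed 3D consistency scoring.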
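
The stochastic conditioning sampler described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `denoise_step` callable, its signature, the flat-list `Image` type, and the `toy_denoise_step` placeholder are all assumptions introduced here so the loop structure can run without a trained model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]        # stand-in for an image tensor (flat list of pixel values)
Pose = Tuple[float, ...]   # stand-in for a camera pose
View = Tuple[Image, Pose]

# Hypothetical signature of one reverse-diffusion step of a pose-conditional model:
# (noisy_target, cond_image, cond_pose, target_pose, step_index) -> less noisy target
DenoiseStep = Callable[[Image, Image, Pose, Pose, int], Image]


def sample_views(
    denoise_step: DenoiseStep,
    input_view: View,
    target_poses: Sequence[Pose],
    num_steps: int,
    rng: random.Random,
) -> List[View]:
    """Autoregressively generate one view per target pose with stochastic conditioning."""
    available = [input_view]  # grows as new views are generated
    num_pixels = len(input_view[0])
    for target_pose in target_poses:
        # Each new view starts from pure Gaussian noise.
        x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
        for t in reversed(range(num_steps)):
            # Stochastic conditioning: re-draw the conditioning view at every step.
            cond_image, cond_pose = rng.choice(available)
            x = denoise_step(x, cond_image, cond_pose, target_pose, t)
        # The finished view becomes a candidate conditioning view for later ones.
        available.append((x, target_pose))
    return available[1:]  # only the generated views


def toy_denoise_step(x, cond_image, cond_pose, target_pose, t):
    # Placeholder, NOT a real diffusion step: moves the sample halfway
    # toward the conditioning image so the loop can be exercised.
    return [0.5 * a + 0.5 * b for a, b in zip(x, cond_image)]


if __name__ == "__main__":
    views = sample_views(
        toy_denoise_step,
        input_view=([1.0, 1.0, 1.0, 1.0], (0.0,)),
        target_poses=[(1.0,), (2.0,), (3.0,)],
        num_steps=8,
        rng=random.Random(0),
    )
    print(len(views), "views generated")
```

The naive sampler that the summary contrasts this with would correspond to always passing `input_view` as the conditioning view instead of calling `rng.choice(available)` at each step.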

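Experimental Results

Research Questions

RQ1: How can diffusion models be adapted to perform novel view synthesis from limited input views?
RQ2: Can a geometry-free diffusion model generate sets of 3D-consistent views without per-scene optimization or explicit 3D representations?
RQ3: Which architectural choices and sampling strategies (such as stochastic conditioning) improve the 3D consistency and visual fidelity of generated views?
RQ4: Beyond traditional image-quality metrics, how should the 3D consistency of geometry-free view synthesis be evaluated?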

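Key Findings

3DiM generates sharp, plausible novel views from a single input view, with much higher fidelity than prior methods, while remaining approximately 3D consistent.
Stochastic conditioning significantly improves 3D consistency during diffusion sampling compared with naive conditioning on a single fixed view.
The X-UNet architecture, with weight sharing and cross-attention, outperforms Concat-UNet in 3D consistency and in alignment with the conditioning view.
Standard metrics (PSNR, SSIM) may not adequately reflect sample quality for geometry-free models, whereas FID and the proposed 3D consistency scoring capture model performance more reliably.
The dedicated 3D consistency scoring method, which trains a neural field on the model's outputs, penalizes inconsistent outputs and agrees with qualitative assessment.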
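
The idea behind 3D consistency scoring can be illustrated with a small sketch. It assumes the protocol is to fit a field on a subset of the generated views and score its renderings against the held-out generated views; the train/held-out split, the `fit_field` callable, and the `mean_image_field` toy are assumptions made here for illustration, and PSNR stands in for the full set of metrics.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]        # stand-in for an image (flat list of pixel values)
Pose = Tuple[float, ...]   # stand-in for a camera pose
View = Tuple[Image, Pose]


def psnr(reference: Image, rendered: Image, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def consistency_score(
    generated_views: Sequence[View],
    fit_field: Callable[[Sequence[View]], Callable[[Pose], Image]],
    train_fraction: float,
    rng: random.Random,
) -> float:
    """Fit a field on some generated views; return mean PSNR on the held-out ones."""
    views = list(generated_views)  # needs at least two views
    rng.shuffle(views)
    # Keep at least one view on each side of the split.
    n_train = max(1, min(len(views) - 1, round(train_fraction * len(views))))
    train, held_out = views[:n_train], views[n_train:]
    render = fit_field(train)  # hypothetical: returns a pose -> image renderer
    scores = [psnr(image, render(pose)) for image, pose in held_out]
    return sum(scores) / len(scores)


def mean_image_field(train_views):
    # Toy stand-in for neural field training: ignores the pose and
    # always renders the mean of the training images.
    n = len(train_views)
    mean = [sum(px) / n for px in zip(*(img for img, _ in train_views))]
    return lambda pose: mean


if __name__ == "__main__":
    consistent = [([0.5, 0.5], (float(i),)) for i in range(6)]
    inconsistent = [([i * i / 25.0], (float(i),)) for i in range(6)]
    print(consistency_score(consistent, mean_image_field, 0.5, random.Random(0)))
    print(consistency_score(inconsistent, mean_image_field, 0.5, random.Random(0)))
```

The point of the design is that a single field cannot explain mutually contradictory views, so inconsistent samples render poorly at held-out poses and score lower.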


This review was generated by AI and reviewed by a human editor.