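[Paper Explained] Novel View Synthesis with Diffusion Models
3DiM is a geometry-free diffusion model that uses stochastic conditioning and a weight-sharing X-UNet architecture to synthesize approximately 3D-consistent novel views from a single image, with no test-time optimization.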
We present 3DiM, a diffusion model for 3D novel view synthesis, which is able to translate a single input view into consistent and sharp completions across many views. The core component of 3DiM is a pose-conditional image-to-image diffusion model, which takes a source view and its pose as inputs, and generates a novel view for a target pose as output. 3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning. The output views are generated autoregressively, and during the generation of each novel view, one selects a random conditioning view from the set of available views at each denoising step. We demonstrate that stochastic conditioning significantly improves the 3D consistency of a naive sampler for an image-to-image diffusion model, which involves conditioning on a single fixed view. We compare 3DiM to prior work on the SRN ShapeNet dataset, demonstrating that 3DiM's generated completions from a single view achieve much higher fidelity, while being approximately 3D consistent. We also introduce a new evaluation methodology, 3D consistency scoring, to measure the 3D consistency of a generated object by training a neural field on the model's output views. 3DiM is geometry free, does not rely on hyper-networks or test-time optimization for novel view synthesis, and allows a single model to easily scale to a large number of scenes.
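The autoregressive sampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_denoise_step`, `sample_views`, the 4x4 identity poses, and the step count are all hypothetical, and the toy denoiser only stands in for a trained pose-conditional model.

```python
import numpy as np


def toy_denoise_step(z, noise_level, cond_image, cond_pose, target_pose):
    """Hypothetical stand-in for one reverse step of a trained model.

    A real model would use the noise level and both poses; this toy ignores
    them and simply nudges the sample toward the conditioning image.
    """
    return 0.9 * z + 0.1 * cond_image


def sample_views(input_image, input_pose, target_poses, denoise_step,
                 num_steps=8, seed=0):
    """Generate target views one by one with stochastic conditioning."""
    rng = np.random.default_rng(seed)
    available = [(input_image, input_pose)]  # views we may condition on
    picks = []  # index of the view used at each denoising step, per target

    for target_pose in target_poses:
        z = rng.standard_normal(input_image.shape)  # start from pure noise
        step_picks = []
        for step in range(num_steps, 0, -1):
            # Re-draw the conditioning view at every denoising step.
            k = int(rng.integers(len(available)))
            cond_image, cond_pose = available[k]
            z = denoise_step(z, step / num_steps, cond_image, cond_pose,
                             target_pose)
            step_picks.append(k)
        # The finished view joins the pool used for later targets.
        available.append((z, target_pose))
        picks.append(step_picks)

    generated = [image for image, _ in available[1:]]
    return generated, picks


image = np.ones((4, 4, 3))
poses = [np.eye(4) for _ in range(3)]  # placeholder camera poses
views, picks = sample_views(image, np.eye(4), poses, toy_denoise_step)
```

For the first target only the input view is available, so every step conditions on it; later targets mix the input view with previously generated ones. The naive sampler mentioned in the abstract corresponds to fixing `k` for the whole inner loop.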
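Research Motivation and Goals
- Motivate novel view synthesis when only a few input views are available, and emphasize the ambiguity inherent in generating unseen views.
- Develop a geometry-free, end-to-end diffusion model that can generate multiple 3D-consistent views from one or a few input views.
- Introduce a mechanism that promotes 3D consistency without relying on an explicit 3D representation or test-time optimization.
- Provide a new evaluation scheme for geometry-free view synthesis that quantifies 3D consistency by training a neural field on the generated views.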
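Proposed Method
- Propose 3DiM, a pose-conditional image-to-image diffusion model that learns to generate a target view given a source view and its pose.
- Introduce stochastic conditioning: views are generated autoregressively, and at each denoising step a conditioning view is drawn at random from the set of available views, which promotes 3D consistency.
- Develop X-UNet, a UNet variant that shares weights across the input frames and fuses the conditioning and target views through cross-attention.
- Train on pairs of views of the same scene, with no explicit 3D representation and no test-time optimization.
- Compare against prior geometry-aware and geometry-free methods on the SRN ShapeNet dataset, using standard metrics (PSNR, SSIM, FID) as well as the newly proposed 3D consistency scoring.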
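The weight sharing and cross-attention idea behind X-UNet can be illustrated with a toy single-layer block. This is only a sketch under assumptions: the function names, the `tanh` projection, the residual connection, the feature sizes, and the random parameters are all hypothetical and do not reproduce the paper's actual architecture. It shows one property only: both frames pass through the same weights, and each frame attends to the other.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(query_feats, context_feats, p):
    """Tokens of one frame attend to tokens of the other frame."""
    q = query_feats @ p["w_q"]
    k = context_feats @ p["w_k"]
    v = context_feats @ p["w_v"]
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def shared_block(cond_feats, target_feats, p):
    """Process both frames with the same parameters, then exchange information."""
    h_cond = np.tanh(cond_feats @ p["w_in"])      # shared projection
    h_target = np.tanh(target_feats @ p["w_in"])  # same weights, other frame
    out_cond = h_cond + cross_attention(h_cond, h_target, p)
    out_target = h_target + cross_attention(h_target, h_cond, p)
    return out_cond, out_target


rng = np.random.default_rng(0)
dim = 8  # hypothetical feature width
params = {name: rng.standard_normal((dim, dim)) / np.sqrt(dim)
          for name in ("w_in", "w_q", "w_k", "w_v")}
cond = rng.standard_normal((16, dim))    # 16 tokens from the conditioning view
target = rng.standard_normal((16, dim))  # 16 tokens from the noisy target view
out_cond, out_target = shared_block(cond, target, params)
```

Because the parameters are shared, swapping the two inputs simply swaps the two outputs. A concatenation-based baseline such as the Concat-UNet mentioned in the findings would instead stack the frames along the channel axis before a single network.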
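Experimental Results
Research Questions
- RQ1: How can diffusion models be adapted to perform novel view synthesis from a limited number of input views?
- RQ2: Can a geometry-free diffusion model generate sets of 3D-consistent views without per-scene optimization or an explicit 3D representation?
- RQ3: Which architectural choices and sampling strategies (such as stochastic conditioning) improve the 3D consistency and visual fidelity of the generated views?
- RQ4: Beyond conventional image-quality metrics, how should the 3D consistency of geometry-free view synthesis be evaluated?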
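Key Findings
- 3DiM generates sharp, plausible novel views from a single input view, with much higher fidelity than prior work while remaining approximately 3D consistent.
- Stochastic conditioning markedly improves 3D consistency during diffusion sampling compared with a naive sampler that conditions on a single fixed view.
- The X-UNet architecture, with weight sharing and cross-attention, outperforms Concat-UNet in 3D consistency and in alignment with the conditioning view.
- Standard metrics (PSNR, SSIM) may not fully reflect sample quality for geometry-free models, whereas FID and the proposed 3D consistency scoring capture model performance more reliably.
- The dedicated 3D consistency scoring, which trains a neural field on the model's outputs, penalizes inconsistent outputs and agrees with qualitative assessment.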
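The 3D consistency scoring idea can be sketched as a small evaluation harness. Several details here are assumptions: the train/held-out split, the 50% split ratio, the choice of PSNR as the comparison metric, and the `fit_mean_field` placeholder, which merely averages the training views and is nowhere near a real neural field. The sketch only shows the shape of the procedure: fit a single scene model to some generated views and check how well it explains the rest.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB (MSE floored to avoid division by zero)."""
    mse = max(float(np.mean((pred - target) ** 2)), 1e-12)
    return 10.0 * np.log10(max_val ** 2 / mse)


def fit_mean_field(train_views, train_poses):
    """Hypothetical placeholder for neural-field training.

    Returns a render function that ignores the pose and outputs the mean
    training image; a real field would render a pose-dependent image.
    """
    mean_image = np.mean(train_views, axis=0)
    return lambda pose: mean_image


def consistency_score(views, poses, fit_field, train_fraction=0.5, seed=0):
    """Fit one scene model on part of the views, score it on the rest."""
    if len(views) < 2:
        raise ValueError("need at least two views to hold one out")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(views))
    n_train = int(round(train_fraction * len(views)))
    n_train = min(max(n_train, 1), len(views) - 1)  # keep both splits non-empty
    train_idx, test_idx = order[:n_train], order[n_train:]

    render = fit_field([views[i] for i in train_idx],
                       [poses[i] for i in train_idx])
    scores = [psnr(render(poses[i]), views[i]) for i in test_idx]
    return float(np.mean(scores))


rng = np.random.default_rng(1)
poses = list(range(6))  # placeholder pose identifiers
consistent = [np.full((4, 4, 3), 0.5) for _ in poses]
inconsistent = [rng.random((4, 4, 3)) for _ in poses]
score_consistent = consistency_score(consistent, poses, fit_mean_field)
score_inconsistent = consistency_score(inconsistent, poses, fit_mean_field)
```

Views that a single scene model can explain score high on the held-out set, while mutually contradictory views score low, which is the sense in which such a metric penalizes inconsistent outputs.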
This explainer was generated by AI and reviewed by a human editor.