QUICK REVIEW

[Paper Review] Enhancing Novel View Synthesis via Geometry Grounded Set Diffusion

Farhad Ghazvinian Zanjani, Hong Cai | arXiv (Cornell University) | Jan 12, 2026

Advanced Vision and Imaging | Cited by 0

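One-Sentence Summary

ViewMorpher3D improves multi-view novel view synthesis for 3D Gaussian Splatting scenes through diffusion-based restoration conditioned on 3D geometric priors, camera poses, and multi-view references, which increases cross-view and temporal consistency.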

ABSTRACT

We present SetDiff, a geometry-grounded multi-view diffusion framework that enhances novel-view renderings produced by 3D Gaussian Splatting. Our method integrates explicit 3D priors (pixel-aligned coordinate maps and pose-aware Plücker ray embeddings) into a set-based diffusion model capable of jointly processing variable numbers of reference and target views. This formulation enables robust occlusion handling, reduces hallucinations under low-signal conditions, and improves photometric fidelity in visual content restoration. A unified set mixer performs global token-level attention across all input views, supporting scalable multi-camera enhancement while maintaining computational efficiency through latent-space supervision and selective decoding. Extensive experiments on EUVS, Para-Lane, nuScenes, and DL3DV demonstrate significant gains in perceptual fidelity, structural similarity, and robustness under severe extrapolation. SetDiff establishes a state-of-the-art diffusion-based solution for realistic and reliable novel-view synthesis in autonomous driving scenarios.

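Research Motivation and Goals

Motivate the need for robust multi-view novel view synthesis (NVS) in autonomous driving under sparse observations and large baselines.
Develop a geometry-aware diffusion enhancer that scales to variable numbers of cameras and time steps.
Introduce geometric conditioning signals beyond RGB renderings to improve structural fidelity and multi-view consistency.
Supervise in latent space, with selective pixel-space supervision, to balance efficiency against cross-view coupling.
Demonstrate higher image quality and geometric plausibility than state-of-the-art baselines on challenging driving datasets.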

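Proposed Method

Propose a geometry-grounded diffusion enhancer (ViewMorpher3D) that jointly processes variable numbers of reference and target views.
Condition the diffusion denoiser on geometric signals (C-maps, the pixel-aligned coordinate maps) and pose embeddings (Plücker ray fields) in addition to the RGB input.
Build on a latent diffusion framework (SD-Turbo): a learned encoder Psi fuses the C-maps, Plücker embeddings, and view masks to condition a 2D UNet.
Apply full 3D self-attention across all views to enforce cross-view spatial consistency during restoration.
Apply latent-space supervision to all targets, with selective pixel-space supervision to manage memory while preserving cross-view consistency.
Fine-tune the VAE decoder with LoRA to bridge the domain gap and improve reconstruction fidelity.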
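
To make the pose conditioning above more concrete, the following is a minimal sketch of how a per-pixel Plücker ray embedding is commonly computed. It is not taken from the paper: the function names `plucker_ray` and `plucker_map`, the pinhole intrinsics `fx, fy, cx, cy`, the camera-to-world pose convention `(R, t)`, and the half-pixel offset are all assumptions made for illustration. Each ray is encoded as the 6-vector (d, o × d), where d is the unit ray direction in world space and o is the camera center.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker_ray(u: float, v: float,
                fx: float, fy: float, cx: float, cy: float,
                R: Sequence[Sequence[float]], t: Sequence[float]) -> Tuple[float, ...]:
    """6-D Plücker embedding (d, o x d) of the ray through pixel (u, v).

    Assumed convention (hypothetical, for illustration): pinhole camera with
    z pointing forward, and (R, t) mapping camera coordinates to world
    coordinates, so t is the camera center o in world space.
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into the world frame.
    d_world = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    d = tuple(c / norm for c in d_world)
    # The moment o x d makes the encoding independent of the point chosen on the ray.
    m = cross(t, d)
    return d + m


def plucker_map(width: int, height: int,
                fx: float, fy: float, cx: float, cy: float,
                R: Sequence[Sequence[float]],
                t: Sequence[float]) -> List[List[Tuple[float, ...]]]:
    """Pixel-aligned H x W grid of Plücker embeddings, sampled at pixel centers."""
    return [[plucker_ray(x + 0.5, y + 0.5, fx, fy, cx, cy, R, t)
             for x in range(width)]
            for y in range(height)]
```

For example, with an identity rotation and a camera center at (1, 0, 0), the ray through the principal point has direction (0, 0, 1) and moment (0, -1, 0). In a real pipeline this map would typically be computed as a tensor at latent resolution and passed to the conditioning encoder together with the other geometric signals.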

Figure 1: ViewMorpher3D improves rendered novel views via multi-view diffusion, conditioned on camera images, poses, and the scene's 3D structure.

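Experimental Results

Research Questions

RQ1: Can a diffusion-based enhancer exploit geometry-aware conditioning to improve multi-view NVS in 3D Gaussian Splatting scenes?
RQ2: How does multi-view (and temporal) conditioning affect the cross-view consistency and temporal coherence of the enhanced views?
RQ3: How does varying the number of reference and target views affect enhancement quality?
RQ4: In extrapolated driving scenarios, does geometry-grounded conditioning reduce artifacts and hallucinations compared with RGB-only diffusion enhancers?
RQ5: How does ViewMorpher3D perform against state-of-the-art diffusion-based enhancers on challenging driving datasets?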

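Key Findings

In extrapolation and sparse-view settings, ViewMorpher3D achieves higher perceptual and structural quality than RGB-only enhancers such as DiFix3D+ and 3DGS-Enhancer.
The model benefits from multiple reference views; quality improves markedly as the number of reference views increases.
Cross-view and temporal consistency improve thanks to permutation-invariant fusion and full 3D self-attention over multi-view features.
Geometric conditioning with C-maps and Plücker embeddings reduces hallucinations and preserves scene geometry at challenging viewpoints.
Latent-space supervision on all targets, combined with selective pixel-space losses, enables scalable multi-target enhancement without high memory cost.
On EUVS, Para-Lane, and nuScenes, ViewMorpher3D shows significant improvements over the baselines in PSNR, SSIM, and LPIPS.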
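
The permutation property credited above for cross-view consistency can be illustrated with a small sketch. This is not the paper's set mixer: `set_attention` is a hypothetical single-head self-attention with identity query/key/value projections, written in plain Python under the assumption that tokens from all views are flattened into one sequence. It shows that attending over the pooled tokens handles a variable number of views, and that reordering the input views only reorders the outputs.

```python
import math
from typing import List

Token = List[float]
View = List[Token]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def set_attention(views: List[View]) -> List[View]:
    """Global token-level self-attention over the tokens of all views.

    Illustrative assumptions: one head, identity projections, no positional
    or pose encoding. A real model would use learned projections and
    inject pose information through the conditioning signals.
    """
    # Pool the tokens of every view into one flat set.
    tokens = [tok for view in views for tok in view]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    mixed = []
    for query in tokens:
        # Each token attends to every token of every view.
        weights = softmax([scale * sum(q * k for q, k in zip(query, key))
                           for key in tokens])
        mixed.append([sum(w * value[j] for w, value in zip(weights, tokens))
                      for j in range(dim)])

    # Split the mixed tokens back into their original views.
    result, start = [], 0
    for view in views:
        result.append(mixed[start:start + len(view)])
        start += len(view)
    return result
```

Calling `set_attention([a, b, c])` and `set_attention([c, a, b])` yields the same per-view outputs up to floating-point rounding, only in a different order. The views may also contain different numbers of tokens, which is what lets one model serve varying camera counts.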

Figure 2: Overview illustration of ViewMorpher3D. The rendered novel-view images are enhanced via a multi-view diffusion model, conditioned on reference views, camera poses and 3D priors.

This review was generated by AI and reviewed by human editors.