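[Paper Walkthrough] Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision
This paper proposes an inverse-graphics diffusion framework that uses a forward model and partial observations to achieve 3D-consistent reconstruction and inpainting without direct supervision.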
Denoising diffusion models are a powerful type of generative models used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with a given image, but ground-truth 3D scenes are unavailable and only 2D images are accessible. To address this limitation, we propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed. Instead, these signals are measured indirectly through a known differentiable forward model, which produces partial observations of the unknown signal. Our approach involves integrating the forward model directly into the denoising process. This integration effectively connects the generative modeling of observations with the generative modeling of the underlying signals, allowing for end-to-end training of a conditional generative model over signals. During inference, our approach enables sampling from the distribution of underlying signals that are consistent with a given partial observation. We demonstrate the effectiveness of our method on three challenging computer vision tasks. For instance, in the context of inverse graphics, our model enables direct sampling from the distribution of 3D scenes that align with a single 2D input image.
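To make the idea of integrating the forward model into the denoising process more concrete, here is a minimal toy sketch; it is not the paper's implementation. It assumes a 1D signal, a forward model that simply reveals a subset of entries, and a hypothetical `toy_denoiser` standing in for the learned network. All function names and the masking forward model are illustrative assumptions. The point it illustrates is that the loss compares rendered observations with clean observations and never scores the hidden signal directly.

```python
import math
import random


def forward_model(signal, idx):
    """Toy forward model (assumption): observe only the entries listed in idx."""
    return [signal[i] for i in idx]


def add_noise(obs, alpha_bar, rng):
    """DDPM-style noising: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in obs]


def toy_denoiser(context_obs, context_idx, noisy_target, target_idx, n):
    """Hypothetical stand-in for the learned network.

    It returns an estimate of the full signal by pasting the observations
    into place; a real model would be a trained neural network.
    """
    signal_hat = [0.0] * n
    for i, v in zip(target_idx, noisy_target):
        signal_hat[i] = v
    for i, v in zip(context_idx, context_obs):
        signal_hat[i] = v
    return signal_hat


def observation_loss(signal, context_idx, target_idx, alpha_bar, rng):
    """Noise the target observation, estimate the signal, render the estimate
    through the forward model, and score it in observation space."""
    # The hidden signal is used only to simulate the two observations.
    context_obs = forward_model(signal, context_idx)
    target_obs = forward_model(signal, target_idx)
    noisy_target = add_noise(target_obs, alpha_bar, rng)
    signal_hat = toy_denoiser(
        context_obs, context_idx, noisy_target, target_idx, len(signal)
    )
    rendered = forward_model(signal_hat, target_idx)
    return sum((r - t) ** 2 for r, t in zip(rendered, target_obs)) / len(target_idx)


rng = random.Random(0)
hidden_signal = [0.5, -1.0, 2.0, 0.25]
loss = observation_loss(
    hidden_signal, context_idx=[0, 1], target_idx=[2, 3], alpha_bar=0.5, rng=rng
)
print(f"observation-space loss: {loss:.4f}")
```

In a real system the forward model would be a differentiable renderer and the denoiser a trained network, so that gradients of the observation-space loss flow back through the rendering step into the signal estimate.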
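Research Motivation and Goals
- Motivate solving stochastic inverse problems in 3D scenes without direct supervision.
- Propose a diffusion-based inverse-graphics pipeline that incorporates a forward model.
- Show robustness to camera-pose noise and partial observations during both training and inference.
- Demonstrate 3D-consistent reconstruction and inpainting on datasets such as Co3D and Objaverse.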
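Proposed Method
- Use a diffusion process conditioned on a forward graphics model to infer 3D structure from 2D observations.
- Incorporate the forward model as a prior so that inverse graphics can be learned without direct supervision.
- Train with noisy camera poses to make 3D reconstruction more robust.
- Inpaint partial observations by designing the model to complete missing image patches.
- Provide a pipeline diagram of the updated inverse-graphics workflow.
- Compare against baseline methods on the Co3D and Objaverse datasets.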
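As a rough illustration of training with noisy camera poses, the sketch below jitters a pose before it would be handed to a renderer. The summary does not say how the noise is parameterized, so the Gaussian yaw-and-position jitter, the `Pose` type, and the sigma values are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical minimal camera pose: yaw angle (radians) and 3D position."""
    yaw: float
    position: tuple


def jitter_pose(pose, sigma_yaw, sigma_pos, rng):
    """Return a copy of the pose with independent Gaussian noise added.

    The noise model is an assumption, not taken from the paper.
    """
    noisy_yaw = pose.yaw + rng.gauss(0.0, sigma_yaw)
    noisy_position = tuple(p + rng.gauss(0.0, sigma_pos) for p in pose.position)
    return Pose(noisy_yaw, noisy_position)


rng = random.Random(0)
clean = Pose(yaw=0.3, position=(0.0, 1.0, 2.5))
noisy = jitter_pose(clean, sigma_yaw=0.05, sigma_pos=0.02, rng=rng)
print(clean)
print(noisy)
```

Passing an explicit `random.Random` instance keeps the perturbation reproducible, which makes it easier to compare runs trained with and without pose noise.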
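Experimental Results
Research Questions
- RQ1: Can diffusion-based inverse graphics solve stochastic inverse problems without direct supervision?
- RQ2: Under pose noise, how does the forward model improve 3D consistency and reconstruction quality?
- RQ3: Can the model inpaint reliably from partial observations?
- RQ4: How does training with noisy poses affect rendering quality and 3D consistency?
- RQ5: How does the proposed method compare with existing baselines on standard 3D datasets?
Key Findings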
| Method | PSNR ↑ | LPIPS ↓ | FID ↓ |
|---|---|---|---|
| PixelNeRF | 17.96 | 0.479 | 158.50 |
| SparseFusion | 11.76 | 0.770 | 257.63 |
| Ours | 17.62 | 0.368 | 66.81 |
| With noise (ablation) | 17.24 | 0.40 | 92.23 |
| Ours (ablation) | 18.19 | 0.34 | 56.64 |
| Deterministic (2D inpainting) | 21.35 | 0.11 | 9.18 |
| Ours (2D inpainting) | 20.18 | 0.09 | 4.25 |
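- On Co3D (10 categories), the proposed method reaches PSNR 17.62, LPIPS 0.368, and FID 66.81. PixelNeRF has a slightly higher PSNR (17.96) but clearly worse LPIPS (0.479) and FID (158.50), and SparseFusion trails on all three metrics.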
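- The pose-noise ablation suggests the method stays reasonably robust: the "With noise" variant reaches PSNR 17.24, LPIPS 0.40, and FID 92.23, versus 18.19, 0.34, and 56.64 for the "Ours (ablation)" row, so PSNR and LPIPS shift only modestly while FID degrades more noticeably.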
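- In 2D inpainting, the method reaches PSNR 20.18, LPIPS 0.09, and FID 4.25. Per the table, it beats the deterministic baseline on the perceptual metrics (LPIPS 0.11, FID 9.18), although the baseline has the higher PSNR (21.35).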
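- Qualitative results show 3D-consistent reconstructions, illustrated with extracted point clouds, and better inpainting than the baselines.
- The updated pipeline emphasizes inverse graphics with a forward model so that learning can proceed from partial observations (see the figure referenced in the paper).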
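Of the three reported metrics, PSNR is the only one that can be computed without a pretrained network, so a small sketch may help when reading the table: higher PSNR is better, while lower LPIPS and FID are better. The function below uses the standard definition PSNR = 10 * log10(MAX^2 / MSE). The flat-list image representation and the `max_value=1.0` default are assumptions for illustration, not details from the paper.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.25]
estimate = [0.1, 0.4, 0.9, 0.35]
print(f"PSNR: {psnr(reference, estimate):.2f} dB")
```

Because PSNR rewards pixel-wise agreement, a deterministic model that averages over plausible completions can score well on it while still looking blurry, which is one reason to read it alongside LPIPS and FID.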
This walkthrough was generated by AI and reviewed by a human editor.