QUICK REVIEW

[Paper Review] Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long, Yuan-Chen Guo | arXiv (Cornell University) | Oct 23, 2023

Advanced Vision and Imaging | Cited by 24

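ONE-SENTENCE SUMMARY

Wonder3D uses a cross-domain diffusion model to generate multi-view normal maps and color images, then fuses them into a 3D surface, reconstructing a high-fidelity textured mesh from a single image.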

ABSTRACT

In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.

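MOTIVATION AND GOALS

Leverage diffusion priors to tackle the challenging problem of single-view 3D reconstruction.
Improve consistency across views and across domains (normals and colors) to obtain coherent 3D surfaces.
Achieve high-quality geometry and texture with more efficient inference than SDS-based methods.
Explore a diffusion framework that generalizes zero-shot to diverse shapes and styles.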

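PROPOSED METHOD

Proposes a multi-view cross-domain diffusion model that generates consistent normal maps and color images from a single input image.
Introduces a domain switcher that conditions the diffusion model on the target domain (normal or color) without retraining the base prior.
Introduces cross-domain attention, which exchanges information between the normal and color domains to keep geometry and appearance consistent.
Develops a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the generated multi-view representations.
Builds on a diffusion framework compatible with a pretrained 2D prior (Stable Diffusion), enabling efficient zero-shot generalization.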
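The domain switcher and cross-domain attention can be illustrated with a minimal pure-Python sketch. Everything here is a hypothetical illustration choice rather than Wonder3D's code: the names `domain_embedding` and `joint_attention`, the random stand-in weights, and the 6 views. A single joint attention over every view and both domains is a simplified stand-in for the paper's actual layer design, which may differ. The sketch shows two ideas from the list: one network receives a different conditioning vector per domain, and every token can attend to tokens from other views and from the other domain.

```python
import math
import random

def matvec(m, v):
    # m: list of rows (out_dim x in_dim); v: vector of length in_dim.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def domain_embedding(domain, dim, seed=0):
    # Hypothetical domain switcher: a one-hot code (normal / color) mapped to
    # the embedding width by a fixed random projection standing in for learned weights.
    rng = random.Random(seed)
    proj = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(dim)]
    code = [1.0, 0.0] if domain == "normal" else [0.0, 1.0]
    return matvec(proj, code)

def joint_attention(normal_tok, color_tok, wq, wk, wv):
    # normal_tok / color_tok: [views][tokens] -> vector of length dim.
    # All views of both domains are flattened into one sequence, so each
    # token attends across views and across the normal/color domains.
    seq = [t for view in normal_tok + color_tok for t in view]
    dim = len(seq[0])
    qs = [matvec(wq, t) for t in seq]
    ks = [matvec(wk, t) for t in seq]
    vs = [matvec(wv, t) for t in seq]
    out = []
    for q in qs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in ks])
        out.append([sum(w * v[i] for w, v in zip(weights, vs)) for i in range(dim)])
    # Split the flat sequence back into per-domain [view][token] lists.
    n_views, n_tok = len(normal_tok), len(normal_tok[0])
    per_domain = n_views * n_tok
    def unflatten(flat):
        return [flat[i * n_tok:(i + 1) * n_tok] for i in range(n_views)]
    return unflatten(out[:per_domain]), unflatten(out[per_domain:])

rng = random.Random(1)
views, tokens, dim = 6, 4, 8

def rand_mat(r, c):
    return [[rng.gauss(0, 1) / math.sqrt(c) for _ in range(c)] for _ in range(r)]

time_emb = [rng.gauss(0, 1) for _ in range(dim)]

def make_tokens(domain):
    # The same network sees a different conditioning vector per domain.
    cond = [t + e for t, e in zip(time_emb, domain_embedding(domain, dim))]
    return [[[rng.gauss(0, 1) + c for c in cond] for _ in range(tokens)] for _ in range(views)]

normal, color = make_tokens("normal"), make_tokens("color")
wq, wk, wv = rand_mat(dim, dim), rand_mat(dim, dim), rand_mat(dim, dim)
new_normal, new_color = joint_attention(normal, color, wq, wk, wv)
print(len(new_normal), len(new_normal[0]), len(new_normal[0][0]))  # 6 4 8
```

Because the normal tokens attend over the color tokens, changing any color token changes the updated normal tokens; that coupling is what lets the two domains stay consistent.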

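EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How can a cross-domain diffusion model jointly generate multi-view normals and colors from a single image?
RQ2: Does cross-domain attention improve consistency between the normals and colors generated across views?
RQ3: How effective is geometry-aware normal fusion at reconstructing high-quality 3D surfaces from 2D normal maps and images?
RQ4: What is the trade-off between efficiency and generalization compared with SDS-based 3D reconstruction methods?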

KEY FINDINGS

Method	Chamfer Dist. ↓	Volume IoU ↑
RealFusion	0.0819	0.2741
Magic123	0.0516	0.4528
One-2-3-45	0.0629	0.4086
Point-E	0.0426	0.2875
Shap-E	0.0436	0.3584
Zero123	0.0339	0.5035
SyncDreamer	0.0261	0.5421
Ours	0.0199	0.6244
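The two metrics in the table can be made concrete with a small pure-Python sketch. The Chamfer convention below (mean nearest-neighbor Euclidean distance in each direction, averaged) is an assumption; papers differ on squared versus unsquared distances and sum versus mean, so the values here are not on the same scale as the table. Volume IoU is computed on sets of occupied voxel indices; the `cube` helper is hypothetical and only builds a test shape. The last line recomputes, from the table itself, how much lower "Ours" is than SyncDreamer on Chamfer distance.

```python
import math

def chamfer_distance(a, b):
    # a, b: lists of 3D points sampled from two surfaces.
    # Assumed convention: mean nearest-neighbor Euclidean distance in each
    # direction, averaged over the two directions.
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

def volume_iou(occ_a, occ_b):
    # occ_a, occ_b: sets of occupied voxel indices (i, j, k) on a shared grid.
    union = occ_a | occ_b
    return len(occ_a & occ_b) / len(union) if union else 1.0

def cube(lo, hi):
    # All voxels with every coordinate in [lo, hi).
    return {(i, j, k) for i in range(lo, hi) for j in range(lo, hi) for k in range(lo, hi)}

print(chamfer_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0)]))  # 0.25
print(volume_iou(cube(0, 6), cube(2, 8)))  # 64 / 368, about 0.1739

# Relative Chamfer reduction of "Ours" over SyncDreamer, from the table above.
print(f"{(0.0261 - 0.0199) / 0.0261:.1%}")  # 23.8%
```

The brute-force nearest-neighbor search is quadratic in the number of points; evaluations on real meshes would normally use a spatial index such as a k-d tree instead.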

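On the GSO dataset (Table 1), our method achieves the highest geometry and texture quality among the single-view reconstruction methods tested.
Compared with the baselines, our method substantially improves novel view synthesis metrics (PSNR, SSIM, LPIPS) (Table 2).
Cross-domain diffusion with cross-domain attention yields better multi-view consistency than sequential or attention-free variants.
The geometry-aware normal loss and the outlier-dropping strategy produce cleaner surfaces and better preservation of fine detail.
The method reconstructs a textured mesh in under 2 minutes, much faster than time-consuming per-shape SDS optimization.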
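The geometry-aware normal loss with outlier dropping can be sketched as follows. This review does not give Wonder3D's exact formulation, so the function name `geometry_aware_normal_loss`, the |cos(view, normal)| weighting, and the 10% drop ratio are all hypothetical choices made for illustration. The sketch scores each pixel by the cosine error between a rendered normal and a generated normal, down-weights pixels seen at grazing angles, and discards the largest-error fraction as likely inconsistent predictions. In a real fusion pipeline this loss would be minimized inside a surface optimization loop, which is omitted here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def geometry_aware_normal_loss(pred_n, target_n, view_dir, drop_frac=0.1):
    # pred_n: unit normals rendered from the surface being optimized, one per pixel.
    # target_n: unit normals generated by the diffusion model for the same pixels.
    # view_dir: unit vectors from each surface point toward the camera.
    # Assumed weighting: pixels whose generated normal faces the camera
    # (|cos| near 1) are trusted more than grazing-angle pixels.
    weighted = []
    for p, t, v in zip(pred_n, target_n, view_dir):
        err = 1.0 - dot(p, t)    # cosine error: 0 when the normals agree
        weight = abs(dot(t, v))  # |cos(view direction, generated normal)|
        weighted.append(weight * err)
    # Outlier dropping: discard the largest-error fraction of pixels.
    keep = max(1, round(len(weighted) * (1.0 - drop_frac)))
    kept = sorted(weighted)[:keep]
    return sum(kept) / len(kept)

up = (0.0, 0.0, 1.0)
target = [up] * 10
view_dirs = [up] * 10
pred = [up] * 9 + [(0.0, 0.0, -1.0)]  # one pixel with a flipped normal
print(geometry_aware_normal_loss(pred, target, view_dirs))                 # 0.0
print(geometry_aware_normal_loss(pred, target, view_dirs, drop_frac=0.0))  # 0.2
```

With dropping enabled, the single flipped normal is treated as an outlier and no longer pulls the surface; with `drop_frac=0.0` it contributes its full error to the average.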


This review was generated by AI and reviewed by human editors.