QUICK REVIEW

[Paper Digest] RealFusion: 360° Reconstruction of Any Object from a Single Image

Luke Melas-Kyriazi, Christian Rupprecht|arXiv (Cornell University)|Feb 21, 2023

Advanced Vision and Imaging | Cited by 19

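One-Sentence Summary

RealFusion reconstructs a full 360° 3D model of an arbitrary object from a single image by fitting a neural radiance field to the input view and to novel views "dreamed up" under the guidance of a diffusion prior, using InstantNGP for efficiency.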

ABSTRACT

We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it. We do so by fitting a neural radiance field to the image, but find this problem to be severely ill-posed. We thus take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object. Using an approach inspired by DreamFields and DreamFusion, we fuse the given input view, the conditional prior, and other regularizers in a final, consistent reconstruction. We demonstrate state-of-the-art reconstruction results on benchmark images when compared to prior methods for monocular 3D reconstruction of objects. Qualitatively, our reconstructions provide a faithful match of the input view and a plausible extrapolation of its appearance and 3D shape, including to the side of the object not visible in the image.

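Research Motivation and Goals

Motivate the problem of recovering a full 360° photographic model of an object from a single view, highlighting how ill-posed single-image 3D reconstruction is.
Propose a method that uses a pretrained 2D diffusion image generator as a prior to hallucinate plausible novel views.
Develop an efficient multi-scale radiance field representation, with regularizers that encourage realistic appearance and plausible geometry.
Introduce single-image textual inversion to condition the diffusion prior on the specific input object.
Demonstrate state-of-the-art reconstruction quality on in-the-wild images and benchmark datasets, without category-specific supervision.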

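Proposed Method

Represent appearance and geometry as a neural radiance field (RF), optimized with a reconstruction loss to match the input view.
Condition a pretrained diffusion model on a prompt embedding learned by single-image textual inversion, so that it synthesizes plausible novel views of the object.
Apply score distillation sampling (SDS) to align the RF with the diffusion prior at randomly sampled novel viewpoints.
Use a coarse-to-fine training schedule with InstantNGP's grid-based RF for efficiency.
Include regularizers to improve surface quality: 2D normal smoothness, texture dropout, and a mask-based L2 term; also apply an image-aligned mask loss and a normal regularizer.
Keep the reconstruction camera fixed and sample a new viewpoint at every iteration, enforcing consistency with the prior while preserving fidelity to the input view.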
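
The interplay between the fixed reconstruction camera and the per-iteration novel views can be illustrated with a toy sketch. It is not the authors' implementation. The "radiance field" below is just a list of independent per-view pixel vectors (in the real method all views are rendered from one shared field), `toy_denoiser` is a hypothetical stand-in for the diffusion model that assumes the prior mean is known, and `ALPHA`, `SIGMA`, `num_views`, `steps`, and `lr` are illustrative values. The only piece that follows the score distillation form introduced in DreamFusion is `sds_grad`, which returns the weighted difference between predicted and injected noise as the gradient with respect to the rendered pixels.

```python
import random

# Hypothetical noise-schedule values at a single diffusion timestep.
ALPHA, SIGMA = 0.8, 0.6


def sds_grad(rendered, predict_noise, rng, weight=1.0):
    """SDS gradient w.r.t. rendered pixels: weight * (eps_hat - eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in rendered]
    noisy = [ALPHA * x + SIGMA * n for x, n in zip(rendered, eps)]
    eps_hat = predict_noise(noisy)
    return [weight * (eh - n) for eh, n in zip(eps_hat, eps)]


def fit(input_view, prior_mean, num_views=8, steps=2000, lr=0.1, seed=0):
    """Toy optimization: view 0 is the fixed input camera, the rest are novel views."""
    rng = random.Random(seed)
    # Assumption: each view is its own parameter vector (no shared 3D field).
    views = [[0.0] * len(input_view) for _ in range(num_views)]

    def toy_denoiser(noisy):
        # Stand-in for the diffusion model: the noise that would explain
        # `noisy` if the clean image were exactly `prior_mean`.
        return [(x - ALPHA * m) / SIGMA for x, m in zip(noisy, prior_mean)]

    for _ in range(steps):
        # Reconstruction objective at the fixed camera:
        # gradient step on 0.5 * ||view - input||^2.
        views[0] = [v - lr * (v - y) for v, y in zip(views[0], input_view)]
        # Prior objective at one randomly sampled novel view.
        k = rng.randrange(1, num_views)
        grad = sds_grad(views[k], toy_denoiser, rng)
        views[k] = [v - lr * g for v, g in zip(views[k], grad)]
    return views


if __name__ == "__main__":
    result = fit(input_view=[1.0, 0.5], prior_mean=[0.2, 0.9])
    print("input view:", result[0])
    print("one novel view:", result[1])
```

With this toy denoiser the injected noise cancels out of the gradient, so each SDS step pulls the sampled view toward `prior_mean` while view 0 converges to the input image. A real diffusion model gives a noisy, timestep-dependent estimate instead.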

Figure 2: Method diagram. Our method optimizes a neural radiance field using two objectives simultaneously: a reconstruction objective and a prior objective. The reconstruction objective ensures that the radiance field resembles the input image from a specific, fixed view. The prior objective uses

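Experimental Results

Research Questions

RQ1: Can a diffusion prior conditioned on the input image yield realistic, plausible 360° object reconstructions from a single view?
RQ2: How does single-image textual inversion affect the quality and diversity of the reconstructed views?
RQ3: Which regularizers and training strategies produce plausible geometry and appearance when reconstructing an arbitrary object from only one image?
RQ4: How does RealFusion perform on standard benchmarks relative to category-specific or multi-view reconstruction methods?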

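Key Findings

Compared with prior monocular 3D methods, RealFusion achieves state-of-the-art quantitative results on benchmark reconstruction.
On both shape accuracy (F-score) and appearance similarity (CLIP), RealFusion outperforms Shelf-Supervised Mesh Prediction on average across seven object categories.
Single-image textual inversion is essential for high-quality reconstruction; without it, the back side tends to look like a generic example of the category rather than the actual object.
Coarse-to-fine training and normal smoothness regularization improve surface quality and reduce artifacts.
Using Stable Diffusion as the diffusion prior yields higher-quality reconstructions than alternatives such as CLIP.
RealFusion can produce multiple plausible 360° reconstructions from the same input view, with the variation concentrated on the occluded back side.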
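
For readers unfamiliar with the shape metric mentioned above, here is a minimal sketch of an F-score between two point sets. It uses the common definition (harmonic mean of precision and recall at a distance threshold). The threshold `tau` and the way points are sampled from the reconstructed and ground-truth surfaces are assumptions here, since this digest does not state the paper's exact evaluation protocol.

```python
import math


def f_score(pred, gt, tau):
    """F-score between two non-empty 3D point sets at distance threshold tau."""

    def frac_within(src, dst):
        # Fraction of points in src whose nearest neighbour in dst is within tau.
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
        return hits / len(src)

    precision = frac_within(pred, gt)  # predicted points close to ground truth
    recall = frac_within(gt, pred)     # ground-truth points covered by prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(f_score(predicted, ground_truth, tau=0.1))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for illustration. A spatial index such as a k-d tree would be the usual choice for dense point clouds.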

Figure 3: Examples demonstrating the level of detail of information captured by the optimized embedding $\langle\textbf{e}\rangle$. Rows 1-2 show input images and masks. The images are used to optimize $\langle\textbf{e}\rangle$ via our single-image textual inversion process. Rows 3-5 show example

This digest was generated by AI and reviewed by a human editor.