QUICK REVIEW

[Paper Digest] SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections

Mark Boss, Andreas Engelhardt|arXiv (Cornell University)|May 31, 2022

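Advanced Vision and Imaging|Cited by 38

One-Sentence Summary

SAMURAI jointly optimizes 3D shape, BRDF, per-image camera pose, and per-image illumination from unconstrained in-the-wild image collections, producing relightable 3D assets and meshes without requiring accurate poses or masks.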

ABSTRACT

Inverse rendering of an object under entirely unknown capture conditions is a fundamental challenge in computer vision and graphics. Neural approaches such as NeRF have achieved photorealistic results on novel view synthesis, but they require known camera poses. Solving this problem with unknown camera poses is highly challenging as it requires joint optimization over shape, radiance, and pose. This problem is exacerbated when the input images are captured in the wild with varying backgrounds and illuminations. Standard pose estimation techniques fail in such image collections in the wild due to very few estimated correspondences across images. Furthermore, NeRF cannot relight a scene under any illumination, as it operates on radiance (the product of reflectance and illumination). We propose a joint optimization framework to estimate the shape, BRDF, and per-image camera pose and illumination. Our method works on in-the-wild online image collections of an object and produces relightable 3D assets for several use-cases such as AR/VR. To our knowledge, our method is the first to tackle this severely unconstrained task with minimal user interaction. Project page: https://markboss.me/publication/2022-samurai/ Video: https://youtu.be/LlYuGDjXp-8

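Motivation and Goals

Advance 3D shape and material reconstruction from unconstrained real-world image collections that lack fixed camera intrinsics and clean segmentation.
Develop a joint optimization framework that estimates shape, BRDF, per-image illumination, and per-image camera pose and intrinsics.
Relax the dependence on accurate pose and mask inputs by introducing robust initialization, camera multiplexing, and posterior scaling of the input images.
Enable extraction of explicit meshes with BRDF textures for AR/VR and material-editing applications.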

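Proposed Method

Represent 3D shape and BRDF at every 3D location with a Neural-PIL/NeRF-style neural volume, together with a per-image illumination embedding.
Jointly optimize a flexible, object-centric per-image camera parameterization that uses a look-at scheme and a per-image focal length to handle varying object distances.
Introduce camera multiplexing: optimize multiple candidate poses per image and apply dynamic loss reweighting to avoid local minima.
Apply posterior scaling to the input images during optimization to down-weight noisy masks or images.
Apply a coarse-to-fine loss schedule, Fourier frequency annealing, and regularization to stabilize BRDF and illumination estimation.
Extract an explicit mesh with BRDF textures from the learned neural volume for downstream graphics pipelines.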
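
The object-centric look-at parameterization with a per-image focal length can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`look_at`, `pinhole_intrinsics`), the OpenGL-style convention (the camera looks along its negative z-axis), the default world up vector, and the principal point at the image center are all assumptions introduced here. In the method these quantities would be optimized per image; the sketch only shows how an eye position, a target, and a focal length could map to camera matrices.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length; reject zero-length input."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return [c / n for c in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def look_at(eye, target=(0.0, 0.0, 0.0), up=(0.0, 1.0, 0.0)):
    """Return a 3x4 camera-to-world matrix [R | t] (hypothetical helper).

    Assumed convention: columns of R are the camera's right, up, and
    backward axes in world coordinates, so the camera looks along -z.
    Raises ValueError if eye == target or the view direction is
    parallel to `up`, because no unique orientation exists then.
    """
    forward = normalize([t - e for e, t in zip(eye, target)])
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)  # already unit length
    back = [-c for c in forward]
    return [[right[i], true_up[i], back[i], eye[i]] for i in range(3)]


def pinhole_intrinsics(focal, width, height):
    """Return a 3x3 pinhole matrix for one image (hypothetical helper).

    Assumes square pixels and a principal point at the image center.
    `focal` is in pixels and would differ from image to image.
    """
    return [
        [focal, 0.0, width / 2.0],
        [0.0, focal, height / 2.0],
        [0.0, 0.0, 1.0],
    ]
```

With the object at the origin, `look_at((0.0, 0.0, 3.0))` places a camera three units along +z facing the object, and `pinhole_intrinsics(500.0, 640, 480)` gives that image its own focal length. Anchoring every camera to a shared target keeps the object in view while the eye position and focal length vary, which is the appeal of a look-at scheme for images taken at different distances.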

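Experimental Results

Research Questions

RQ1: Can 3D shape, BRDF, per-image illumination, and camera parameters be jointly estimated from unconstrained real-world image collections?
RQ2: How well does joint optimization perform when poses are coarse or unknown and masks are noisy?
RQ3: In neural-volume-based reconstruction, does camera multiplexing improve convergence and accuracy over optimizing a single camera per image?
RQ4: Can the resulting model support relighting and mesh extraction suitable for AR/VR applications?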

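Key Findings

Compared with BARF-A and other baselines, SAMURAI substantially improves novel view synthesis and relighting on in-the-wild datasets, even without accurate pose initialization.
It jointly estimates per-image illumination, BRDF parameters, and camera poses, making relightable 3D assets possible without accurate masks or poses.
Camera multiplexing with dynamic loss reweighting helps escape local minima and stabilizes optimization on challenging datasets.
Posterior scaling of the input images and a robust optimization schedule improve reconstruction quality and increase robustness to noisy masks and images.
Extracting an explicit mesh with BRDF textures from the learned neural volume yields assets usable for AR/VR and material editing.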
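
The dynamic loss reweighting behind camera multiplexing can be sketched as follows. This digest does not give the paper's weighting formula, so the sketch swaps in a common choice as an assumption: a softmax over the negative per-candidate losses with a temperature. The names `multiplex_weights`, `multiplexed_loss`, and `temperature` are hypothetical.

```python
import math


def multiplex_weights(losses, temperature=1.0):
    """Weight candidate cameras of one image by their current loss.

    Assumed scheme (not confirmed by the source): softmax over
    -loss / temperature, so lower-loss candidates get larger weights.
    """
    if not losses:
        raise ValueError("need at least one candidate camera")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [-loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multiplexed_loss(losses, temperature=1.0):
    """Combine per-candidate losses into one scalar for the image."""
    weights = multiplex_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))
```

Under this assumed scheme, a high temperature keeps all candidate poses in play early on, while lowering it concentrates the weight on the best-fitting candidate. In a real training loop the weights would usually be treated as constants during backpropagation; that detail is also an assumption here.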

This digest was generated by AI and reviewed by a human editor.