QUICK REVIEW

[Paper Review] IGNOR: Image-guided Neural Object Rendering

Justus Thies, Michael Zollhöfer | arXiv (Cornell University) | Nov 26, 2018

Advanced Vision and Imaging | References: 54 | Cited by: 23

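One-Sentence Summary

This paper presents IGNOR, a self-supervised neural rendering method that combines image-based rendering with deep learning to produce photo-realistic re-renderings of 3D objects with accurate view-dependent effects. A Siamese network (EffectsNet) estimates and removes specular highlights from the input images so that the resulting diffuse images can be warped to a new view, and a composition network (CompositionNet) then fuses the warped images with the reinserted view-dependent effects; the method reaches state-of-the-art quality on complex appearance, especially specular highlights, and outperforms both classical IBR and learning-based methods on quantitative and qualitative benchmarks.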

ABSTRACT

We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis. The goal of our method is to generate photo-realistic re-renderings of reconstructed objects for virtual and augmented reality applications (e.g., virtual showrooms, virtual tours & sightseeing, the digital inspection of historical artifacts). A core component of our work is the handling of view-dependent effects. Specifically, we directly train an object-specific deep neural network to synthesize the view-dependent appearance of an object. As input data we are using an RGB video of the object. This video is used to reconstruct a proxy geometry of the object via multi-view stereo. Based on this 3D proxy, the appearance of a captured view can be warped into a new target view as in classical image-based rendering. This warping assumes diffuse surfaces, in case of view-dependent effects, such as specular highlights, it leads to artifacts. To this end, we propose EffectsNet, a deep neural network that predicts view-dependent effects. Based on these estimations, we are able to convert observed images to diffuse images. These diffuse images can be projected into other views. In the target view, our pipeline reinserts the new view-dependent effects. To composite multiple reprojected images to a final output, we learn a composition network that outputs photo-realistic results. Using this image-guided approach, the network does not have to allocate capacity on "remembering" object appearance, instead it learns how to combine the appearance of captured images. We demonstrate the effectiveness of our approach both qualitatively and quantitatively on synthetic as well as on real data.

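Motivation and Objectives

Render photo-realistic novel views of 3D objects while accurately reproducing view-dependent effects such as specular highlights.
Overcome the artifacts that classical image-based rendering produces when it blends views where the geometry is inaccurate or at occlusion boundaries.
Replace hand-crafted blending schemes with a learned, differentiable composition network that processes the reprojected images.
Train in a self-supervised manner from only an RGB video and a multi-view stereo reconstruction, avoiding costly supervision signals.
Achieve high-fidelity re-rendering while keeping view-dependent effects temporally coherent and spatially accurate.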

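Proposed Method

The method uses multi-view stereo to reconstruct a coarse 3D proxy geometry from an RGB video of the object.
EffectsNet, a Siamese convolutional neural network, is trained to predict view-dependent effects (such as specular highlights) and remove them from the input images, yielding diffuse images that are suitable for warping.
Using the 3D proxy geometry and the camera parameters, the diffuse images are reprojected into the target view in a geometrically consistent way.
In the target view, EffectsNet predicts new view-dependent effects for the target viewpoint, which are added back to the warped diffuse images.
CompositionNet, an encoder-decoder network, fuses the warped images of the K nearest neighbor views into the final photo-realistic output image.
The whole pipeline is trained end to end in a self-supervised manner by minimizing the L1 loss between the final output and the ground-truth target image.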
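
The data flow described above (remove effects, warp, reinsert effects, composite, compare with an L1 loss) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the additive model (observed = diffuse + view-dependent effects), the precomputed pixel correspondences that stand in for reprojection through the proxy geometry, and the masked average that stands in for CompositionNet are all simplifications, and every function name is hypothetical. The toy data uses the exact effects in place of EffectsNet predictions.

```python
import numpy as np

def remove_effects(observed, effects):
    # Assumed additive model: diffuse = observed - view-dependent effects.
    return observed - effects

def warp_to_target(image, src_rows, src_cols, valid):
    # Backward warp: every valid target pixel reads the source pixel it maps to.
    warped = np.zeros(valid.shape + image.shape[2:], dtype=image.dtype)
    warped[valid] = image[src_rows[valid], src_cols[valid]]
    return warped

def composite(views, masks):
    # Stand-in for CompositionNet: per-pixel average over the valid views.
    weights = np.stack(masks).astype(np.float64)[..., None]  # (K, H, W, 1)
    total = (np.stack(views) * weights).sum(axis=0)
    count = np.maximum(weights.sum(axis=0), 1.0)             # avoid 0-division
    return total / count

def l1_loss(prediction, target):
    return float(np.abs(prediction - target).mean())

# Toy scene: a diffuse target image plus target-view effects.
rng = np.random.default_rng(0)
H, W = 4, 5
target_diffuse = rng.random((H, W, 3))
target_effects = 0.1 * rng.random((H, W, 3))
target_image = target_diffuse + target_effects

rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
warped_views, masks = [], []
for shift in (0, 1):  # two source views, the second shifted by one column
    source_diffuse = np.roll(target_diffuse, shift, axis=1)
    source_effects = 0.1 * rng.random((H, W, 3))
    observed = source_diffuse + source_effects

    src_cols = cols + shift   # target pixel (r, c) maps to source (r, c + shift)
    valid = src_cols < W      # pixels that fall outside the source are invalid

    diffuse = remove_effects(observed, source_effects)
    warped = warp_to_target(diffuse, rows, src_cols, valid)
    warped_views.append(warped + target_effects)  # reinsert target-view effects
    masks.append(valid)

output = composite(warped_views, masks)
loss = l1_loss(output, target_image)
```

Because the toy example removes and reinserts the exact effects, the loss is zero up to floating-point rounding. In a trained system the residual would reflect errors in the predicted effects, in the proxy geometry, and in the composition step.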

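Experimental Results

Research Questions

RQ1: Can a self-supervised deep neural network effectively disentangle view-dependent effects from the input images so that the diffuse appearance can be warped accurately?
RQ2: Can a learned composition network outperform classical blending schemes when combining multiple warped images into a novel view?
RQ3: Does explicitly modeling and reinserting view-dependent effects yield higher visual fidelity than end-to-end learning or classical IBR methods?
RQ4: How does the method compare with state-of-the-art learning-based and classical image-based rendering techniques, particularly under challenging view-dependent conditions?
RQ5: Can the method generalize, without explicit supervision, to real-world objects with complex materials such as highly specular surfaces?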

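Key Findings

On real data, the method achieves a mean squared error (MSE) of 25.24, outperforming the state-of-the-art IBR methods DeepBlending (MSE: 45.07) and InsideOut (MSE: 51.17).
EffectsNet successfully removes and reinserts specular highlights, so that animations of view-dependent effects are temporally coherent and visually plausible.
CompositionNet effectively resolves reprojection errors and fills in occluded regions, producing high-fidelity output free of artifacts and ghosting.
Unlike purely learning-based methods, which show strong artifacts when data is scarce, the method degrades gracefully as the amount of training data shrinks.
The method runs at interactive rates on an NVIDIA 1080Ti, with EffectsNet at 50 Hz and CompositionNet at 10 Hz, which makes it suitable for interactive VR/AR applications.
Visual comparisons show that, relative to Pix2Pix and the method of Hedman et al., the approach produces sharper and more accurate specular highlights and better color consistency in close-up regions.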
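
To put the reported numbers in context, the short sketch below shows how an image MSE is typically computed and converts the quoted rates into per-frame times. The intensity scale behind the reported MSE values is not stated in this review, so the 8-bit toy images are an assumption, and the helper names are hypothetical.

```python
import numpy as np

def mse(prediction, target):
    # Cast to float first so that 8-bit differences cannot wrap around.
    diff = prediction.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))

def frame_time_ms(rate_hz):
    # Time budget per frame for a network running at rate_hz.
    return 1000.0 / rate_hz

def relative_reduction(ours, baseline):
    # Fraction by which `ours` lowers the error of `baseline`.
    return 1.0 - ours / baseline

# Toy 8-bit images (assumed scale): every pixel is off by 5 intensity levels.
target = np.full((2, 2, 3), 100, dtype=np.uint8)
prediction = target + 5
toy_mse = mse(prediction, target)

# MSE values as quoted in the findings above.
reported = {"IGNOR": 25.24, "DeepBlending": 45.07, "InsideOut": 51.17}
reductions = {
    name: relative_reduction(reported["IGNOR"], value)
    for name, value in reported.items()
    if name != "IGNOR"
}

effects_ms = frame_time_ms(50.0)       # EffectsNet at 50 Hz
composition_ms = frame_time_ms(10.0)   # CompositionNet at 10 Hz
```

With the quoted figures, the reported MSE is roughly 44% lower than DeepBlending's and roughly 51% lower than InsideOut's, and the two networks take about 20 ms and 100 ms per frame respectively. Whether the two stages run back to back for every frame is not stated here, so the sketch does not add them up.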

This review was generated by AI and checked by a human editor.