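[Paper Summary] Deferred Neural Rendering: Image Synthesis using Neural Textures
This paper proposes Neural Textures and a Deferred Neural Renderer. Together they synthesize photo-realistic images from imperfect 3D reconstructions and enable novel view synthesis and editing within an end-to-end trainable pipeline embedded in 3D.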
The modern computer graphics pipeline can synthesize images at remarkable visual quality; however, it requires well-defined, high-quality 3D content as input. In this work, we explore the use of imperfect 3D content, for instance, obtained from photo-metric reconstructions with noisy and incomplete surface geometry, while still aiming to produce photo-realistic (re-)renderings. To address this challenging problem, we introduce Deferred Neural Rendering, a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable components. Specifically, we propose Neural Textures, which are learned feature maps that are trained as part of the scene capture process. Similar to traditional textures, neural textures are stored as maps on top of 3D mesh proxies; however, the high-dimensional feature maps contain significantly more information, which can be interpreted by our new deferred neural rendering pipeline. Both neural textures and deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect. In contrast to traditional, black-box 2D generative neural networks, our 3D representation gives us explicit control over the generated output, and allows for a wide range of application domains. For instance, we can synthesize temporally-consistent video re-renderings of recorded 3D scenes as our representation is inherently embedded in 3D space. This way, neural textures can be utilized to coherently re-render or manipulate existing video content in both static and dynamic environments at real-time rates. We show the effectiveness of our approach in several experiments on novel view synthesis, scene editing, and facial reenactment, and compare to state-of-the-art approaches that leverage the standard graphics pipeline as well as conventional generative neural networks.
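Motivation and Goals
- Enable photo-realistic rendering from imperfect 3D reconstructions that are noisy, contain holes, or are over-smoothed.
- Propose neural textures: learned 2D maps attached to a 3D proxy that store rich appearance information.
- Introduce a differentiable, end-to-end trainable renderer that interprets the neural textures to produce the final image.
- Enable applications such as novel view synthesis, static scene editing, and reenactment of dynamic scenes.
- Show better temporal consistency and explicit 3D control over the output compared with purely 2D generative approaches.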
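Proposed Method
- Store learned neural textures as high-dimensional feature maps on a 3D mesh proxy, which encodes richer appearance than a conventional color texture.
- Build Neural Texture Hierarchies (multi-resolution texture levels) to cope with both minification and magnification during sampling.
- Sample the neural texture with differentiable bilinear interpolation to produce a screen-space feature map.
- Apply a Deferred Neural Renderer, a U-Net-style encoder-decoder, that interprets the feature map (plus optional view-related inputs) to synthesize the final image.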
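- Model view-dependent effects with spherical harmonics. The first 9 feature channels are multiplied by the 9 spherical harmonics basis functions of bands 0 to 2, evaluated at the viewing direction, so that these features vary with the view direction.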
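- Train the neural textures and the renderer end-to-end with an L1 photometric loss on crops of the real captured images.
- Precompute the uv maps for the training views. The proxy geometry is rasterized to obtain per-pixel uv coordinates, and the neural texture is sampled at those coordinates to form the renderer's input.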
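The sampling steps above can be illustrated in plain Python. This is a minimal, non-differentiable sketch. The nested-list texture layout and the names `bilinear_sample`, `sample_hierarchy`, and `screen_space_features` are hypothetical. We also assume that every hierarchy level is sampled at the same uv and that the sampled values are summed. A real implementation would use a differentiable sampler from a deep learning framework, such as PyTorch's `torch.nn.functional.grid_sample`, so that gradients reach the texels.

```python
import math

# Hypothetical layout: texture[row][col] is a list of C float features.

def bilinear_sample(texture, u, v):
    """Bilinearly sample a C-channel texture at continuous (u, v) in [0, 1]."""
    h, w = len(texture), len(texture[0])
    # Map uv to texel space so that texel centers land on integer coordinates.
    x, y = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(r, c):
        # Clamp-to-edge addressing at the texture border.
        return texture[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    t00, t01 = texel(y0, x0), texel(y0, x0 + 1)
    t10, t11 = texel(y0 + 1, x0), texel(y0 + 1, x0 + 1)
    return [
        (1 - fy) * ((1 - fx) * a + fx * b) + fy * ((1 - fx) * c + fx * d)
        for a, b, c, d in zip(t00, t01, t10, t11)
    ]


def sample_hierarchy(levels, u, v):
    """Sample every hierarchy level at the same (u, v) and sum the features."""
    out = [0.0] * len(levels[0][0][0])
    for level in levels:
        out = [o + s for o, s in zip(out, bilinear_sample(level, u, v))]
    return out


def screen_space_features(uv_map, levels):
    """Turn a rasterized uv map (None marks background) into a feature image."""
    channels = len(levels[0][0][0])
    return [
        [sample_hierarchy(levels, *uv) if uv is not None else [0.0] * channels
         for uv in row]
        for row in uv_map
    ]


# Usage: a 1x2 one-channel texture with features 0.0 and 1.0.
tex = [[[0.0], [1.0]]]
print(bilinear_sample(tex, 0.5, 0.5))  # [0.5]
```

Summing the levels lets the coarse levels supply smooth, well-filtered features when a texture is minified, while the fine levels add detail under magnification.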
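The view-dependent step can be illustrated by evaluating the nine real spherical harmonics basis functions of bands 0 to 2 at the viewing direction and multiplying them into the first 9 sampled channels. This is a sketch under stated assumptions. The function names, the standard real-SH constants and channel ordering, and the use of a single view vector per pixel are our choices. The summary does not confirm these details.

```python
import math

def sh_basis_l2(d):
    """Nine real spherical harmonics basis values (bands 0-2) for unit vector d.
    Constants and ordering follow the common real-SH convention (an assumption)."""
    x, y, z = d
    return [
        0.282095,                       # l=0
        0.488603 * y,                   # l=1, m=-1
        0.488603 * z,                   # l=1, m=0
        0.488603 * x,                   # l=1, m=1
        1.092548 * x * y,               # l=2, m=-2
        1.092548 * y * z,               # l=2, m=-1
        0.315392 * (3.0 * z * z - 1.0), # l=2, m=0
        1.092548 * x * z,               # l=2, m=1
        0.546274 * (x * x - y * y),     # l=2, m=2
    ]


def apply_sh_layer(features, view_dir):
    """Multiply the first 9 channels of one pixel's sampled features by the SH
    basis at the normalized view direction; remaining channels pass through."""
    norm = math.sqrt(sum(c * c for c in view_dir))
    d = [c / norm for c in view_dir]
    basis = sh_basis_l2(d)
    return [f * b for f, b in zip(features[:9], basis)] + list(features[9:])


# Usage: 16 sampled channels of ones, viewed along +z.
print(apply_sh_layer([1.0] * 16, (0.0, 0.0, 2.0))[:3])  # [0.282095, 0.0, 0.488603]
```

Because the basis is evaluated per pixel, the same texel can produce different inputs to the renderer from different viewpoints. This gives the renderer a direct handle on view-dependent effects such as specular highlights.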
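The training objective can be illustrated as an L1 photometric loss computed over one crop of a rendered image and the matching real image. This is a minimal sketch. The crop size, uniform crop placement, mean (rather than sum) reduction, and the names `random_crop_box` and `l1_photometric_loss` are all assumptions for illustration.

```python
import random

# Hypothetical layout: images are [row][col] -> list of channel values.

def random_crop_box(height, width, crop, rng):
    """Choose the top-left corner of a crop x crop window inside the image."""
    return rng.randint(0, height - crop), rng.randint(0, width - crop)


def l1_photometric_loss(rendered, target, box, crop):
    """Mean absolute per-channel difference between two images over one crop."""
    top, left = box
    total, count = 0.0, 0
    for r in range(top, top + crop):
        for c in range(left, left + crop):
            for p, t in zip(rendered[r][c], target[r][c]):
                total += abs(p - t)
                count += 1
    return total / count


# Usage: a rendered image of zeros against a target of ones on a 4x4 grid.
rng = random.Random(0)
rendered = [[[0.0, 0.0, 0.0] for _ in range(4)] for _ in range(4)]
target = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]
box = random_crop_box(4, 4, 2, rng)
print(l1_photometric_loss(rendered, target, box, 2))  # 1.0
```

In end-to-end training, the gradient of this loss flows back through the renderer and the differentiable texture sampler. This is how the renderer weights and the neural texture texels are optimized jointly.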
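Experimental Results
Research Questions
- RQ1: Can neural textures learned from real data enable photo-realistic re-rendering from imperfect 3D reconstructions?
- RQ2: Does training neural textures end-to-end with the Deferred Neural Renderer yield temporally coherent novel views and support scene editing?
- RQ3: How do neural textures compare with traditional image-based rendering (IBR) methods in quality and efficiency?
- RQ4: How do texture resolution, the texture hierarchy, and proxy-geometry quality affect rendering accuracy?
- RQ5: Can the method handle both static novel view synthesis and dynamic scenarios such as facial reenactment?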
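Key Findings
- Neural textures and the Deferred Neural Renderer achieve photo-realistic re-rendering from imperfect geometry at near real-time rates.
- Hierarchical neural textures improve quality and reach a lower MSE than a single texture at higher resolutions (e.g., an MSE of 0.38 at 2048×2048 with the hierarchy).
- A single neural texture performs best at around 256×256. The hierarchy allows higher resolutions to improve results further.
- Compared with Pix2Pix-based image-to-image translation, the method produces sharper novel views with better temporal consistency.
- Compared with classical image-based rendering baselines, the method does not need to store hundreds of high-resolution images at test time. Instead, it uses a compact neural texture (512×512×16) and a renderer with 16M parameters.
- The method remains robust when the resolution of the proxy geometry is reduced and still produces photo-realistic output.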
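The compactness claim can be made concrete with a back-of-the-envelope calculation. We assume 32-bit floats for both the texture and the renderer weights, and we treat "16M" as exactly 16,000,000. The image set (300 uncompressed Full-HD RGB frames) is a hypothetical stand-in for "hundreds of high-resolution images", not a figure from the paper.

```python
MIB = 1024 ** 2
BYTES_PER_FLOAT32 = 4  # assumption: texture and weights stored as 32-bit floats


def neural_texture_bytes(height, width, channels, bytes_per_value=BYTES_PER_FLOAT32):
    """Memory for an H x W x C neural texture."""
    return height * width * channels * bytes_per_value


def renderer_bytes(num_params, bytes_per_value=BYTES_PER_FLOAT32):
    """Memory for the renderer's weights."""
    return num_params * bytes_per_value


def image_set_bytes(num_images, height, width, channels=3, bytes_per_value=1):
    """Memory for uncompressed 8-bit images, as an IBR method might keep."""
    return num_images * height * width * channels * bytes_per_value


texture = neural_texture_bytes(512, 512, 16)
renderer = renderer_bytes(16_000_000)
images = image_set_bytes(300, 1080, 1920)  # hypothetical capture: 300 Full-HD frames

print(texture / MIB)             # 16.0
print(round(renderer / MIB, 1))  # 61.0
print(round(images / MIB, 1))    # 1779.8
```

Under these assumptions, the neural texture and renderer together take under 80 MiB. The hypothetical raw image set needs roughly 1.7 GiB.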
This summary was generated by AI and reviewed by a human editor.