QUICK REVIEW

[Paper Review] Pix2Vex: Image-to-Geometry Reconstruction using a Smooth Differentiable Renderer

Felix Petersen, Amit H. Bermano | arXiv (Cornell University) | Mar 26, 2019

Advanced Vision and Imaging | References: 39 | Cited by: 41

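One-Sentence Summary

Pix2Vex learns to reconstruct 3D geometry inside a Reconstructive Adversarial Network by pairing a novel smooth differentiable renderer with an image-to-image translator, which enables 3D prediction with minimal supervision and no ground-truth 3D models.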

ABSTRACT

The long-coveted task of reconstructing 3D geometry from images is still a standing problem. In this paper, we build on the power of neural networks and introduce Pix2Vex, a network trained to convert camera-captured images into 3D geometry. We present a novel differentiable renderer ($DR$) as a forward validation means during training. Our key insight is that $DR$s produce images of a particular appearance, different from typical input images. Hence, we propose adding an image-to-image translation component, converting between these rendering styles. This translation closes the training loop, while allowing to use minimal supervision only, without needing any 3D model as ground truth. Unlike state-of-the-art methods, our $DR$ is $C^\infty$ smooth and thus does not display any discontinuities at occlusions or dis-occlusions. Through our novel training scheme, our network can train on different types of images, where previous work can typically only train on images of a similar appearance to those rendered by a $DR$.

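Research Motivation and Goals

Advance the reconstruction of 3D geometry from 2D images without ground-truth 3D models or lighting/texture supervision.
Introduce a C∞-smooth differentiable renderer that provides gradients at occlusions.
Close the training loop with an image-to-image translator that bridges the rendering domains.
Develop a Reconstructive Adversarial Network (RAN) to train the reconstructor without 3D supervision.
Demonstrate single-view and multi-view reconstruction on synthetic and camera-captured data.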

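Proposed Method

Propose a C∞-smooth differentiable renderer (SR), based on soft blending of neighboring triangles, to ensure differentiability at occlusions.
Train a pix2vex reconstructor that predicts 3D vertex offsets from input images, starting from a base mesh and applying per-vertex updates.
Use a chain of image-to-image translators (a2b and b2a) to bridge the SR output domain and the input images, so that training still works when the rendering styles do not match.
Use a Reconstructive Adversarial Network (RAN) with multiple sub-RAN paths to self-supervise the translation components and the reconstructor.
Apply cycle-consistency and L1 losses across domains to stabilize training and avoid mode collapse.
Train with multi-view inputs when available, and propose a strategy for single-view reconstruction.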
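
The soft-blending idea behind a smooth renderer can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's formulation: the function names (`soft_coverage`, `soft_render`), the sigmoid edge falloff, the pairwise depth sigmoid, and the parameters `sharpness` and `depth_temp` are all assumptions made for this example. Each triangle gets a smooth coverage value per pixel, and a triangle is hidden by another only to the degree that the other one covers the pixel and lies closer, so the output has no hard jump at an occlusion boundary.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def soft_coverage(tri, px, sharpness=50.0):
    """Smooth 'inside-ness' in (0, 1) of points px (N, 2) for a
    counter-clockwise 2D triangle tri (3, 2). Hypothetical helper."""
    cov = np.ones(len(px))
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        edge = b - a
        # Unit normal pointing to the inside of a counter-clockwise triangle.
        normal = np.array([-edge[1], edge[0]]) / np.linalg.norm(edge)
        signed_dist = (px - a) @ normal
        cov *= sigmoid(sharpness * signed_dist)
    return cov

def soft_render(tris, depths, colors, px,
                sharpness=50.0, depth_temp=0.1, bg_color=0.0):
    """Blend per-triangle intensities smoothly (assumed blending rule).
    tris: (T, 3, 2), depths: (T,), colors: (T,), px: (N, 2)."""
    tris = np.asarray(tris, dtype=float)
    depths = np.asarray(depths, dtype=float)
    colors = np.asarray(colors, dtype=float)
    px = np.asarray(px, dtype=float)

    cov = np.stack([soft_coverage(t, px, sharpness) for t in tris])  # (T, N)

    # closer[i, j] approaches 1 when triangle j is in front of triangle i.
    closer = sigmoid((depths[:, None] - depths[None, :]) / depth_temp)
    np.fill_diagonal(closer, 0.0)

    # Triangle i stays visible to the extent no closer triangle covers the pixel.
    not_hidden = 1.0 - closer[:, :, None] * cov[None, :, :]  # (T, T, N)
    vis = cov * not_hidden.prod(axis=1)                       # (T, N)
    vis_bg = (1.0 - cov).prod(axis=0)                         # (N,)

    total = vis.sum(axis=0) + vis_bg
    blended = (vis * colors[:, None]).sum(axis=0) + vis_bg * bg_color
    return blended / (total + 1e-12)

# Two overlapping triangles: a near bright one and a far darker one.
tris = [[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
        [(0.2, 0.2), (1.2, 0.2), (0.2, 1.2)]]
pixels = np.array([[0.3, 0.3], [0.6, 0.6], [2.0, 2.0]])
print(soft_render(tris, depths=[1.0, 2.0], colors=[1.0, 0.5], px=pixels))
```

Because every step is a product or ratio of smooth functions, moving a vertex changes the image continuously. In a real training setup the same computation would be written in an autodiff framework so that gradients flow back to the vertex positions.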

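Experimental Results

Research Questions

RQ1: Can a smooth differentiable renderer provide usable gradients for 3D reconstruction at occlusions?
RQ2: Can the RAN framework predict 3D mesh geometry from images without ground-truth 3D supervision?
RQ3: Does domain translation between renderer outputs and input images enable training with minimal supervision?
RQ4: How does the method perform on synthetic data (ShapeNet) and on camera-captured images?
RQ5: How do single-view and multi-view inputs affect reconstruction quality?

Key Findings

The proposed SR renderer is C∞ smooth and yields well-defined gradients at occlusions.
Pix2vex predicts 3D vertex offsets from images, starting from a base mesh, and produces reconstructions without explicit 3D supervision.
A chain of two image translators (a2b and b2a) bridges the SR output domain and the input domain, closing the training loop.
The RAN framework trains pix2vex and the translators without supervision, using adversarial objectives and cycle-consistency losses.
Experiments show that multi-view training on ShapeNet produces plausible reconstructions; single-view reconstruction of camera-captured shoes is also demonstrated, highlighting both robustness and limitations.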
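
The cycle-consistency term that ties the a2b and b2a translators together can be sketched as follows. This is a minimal illustration under stated assumptions: the real translators are learned image-to-image components, while the affine maps below are hypothetical stand-ins, and the function names `l1` and `cycle_consistency_loss` are invented for this example.

```python
import numpy as np

def l1(x, y):
    # Mean absolute difference between two image batches.
    return float(np.mean(np.abs(x - y)))

def cycle_consistency_loss(a2b, b2a, batch_a, batch_b):
    """L1 penalty for failing to return to the start after a round trip
    A -> B -> A and B -> A -> B (assumed form of the loss)."""
    loss_a = l1(b2a(a2b(batch_a)), batch_a)
    loss_b = l1(a2b(b2a(batch_b)), batch_b)
    return loss_a + loss_b

# Hypothetical stand-in translators: simple affine intensity maps.
def a2b(x):
    return 0.5 * x + 0.1

def b2a_exact(y):
    return (y - 0.1) / 0.5       # true inverse of a2b

def b2a_rough(y):
    return 1.8 * y - 0.2         # imperfect inverse

rng = np.random.default_rng(0)
images_a = rng.random((4, 8, 8))             # domain A, e.g. renderer output
images_b = 0.5 * rng.random((4, 8, 8)) + 0.1  # domain B, e.g. input images

print(cycle_consistency_loss(a2b, b2a_exact, images_a, images_b))
print(cycle_consistency_loss(a2b, b2a_rough, images_a, images_b))
```

A translator pair that inverts each other drives this term toward zero, while a mismatched pair is penalized, which is what makes the term useful as a stabilizer alongside the adversarial objectives.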

This review was generated by AI and checked by human editors.