QUICK REVIEW

[Paper Review] MarrNet: 3D Shape Reconstruction via 2.5D Sketches

Jiajun Wu, Yifan Wang | arXiv (Cornell University) | Nov 8, 2017

3D Shape Modeling and Analysis | References: 3 | Cited by: 234

One-Sentence Summary

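MarrNet reconstructs a 3D object shape from a single image. It first estimates 2.5D sketches (depth, surface normals, and silhouette). It then recovers a 3D voxel shape from those sketches, using a differentiable reprojection consistency loss to keep the shape consistent with them.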

ABSTRACT

3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations are scarce in real images. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from domain adaptation when tested on real data. In this work, we propose MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to be recovered from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data. Second, for 3D reconstruction from 2.5D sketches, systems can learn purely from synthetic data. This is because we can easily render realistic 2.5D sketches without modeling object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches; the framework is therefore end-to-end trainable on real images, requiring no human annotations. Our model achieves state-of-the-art performance on 3D shape reconstruction.

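Motivation and Goals

Advance single-image 3D reconstruction despite the large domain gap between synthetic training data and real images.
Propose a two-step, end-to-end trainable pipeline that separates 2.5D sketch estimation from full 3D shape reconstruction.
Use differentiable reprojection constraints to align the 3D shape with the 2.5D sketches, enabling self-supervised fine-tuning on real images.
Demonstrate improved 3D reconstruction on the synthetic ShapeNet dataset and on the real-image datasets PASCAL 3D+ and IKEA.
Show that 2.5D sketches improve transferability during fine-tuning and help preserve the learned shape prior.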

Proposed Method

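MarrNet has three components: a 2.5D sketch estimator (depth, normals, silhouette), a voxel-based 3D shape estimator, and a reprojection consistency loss.
The 2.5D sketch estimator is an encoder-decoder with a ResNet-18 encoder. It outputs depth, normal, and silhouette maps at 256×256 resolution.
The 3D shape estimator is also an encoder-decoder. It maps the 2.5D sketches to a 128×128×128 voxel grid, following design cues from the TL network and 3D-VAE-GAN.
A differentiable reprojection loss enforces consistency, under orthographic projection, between the voxelized 3D shape and the estimated depth and normal maps.
Training has two steps. First, the model is pretrained on synthetic ShapeNet data, with an L2 loss on the 2.5D sketches and a cross-entropy loss on the 3D voxels. Second, it is fine-tuned on real images with the reprojection consistency loss, with the 3D decoder kept fixed to preserve the learned shape prior.
Optionally, self-supervised fine-tuning can also be run on a single image at test time (up to 40 iterations, about 10 seconds).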
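To make the depth term of the reprojection loss concrete, here is a minimal pure-Python sketch under several assumptions.

- The camera is orthographic and looks along the z axis.
- The voxel grid is stored as nested lists `v[x][y][z]` of occupancy probabilities in [0, 1].
- The estimated depth map is already quantized to integer voxel indices, with `None` marking pixels outside the silhouette.

The rule encoded here is our reading of the idea: voxels in front of the observed surface should be empty, the voxel on the surface should be occupied, and voxels behind it are unconstrained. The function name and data layout are hypothetical, and the normal term is omitted.

```python
from typing import List, Optional

Grid = List[List[List[float]]]        # v[x][y][z]: occupancy probabilities in [0, 1]
DepthMap = List[List[Optional[int]]]  # d[x][y]: integer voxel depth, or None for background


def depth_reprojection_loss(v: Grid, d: DepthMap) -> float:
    """Squared-error consistency between a voxel grid and a depth map (orthographic, along z).

    For each foreground pixel (x, y) with observed depth d[x][y]:
      - voxels in front of the surface (z < d) should be empty -> penalize v ** 2
      - the voxel on the surface (z == d) should be occupied   -> penalize (1 - v) ** 2
      - voxels behind the surface (z > d) are unobserved        -> no penalty
    Assumes every depth index is a valid z index into the grid.
    """
    loss = 0.0
    for x, row in enumerate(d):
        for y, depth in enumerate(row):
            if depth is None:  # outside the silhouette: ignored in this sketch
                continue
            column = v[x][y]
            for z in range(depth):  # free space between the camera and the surface
                loss += column[z] ** 2
            loss += (1.0 - column[depth]) ** 2  # the surface voxel itself
    return loss
```

For a single ray `v = [[[0.1, 0.2, 0.9, 0.5]]]` with depth `[[2]]`, the loss is approximately 0.06 (0.01 + 0.04 + 0.01). The last voxel (0.5) lies behind the surface and contributes nothing. Because each term is a smooth function of the voxel probabilities, gradients flow back into the 3D estimator.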

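The structure of the fine-tuning step can be shown with a deliberately tiny toy model. A scalar "encoder" weight `w` is updated by gradient descent while a scalar "decoder" weight `a` is held fixed, and the loop stops after at most 40 iterations. This assumes test-time fine-tuning uses the same frozen-decoder setup described for real-image fine-tuning.

The function name, learning rate, tolerance, and squared-error loss are all hypothetical; the loss is a stand-in for the reprojection consistency loss. The point is only the structure:

- gradients update the encoder;
- the decoder, and with it the learned shape prior, is untouched;
- the iteration budget is capped.

```python
from typing import Tuple


def finetune_encoder(w: float, a: float, x: float, y: float,
                     lr: float = 0.05, max_iters: int = 40,
                     tol: float = 1e-8) -> Tuple[float, int]:
    """Toy fine-tuning: gradient descent on the encoder weight w only.

    Model: prediction = a * (w * x), where a is the frozen decoder weight.
    Loss: (prediction - y) ** 2, a stand-in for the self-supervised reprojection loss.
    Returns the updated encoder weight and the number of iterations actually run.
    """
    for step in range(max_iters):
        residual = a * w * x - y
        if residual ** 2 < tol:  # early stop once the loss is negligible
            return w, step
        grad_w = 2.0 * residual * a * x  # d(loss)/dw; a receives no update
        w -= lr * grad_w
    return w, max_iters  # iteration budget exhausted
```

With `finetune_encoder(0.0, 2.0, 1.0, 4.0)`, `w` converges toward 2.0 well within the 40-iteration budget. With a very small learning rate, the loop instead stops at the cap. Freezing `a` means the mapping from latent code to shape stays fixed, so only the input side adapts to the new image.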
Experimental Results

Research Questions

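RQ1: Does a two-step approach via 2.5D sketches improve single-image 3D reconstruction over direct RGB-to-voxel methods?
RQ2: Do learned 2.5D sketches transfer from synthetic to real data more easily than full 3D supervision?
RQ3: Can a differentiable 2D-3D reprojection constraint enable end-to-end fine-tuning on unannotated real images?
RQ4: To what extent does fixing the 3D decoder during fine-tuning preserve the learned shape prior and improve realism?
RQ5: How does MarrNet perform, qualitatively and quantitatively, on synthetic ShapeNet and on real datasets such as PASCAL 3D+ and IKEA?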

Key Findings

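On ShapeNet chairs, MarrNet achieves a higher IoU than a direct RGB-to-3D baseline (0.57 vs. 0.52).
On PASCAL 3D+ chairs, MarrNet outperforms the state-of-the-art DRC in a user study: 74% of users preferred MarrNet over DRC, and the figure was 42% in the comparison against the ground truth.
Fine-tuning on real data with the decoder fixed preserves the shape prior and yields more detailed 3D reconstructions than unconstrained fine-tuning.
MarrNet reconstructs 3D shapes from real images (PASCAL 3D+, IKEA) better than the baselines, with consistent qualitative improvements across multiple object categories.
Human evaluations also favor MarrNet over some of the baseline configurations, in addition to its 74% preference rate against DRC.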
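Voxel IoU, the metric behind the ShapeNet comparison above, can be sketched as follows. The 0.5 binarization threshold and the flattened-list layout are assumptions for illustration. So is returning 1.0 when both grids are empty. The source does not state the exact evaluation protocol.

```python
from typing import Sequence


def voxel_iou(pred: Sequence[float], gt: Sequence[int], threshold: float = 0.5) -> float:
    """Intersection over union of two flattened voxel grids.

    Predicted occupancy probabilities are binarized with `p > threshold`;
    `gt` holds 0/1 ground-truth labels of the same length.
    """
    pred_occ = [p > threshold for p in pred]
    inter = sum(1 for p, g in zip(pred_occ, gt) if p and g)  # occupied in both
    union = sum(1 for p, g in zip(pred_occ, gt) if p or g)   # occupied in either
    return inter / union if union else 1.0  # convention for two empty grids
```

For example, `voxel_iou([0.9, 0.6, 0.2, 0.7], [1, 0, 0, 1])` binarizes the prediction to three occupied voxels. Two of them overlap the ground truth, so the IoU is 2/3 (about 0.667). Reported IoU values such as 0.57 are typically averages of this per-shape score over a test set.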

This review was generated by AI and reviewed by human editors.