QUICK REVIEW

[Paper Review] Learning to Reconstruct Shapes from Unseen Classes

Xiuming Zhang, Zhoutong Zhang|arXiv (Cornell University)|Dec 28, 2018

3D Shape Modeling and Analysis | References: 3 | Cited by: 80

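One-Sentence Summary

GenRe introduces a modular, geometry-aware pipeline that decouples 2.5D depth estimation, spherical-map completion, and voxel refinement to reconstruct a 3D shape from a single image, and it generalizes to unseen object classes.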

ABSTRACT

From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life. Contemporary single-image 3D reconstruction algorithms aim to solve this task in a similar fashion, but often end up with priors that are highly biased by training classes. Here we present an algorithm, Generalizable Reconstruction (GenRe), designed to capture more generic, class-agnostic shape priors. We achieve this with an inference network and training procedure that combine 2.5D representations of visible surfaces (depth and silhouette), spherical shape representations of both visible and non-visible surfaces, and 3D voxel-based representations, in a principled manner that exploits the causal structure of how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe performs well on single-view shape reconstruction, and generalizes to diverse novel objects from categories not seen during training.

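Motivation and Goals

Advance generalizable single-image 3D reconstruction beyond the training classes.
Decouple geometric projection from shape reconstruction to improve generalization.
Combine 2.5D representations, spherical maps, and voxel space for accurate reconstruction.
Demonstrate state-of-the-art performance on seen and unseen classes, and analyze the contribution of each component.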

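Proposed Method

Three learnable modules are chained by fixed geometric projections: a depth estimator (2D->2.5D), a spherical-map inpainting network (S->S), and a voxel refinement network, connected by a depth-to-spherical-map projection (2.5D->S) and a spherical-map-to-voxel projection (S->3D).
The depth estimator predicts depth from a single RGB image, giving a viewer-centered 2.5D sketch that is then projected onto a partial spherical map.
The inpainting network completes the partial spherical map so that it can be projected into a full 3D voxel representation.
The voxel refinement network fuses the voxel estimates obtained from the depth projection and from the spherical-map projection to produce the final 3D shape.
All projections are fixed geometric operations; the learnable components model only surface geometry, which improves generalization.
Training is viewer-centered: the 3D supervision is aligned with the pose of the input image, which generalizes better to unseen classes.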
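The fixed projections described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes an orthographic camera looking down the z-axis, an object that fits in the cube [-0.5, 0.5]^3, and a unit enclosing sphere, and the function names and resolutions are hypothetical. The spherical map stores, for each direction bin, the distance from the sphere to the first surface point hit by a ray cast toward the origin.

```python
import numpy as np

def depth_to_points(depth, mask):
    """Back-project an orthographic depth map to 3D points (assumed camera model)."""
    h, w = depth.shape
    ys, xs = np.nonzero(mask)
    x = (xs + 0.5) / w - 0.5          # pixel centers mapped to [-0.5, 0.5]
    y = 0.5 - (ys + 0.5) / h          # image rows grow downward, y grows upward
    z = depth[ys, xs]
    return np.stack([x, y, z], axis=1)

def points_to_spherical_map(points, n_theta=32, n_phi=64, radius=1.0):
    """Bin points by direction; keep the outermost point per bin (first ray hit)."""
    r = np.linalg.norm(points, axis=1)
    keep = r > 1e-8
    p, r = points[keep], r[keep]
    theta = np.arccos(np.clip(p[:, 2] / r, -1.0, 1.0))   # polar angle in [0, pi]
    phi = np.arctan2(p[:, 1], p[:, 0]) + np.pi           # azimuth in [0, 2*pi]
    ti = np.minimum((theta / np.pi * n_theta).astype(int), n_theta - 1)
    pj = np.minimum((phi / (2 * np.pi) * n_phi).astype(int), n_phi - 1)
    r_max = np.zeros((n_theta, n_phi))
    np.maximum.at(r_max, (ti, pj), r)                    # unbuffered per-bin maximum
    hit = r_max > 0
    sph = np.where(hit, radius - r_max, np.nan)          # NaN marks unobserved bins
    return sph, hit

def spherical_map_to_voxels(sph, hit, res=32, radius=1.0):
    """Project observed spherical-map bins back to a boolean surface voxel grid."""
    n_theta, n_phi = sph.shape
    ti, pj = np.nonzero(hit)
    theta = (ti + 0.5) / n_theta * np.pi
    phi = (pj + 0.5) / n_phi * 2 * np.pi - np.pi
    r = radius - sph[ti, pj]
    pts = np.stack([r * np.sin(theta) * np.cos(phi),
                    r * np.sin(theta) * np.sin(phi),
                    r * np.cos(theta)], axis=1)
    idx = np.floor((pts + 0.5) * res).astype(int)
    inside = np.all((idx >= 0) & (idx < res), axis=1)
    vox = np.zeros((res, res, res), dtype=bool)
    vox[tuple(idx[inside].T)] = True
    return vox

# Toy input: a hemisphere of radius 0.4 facing the camera (+z).
H = W = 64
rows, cols = np.mgrid[0:H, 0:W]
gx = (cols + 0.5) / W - 0.5
gy = 0.5 - (rows + 0.5) / H
mask = gx**2 + gy**2 < 0.4**2
depth = np.sqrt(np.clip(0.4**2 - gx**2 - gy**2, 0.0, None))

points = depth_to_points(depth, mask)
partial_sph, visible = points_to_spherical_map(points)
voxels = spherical_map_to_voxels(partial_sph, visible)
```

In this toy case only the camera-facing half of the spherical map is observed. In the pipeline described above, the inpainting network would fill the unobserved bins before the map is projected to voxels, and the refinement network would then operate on the resulting grid.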

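Experimental Results

Research Questions

RQ1: Does decoupling geometric projection from learning improve the generalization of single-image 3D reconstruction to unseen object classes?
RQ2: Do 2.5D sketches and spherical-map representations generalize better than completing the 3D shape directly in voxel space?
RQ3: How much does each module contribute to reconstruction accuracy on seen and unseen classes?
RQ4: How robust is the method when transferring from synthetic ShapeNet data to real images (the Pix3D dataset)?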

Key Findings

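Reconstruction error in Chamfer distance (lower is better). Seen is the error on the training classes; Bch through Dsp are the ten unseen classes, and Avg is their mean.

Model	Seen	Bch	Vsl	Rfl	Sfa	Tbl	Phn	Cbn	Spk	Lmp	Dsp	Avg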
DRC (Tulsiani2017)	.072	.112	.100	.104	.108	.133	.199	.168	.164	.145	.188	.142
AtlasNet (Groueix2018)	.059	.102	.100	.104	.098	.130	.146	.149	.158	.131	.173	.127
DRC (Tulsiani2017) - Object-Centered	.092	.120	.109	.121	.107	.129	.132	.142	.141	.131	.156	.129
MarrNet (Wu2017)	.070	.107	.094	.125	.090	.122	.117	.125	.123	.144	.149	.120
Multi-View (Shin2018)	.065	.092	.092	.102	.085	.105	.110	.119	.117	.142	.142	.111
3D Completion	.076	.102	.099	.121	.095	.109	.122	.131	.126	.138	.141	.118
GenRe-1step	.063	.104	.093	.114	.084	.108	.121	.128	.124	.126	.151	.115
GenRe-2step	.061	.098	.094	.117	.084	.102	.115	.125	.125	.118	.118	.110
GenRe (Ours)	.064	.089	.092	.112	.082	.096	.107	.116	.115	.124	.130	.106
GenRe-Oracle	.045	.050	.048	.031	.059	.057	.054	.076	.077	.060	.060	.057
GenRe-SphOracle	.034	.032	.030	.021	.044	.038	.037	.044	.045	.031	.040	.036
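Assuming the errors above are Chamfer distances (the metric used in the GenRe paper), the following sketch shows one common way to compute a symmetric Chamfer distance between two point sets. The normalization here (mean of nearest-neighbor distances, averaged over both directions) is an assumption; this review does not state the paper's exact convention, so absolute values may not be comparable with the table.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses unsquared Euclidean distances and averages the two directions;
    other conventions (squared distances, sums) are also in use.
    """
    # Pairwise distances via broadcasting: result has shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = d.min(axis=1).mean()   # each point in a to its nearest point in b
    b_to_a = d.min(axis=0).mean()   # each point in b to its nearest point in a
    return 0.5 * (a_to_b + b_to_a)

# Toy usage: a random point cloud compared with a slightly shifted copy.
rng = np.random.default_rng(0)
pred = rng.uniform(-0.5, 0.5, size=(200, 3))
target = pred + np.array([0.01, 0.0, 0.0])
error = chamfer_distance(pred, target)
```

The dense (N, M) distance matrix is fine for a few thousand points; larger clouds would usually call for a spatial index such as a k-d tree.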

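In the ShapeNet-based experiments, GenRe achieves state-of-the-art reconstruction on unseen classes and remains competitive on seen classes, where AtlasNet (.059) scores slightly better than GenRe (.064).
The two-step, factorized variant (depth -> spherical-map inpainting -> voxel projection) outperforms the one-step spherical-map baseline.
On real images (Pix3D), GenRe generally outperforms the baselines on unseen classes, with occasional exceptions such as the bed class.
Depth estimation learned from three training classes generalizes to novel classes without significant degradation.
Spherical-map inpainting effectively completes non-visible surfaces and generalizes well to novel shapes.
In many cases, viewer-centered supervision helps generalization to unseen classes more than object-centered supervision does.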
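The difference between object-centered and viewer-centered supervision can be made concrete with a short sketch: the same canonical shape is rotated into the camera frame of the input image before it is used as a training target. The axis conventions (azimuth about the y-axis, elevation about the x-axis) and the function names are assumptions chosen for illustration, not the conventions of the paper or of ShapeNet renderings.

```python
import numpy as np

def rotation_from_azimuth_elevation(azimuth_deg, elevation_deg):
    """Rotation from object (canonical) coordinates to camera coordinates.

    Assumed convention: rotate about y by the azimuth, then about x by the
    elevation.
    """
    az, el = np.deg2rad([azimuth_deg, elevation_deg])
    rot_y = np.array([[np.cos(az), 0.0, np.sin(az)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(az), 0.0, np.cos(az)]])
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(el), -np.sin(el)],
                      [0.0, np.sin(el), np.cos(el)]])
    return rot_x @ rot_y

def to_viewer_centered(points_canonical, azimuth_deg, elevation_deg):
    """Express canonical-pose surface points (N, 3) in the camera frame."""
    rotation = rotation_from_azimuth_elevation(azimuth_deg, elevation_deg)
    return points_canonical @ rotation.T

# Object-centered target: the same canonical points for every rendering.
canonical = np.array([[0.1, 0.2, 0.3], [-0.2, 0.0, 0.4]])
# Viewer-centered target: depends on the pose of the input image.
viewer_target = to_viewer_centered(canonical, azimuth_deg=30.0, elevation_deg=20.0)
```

With viewer-centered targets the network never has to infer a class-specific canonical pose, which is one plausible reading of why this form of supervision transfers better to unseen classes.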

This review was generated by AI and checked by human editors.