QUICK REVIEW

[Paper Digest] A Point Set Generation Network for 3D Object Reconstruction from a Single Image

Haoqiang Fan, Hao Su|arXiv (Cornell University)|Dec 2, 2016

3D Shape Modeling and Analysis | References: 19 | Cited by: 55

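One-Sentence Summary

This paper proposes a point set generation network (PSGN) for reconstructing a 3D object from a single image, using a conditional generative model that can predict multiple plausible 3D point clouds. It trains with a differentiable set distance, the Earth Mover's Distance (EMD), to cope with ambiguous ground truth and irregular, unordered point-cloud output. It reports outperforming state-of-the-art methods on single-image 3D reconstruction benchmarks and shows promise at producing multiple plausible predictions.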

ABSTRACT

Generation of 3D data by deep neural network has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collection of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straight-forward form of output -- point cloud coordinates. Along with this problem arises a unique and interesting issue, that the groundtruth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in groundtruth, we design architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments not only can our system outperform state-of-the-art methods on single image based 3d reconstruction benchmarks; but it also shows a strong performance for 3d shape completion and promising ability in making multiple plausible predictions.

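Motivation and Objectives

Address the ill-posed nature of 3D reconstruction from a single image: one 2D image can correspond to multiple plausible 3D shapes.
Avoid the drawbacks of regular 3D representations such as volumetric grids or collections of images, which obscure the natural invariance of shapes under geometric transformations and introduce quantization artifacts.
Design a deep generative model that directly outputs an unordered point cloud, a more natural and flexible 3D representation.
Handle the inherent ambiguity of the ground truth by modeling the output as a conditional distribution over plausible 3D shapes.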

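Proposed Method

A conditional shape sampler generates multiple plausible 3D point clouds from a single input image.
A deep encoder-decoder architecture uses two prediction branches: a deconvolution branch that outputs 768 points and a fully connected branch that outputs 256 points.
An approximation of the Earth Mover's Distance (EMD) serves as a differentiable loss that measures the distance between the predicted and ground-truth point sets.
A post-processing 3D convolutional neural network converts the point cloud into a voxel occupancy grid, thereby improving reconstruction quality.
The network is trained end to end with the Adam optimizer at an image resolution of 192×256, on a schedule of 300,000 training steps.
A separate volume accumulation network is introduced to improve generalization across objects of different sizes.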
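
To make the set distances concrete, here is a minimal sketch of the Chamfer distance (CD) and an exact EMD for tiny, equal-size point sets. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, CD is assumed to use squared Euclidean distance, and the brute-force search over permutations stands in for the EMD approximation that the method relies on.

```python
import itertools
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: every point is matched to its nearest
    neighbour in the other set (squared Euclidean distance assumed)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src)
    return one_way(a, b) + one_way(b, a)


def emd_bruteforce(a, b):
    """Exact EMD between equal-size sets: the cost of the cheapest
    one-to-one matching. Tries every permutation, so tiny sets only."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size point sets")
    return min(
        sum(math.dist(p, q) for p, q in zip(a, perm))
        for perm in itertools.permutations(b)
    )


# Illustrative data: three points on a line.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shuffled = [target[2], target[0], target[1]]  # same set, different order
clumped = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

print(chamfer_distance(target, shuffled), emd_bruteforce(target, shuffled))
print(chamfer_distance(target, clumped), emd_bruteforce(target, clumped))
```

Reordering the points leaves both distances at zero, which is the permutation invariance an unordered point-set output requires. The permutation search grows factorially, so for point sets of realistic size one would switch to an assignment solver (for example SciPy's `linear_sum_assignment`) or an approximation scheme.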

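Experimental Results

Research Questions

RQ1: Can a deep learning model generate diverse, high-quality 3D point clouds when multiple valid 3D shapes exist for the same input?
RQ2: How can a differentiable loss be designed to evaluate generated point clouds fairly, given the permutation invariance and irregular structure of point sets?
RQ3: How much does EMD-based training improve generalization and diversity compared with a standard L2 or Chamfer distance loss?
RQ4: Can a single network learn to generate multiple plausible completions of incomplete or ambiguous 3D shapes for the same input image?
RQ5: How does the model perform on shape completion and reconstruction when objects are occluded or geometric cues are missing?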

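Key Findings

Models trained with the EMD loss score better on both the EMD and Chamfer distance (CD) metrics, and surpass state-of-the-art methods on the benchmark datasets.
Compared with networks trained with CD as the objective, EMD-trained networks produce more uniformly distributed point clouds and reach lower EMD values.
On ambiguous inputs (such as partially occluded chairs or non-polygonal objects), the EMD-trained model generalizes better than human subjects, who struggle when cues are missing or ambiguous.
The model generates multiple plausible 3D reconstructions for each input image, demonstrating its capability as a conditional shape sampler.
3D CNN post-processing and the volume accumulation network, trained end to end, significantly improve reconstruction quality, and the full pipeline outperforms 3D-R2N2.
Failure cases reveal limited generalization to unseen object categories and multi-object scenes, especially when no attention or detection mechanism is used.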
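
The digest describes a learned 3D CNN for turning a point cloud into a voxel occupancy grid, the representation that voxel-based methods such as 3D-R2N2 produce. As a point of reference only, the simplest non-learned conversion is direct binning, sketched below. This is not the paper's network: the function name, the grid resolution, and the bounding cube `[-0.5, 0.5]^3` are all assumptions made for illustration.

```python
def voxelize(points, resolution=32, lo=-0.5, hi=0.5):
    """Mark every cell of a resolution^3 grid that contains at least one point.

    Returns the set of occupied (i, j, k) indices. Points outside the
    assumed bounding cube [lo, hi]^3 are ignored.
    """
    cell = (hi - lo) / resolution
    occupied = set()
    for x, y, z in points:
        if not all(lo <= c <= hi for c in (x, y, z)):
            continue
        # Clamp so a coordinate exactly equal to `hi` lands in the last cell.
        idx = tuple(min(int((c - lo) / cell), resolution - 1) for c in (x, y, z))
        occupied.add(idx)
    return occupied


# Illustrative cloud: two nearby points share a cell, one point is out of bounds.
cloud = [
    (-0.5, -0.5, -0.5),
    (0.0, 0.0, 0.0),
    (0.01, 0.0, 0.0),
    (0.5, 0.5, 0.5),
    (2.0, 0.0, 0.0),
]
grid = voxelize(cloud, resolution=4)
print(sorted(grid))
```

Plain binning leaves holes wherever the point cloud is sparse, which is one plausible reason to prefer a learned conversion that can fill in the surface.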

This digest was generated by AI and reviewed by a human editor.