QUICK REVIEW

[Paper Review] Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction

Tong He, John Collomosse|arXiv (Cornell University)|Jun 15, 2020

3D Shape Modeling and Analysis | References: 36 | Cited by: 100

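ONE-SENTENCE SUMMARY

Geo-PIFu learns geometry-aligned latent voxel features together with pixel-aligned image features to reconstruct a clothed human mesh from a single image; regularized by a latent 3D voxel proxy, it reaches state-of-the-art global topology and local surface detail.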

ABSTRACT

We propose Geo-PIFu, a method to recover a 3D mesh from a monocular color image of a clothed person. Our method is based on a deep implicit function-based representation to learn latent voxel features using a structure-aware 3D U-Net, to constrain the model in two ways: first, to resolve feature ambiguities in query point encoding, second, to serve as a coarse human shape proxy to regularize the high-resolution mesh and encourage global shape regularity. We show that, by both encoding query points and constraining global shape using latent voxel features, the reconstruction we obtain for clothed human meshes exhibits less shape distortion and improved surface details compared to competing methods. We evaluate Geo-PIFu on a recent human mesh public dataset that is $10\times$ larger than the private commercial dataset used in PIFu and previous derivative work. On average, we exceed the state of the art by $42.7\%$ reduction in Chamfer and Point-to-Surface Distances, and $19.4\%$ reduction in normal estimation errors.

MOTIVATION AND OBJECTIVES

Improve single-view clothed human reconstruction by addressing feature ambiguity in query point encoding and the lack of global shape regularity.
Introduce geometry-aligned latent voxel features trained with a coarse occupancy proxy to regularize high-resolution mesh output.
Fuse 3D geometry features and 2D pixel features to improve both global topology and local surface details.

PROPOSED METHOD

Encode query points with fused geometry-aligned 3D features and pixel-aligned 2D features.
Lift input image to a low-resolution latent 3D voxel grid using 3D U-Nets to produce geometry-aligned features.
Extract pixel-aligned features from a 2D U-Net and interpolate them at the projected pixel coordinates of query points.
Use trilinear interpolation over latent voxel features to obtain geometry-aligned encodings for each query point.
Train with a coarse occupancy volume loss to supervise latent voxel features and a high-frequency query-point loss for occupancy values.
Optimize an implicit surface function f(I, P) that maps image I and 3D point P to occupancy σ, using L_geo and L_query losses with a staged training schedule.
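
The query-point encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The function names (`trilinear_sample`, `bilinear_sample`, `encode_query_point`), the normalized coordinate range of [-1, 1], the orthographic projection of a point onto the image plane, and fusion by simple concatenation are all assumptions made for the example. The toy feature grids stand in for the outputs of the 3D and 2D U-Nets.

```python
def _cell(coord, size):
    """Clamp coord to [0, size - 1]; return (lower index, upper index, fraction)."""
    c = min(max(coord, 0.0), size - 1.0)
    lo = min(int(c), max(size - 2, 0))
    hi = min(lo + 1, size - 1)
    return lo, hi, c - lo


def trilinear_sample(volume, x, y, z):
    """Interpolate a feature vector from volume[z][y][x] (a list of channels)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    x0, x1, tx = _cell(x, width)
    y0, y1, ty = _cell(y, height)
    z0, z1, tz = _cell(z, depth)
    out = [0.0] * len(volume[0][0][0])
    # Accumulate the eight corner features, each weighted by its trilinear weight.
    for zi, wz in ((z0, 1.0 - tz), (z1, tz)):
        for yi, wy in ((y0, 1.0 - ty), (y1, ty)):
            for xi, wx in ((x0, 1.0 - tx), (x1, tx)):
                w = wz * wy * wx
                for c, value in enumerate(volume[zi][yi][xi]):
                    out[c] += w * value
    return out


def bilinear_sample(image, u, v):
    """Interpolate a feature vector from image[row][col]; u is column, v is row."""
    height, width = len(image), len(image[0])
    u0, u1, tu = _cell(u, width)
    v0, v1, tv = _cell(v, height)
    out = []
    for c in range(len(image[0][0])):
        top = image[v0][u0][c] * (1.0 - tu) + image[v0][u1][c] * tu
        bottom = image[v1][u0][c] * (1.0 - tu) + image[v1][u1][c] * tu
        out.append(top * (1.0 - tv) + bottom * tv)
    return out


def to_index(p, size):
    """Map a normalized coordinate in [-1, 1] to a continuous grid index."""
    return (p + 1.0) * 0.5 * (size - 1)


def encode_query_point(volume, image, point):
    """Concatenate geometry-aligned (3D) and pixel-aligned (2D) features."""
    x, y, z = point
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    geo = trilinear_sample(
        volume, to_index(x, width), to_index(y, height), to_index(z, depth)
    )
    # Assumed orthographic projection: the pixel location depends only on (x, y).
    pix = bilinear_sample(
        image, to_index(x, len(image[0])), to_index(y, len(image))
    )
    return geo + pix


# Toy inputs: a 2x2x2 latent volume with 1 channel and a 2x2 feature map with 2 channels.
volume = [[[[float(x + 2 * y + 4 * z)] for x in range(2)] for y in range(2)] for z in range(2)]
image = [[[float(col), float(row)] for col in range(2)] for row in range(2)]
encoding = encode_query_point(volume, image, (0.0, 0.0, 0.0))
print(encoding)  # [3.5, 0.5, 0.5]
```

In a full model, the fused vector would be passed to the implicit function that predicts the occupancy of the query point. That decoder is omitted here because the summary does not describe its architecture.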

EXPERIMENTAL RESULTS

Research questions

RQ1: Can geometry-aligned latent voxel features coupled with pixel-aligned features resolve local feature ambiguity in single-view clothed human reconstruction?
RQ2: Does a global shape proxy via latent voxel features improve the plausibility and topology of the reconstructed mesh?
RQ3: How do different feature fusion strategies affect global topology versus local surface details in Geo-PIFu?

Key findings

Method	Mesh CD	Mesh PSD	Normal Cosine	Normal L2
DeepHuman	11.928	11.246	0.2088	0.4647
PIFu	2.604	4.026	0.0914	0.3009
Geo-PIFu (ours)	1.742	1.922	0.0682	0.2603

Averaged over Chamfer Distance (CD) and Point-to-Surface Distance (PSD), Geo-PIFu achieves a 42.7% error reduction over the prior state of the art on the DeepHuman dataset.
Geo-PIFu reduces normal estimation errors by 19.4% on average compared with PIFu.
Using both geometry-aligned 3D features and pixel-aligned 2D features yields better global topology and local surface details than using either alone.
A coarse latent voxel proxy regularizes global shape, reducing artifacts such as distorted hands and feet.
Ablation studies show that early fusion of 3D and 2D features provides competitive performance with simple implementation.
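
The two headline percentages can be re-derived from the PIFu and Geo-PIFu rows of the table. The sketch below assumes that each figure is the mean of the per-metric relative reductions against PIFu. The summary does not state this explicitly, but the table values are consistent with it. The helper names are illustrative.

```python
# Error values copied from the PIFu and Geo-PIFu rows of the results table.
pifu = {"CD": 2.604, "PSD": 4.026, "Cosine": 0.0914, "L2": 0.3009}
geo_pifu = {"CD": 1.742, "PSD": 1.922, "Cosine": 0.0682, "L2": 0.2603}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline


def mean_reduction(metrics):
    """Average the per-metric relative reductions (assumed averaging scheme)."""
    gains = [relative_reduction(pifu[m], geo_pifu[m]) for m in metrics]
    return sum(gains) / len(gains)


mesh_gain = mean_reduction(["CD", "PSD"])        # mesh distance metrics
normal_gain = mean_reduction(["Cosine", "L2"])   # normal error metrics
print(f"mesh: {mesh_gain:.1f}%, normal: {normal_gain:.1f}%")  # mesh: 42.7%, normal: 19.4%
```

Individually, the reductions are about 33.1% for CD and 52.3% for PSD, so the 42.7% figure describes their average rather than either metric alone.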

This review was generated by AI and reviewed by human editors.