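[Paper Digest] Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration
This paper proposes locally aware piecewise transformation fields (PTF), a learning-based method that improves 3D human mesh registration for point clouds of clothed humans by using transformation fields aligned with local features to estimate an accurate pose initialization. By predicting point correspondences from local features and recovering joint rotations with a least-squares fit, PTF outperforms prior methods in both parameter efficiency and reconstruction accuracy, yielding more accurate reconstruction and registration of clothed-human point clouds.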
Registering point clouds of dressed humans to parametric human models is a challenging task in computer vision. Traditional approaches often rely on heavily engineered pipelines that require accurate manual initialization of human poses and tedious post-processing. More recently, learning-based methods have been proposed in the hope of automating this process. We observe that pose initialization is key to accurate registration, but existing methods often fail to provide it. One major obstacle is that regressing joint rotations from point clouds or images of humans is still very challenging. To this end, we propose novel piecewise transformation fields (PTF), a set of functions that learn 3D translation vectors to map any query point in posed space to its corresponding position in rest-pose space. We combine PTF with multi-class occupancy networks, obtaining a novel learning-based framework that learns to simultaneously predict shape and per-point correspondences between the posed space and the canonical space for clothed humans. Our key insight is that the translation vector for each query point can be effectively estimated using the point-aligned local features; consequently, rigid per-bone transformations and joint rotations can be obtained efficiently via a least-squares fit given the estimated point correspondences, circumventing the challenging task of directly regressing joint rotations from neural networks. Furthermore, the proposed PTF facilitate canonicalized occupancy estimation, which greatly improves generalization capability and results in more accurate surface reconstruction with only half the parameters of the state of the art. Both qualitative and quantitative studies show that fitting parametric models with poses initialized by our network results in much better registration quality, especially for extreme poses.
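Motivation and Objectives
- Address inaccurate pose initialization in deep-learning-based 3D human mesh registration for clothed humans.
- Overcome the difficulty of regressing joint rotations from point clouds directly with a neural network.
- Improve generalization and reconstruction quality when learning implicit surfaces for parametric human models.
- Reduce the parameter count relative to the current state of the art while maintaining or improving registration accuracy.
- Achieve robust registration for extreme poses through canonicalized occupancy estimation.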
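Proposed Method
- Introduces piecewise transformation fields (PTF), a set of functions that use local point cloud features to map a query point in posed space to its corresponding position in rest-pose space.
- Uses a multi-class occupancy network to jointly predict (1) double-layer occupancy (inside the body, between body and clothing, outside), (2) body-part labels, and (3) each point's correspondence in the rest pose.
- Applies a least-squares fit to the predicted point correspondences to recover rigid bone transformations and joint rotations efficiently, avoiding direct regression of rotation parameters.
- Adds a canonicalization step that transforms query points into rest-pose space before occupancy inference, which simplifies the learning task and improves generalization.
- Adopts a tri-plane convolutional feature encoder (ConvONet) for efficient and accurate feature extraction, replacing the memory-intensive voxel-based IFNet used in prior work.
- Applies random-rotation data augmentation during training to improve robustness and generalization across diverse input poses.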
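The least-squares step above can be illustrated with the classic Kabsch (orthogonal Procrustes) solution. The following is a minimal sketch and not the authors' implementation: the function names `fit_rigid_transform` and `fit_per_part` are hypothetical, the convention that a predicted translation is added to a posed point to obtain its rest-pose position is an assumption, and so is the rule of skipping parts with fewer than three points.

```python
import numpy as np


def fit_rigid_transform(rest_pts, posed_pts):
    """Least-squares rigid fit (Kabsch): R, t minimizing sum ||R @ p + t - q||^2."""
    rest_pts = np.asarray(rest_pts, dtype=float)
    posed_pts = np.asarray(posed_pts, dtype=float)
    mu_rest = rest_pts.mean(axis=0)
    mu_posed = posed_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (rest_pts - mu_rest).T @ (posed_pts - mu_posed)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_posed - R @ mu_rest
    return R, t


def fit_per_part(posed_pts, pred_translations, part_labels, num_parts):
    """Fit one rigid transform per body part from predicted correspondences.

    Assumption: rest-pose position = posed point + predicted translation.
    """
    posed_pts = np.asarray(posed_pts, dtype=float)
    rest_pts = posed_pts + np.asarray(pred_translations, dtype=float)
    part_labels = np.asarray(part_labels)
    transforms = {}
    for part in range(num_parts):
        mask = part_labels == part
        if mask.sum() < 3:  # too few points for a stable fit (assumed threshold)
            continue
        transforms[part] = fit_rigid_transform(rest_pts[mask], posed_pts[mask])
    return transforms
```

Because the fit has a closed-form solution via a single 3x3 SVD per part, the network only has to predict per-point translations and part labels; the rotations follow from the correspondences rather than being regressed directly.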
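Experimental Results
Research Questions
- RQ1: Can local point cloud features be used effectively to estimate accurate point correspondences between posed space and rest-pose space in support of human mesh registration?
- RQ2: Does avoiding direct regression of 6D rotations improve pose estimation accuracy compared with end-to-end regression baselines?
- RQ3: Can piecewise transformation fields reduce the parameter count while maintaining or improving reconstruction and registration quality?
- RQ4: How does the canonicalized occupancy estimation enabled by PTF affect generalization and surface reconstruction fidelity?
- RQ5: Does the proposed method generalize to raw, unprocessed scans and remain effective on extreme poses where baseline methods fail?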
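Key Findings
- The method reduces the vertex error of pose estimation from the baseline's 74.4 mm to 34.1 mm, a substantial gain in pose accuracy.
- PTF-Piecewise achieves 89.4% mIoU and an outer Chamfer distance of 0.0148, outperforming IPNet (88.6% mIoU, 0.0151 CD) with 46% fewer parameters.
- PTF-FC uses only 64% of IPNet's parameters yet improves mIoU by 2.6% and lowers the outer Chamfer distance by 2.6%.
- The method generalizes well to raw scans from the BUFF dataset, producing high-quality registered SMPL meshes without fine-tuning.
- Ablations show that replacing PTF with a 4-layer MLP (TF-FC) causes a significant performance drop, confirming that the PTF module is necessary.
- Data augmentation with random input rotations improves performance, confirming its role in robustness and generalization.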
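For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how a symmetric Chamfer distance and a mean IoU are commonly computed. This digest does not state the paper's exact definitions (for example squared versus unsquared distances, or how point samples are drawn), so the conventions below are assumptions and the function names are hypothetical.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions.

    Assumed convention: unsquared Euclidean distances, averaged over the two directions.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b))
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        union = np.logical_or(pred == c, target == c).sum()
        if union == 0:
            continue  # class absent from both labelings; skip it
        inter = np.logical_and(pred == c, target == c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

The brute-force pairwise distance matrix is fine for a few thousand points; larger point clouds would typically use a spatial index such as a k-d tree instead.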
This digest was generated by AI and reviewed by human editors.