QUICK REVIEW

[Paper Review] MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

Shaofei Wang, Marko Mihajlović|arXiv (Cornell University)|Jun 22, 2021

3D Shape Modeling and Analysis|References: 71|Cited by: 48

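One-Sentence Summary

MetaAvatar meta-learns a hypernetwork that predicts dynamic neural SDFs of clothed humans. Starting from a few monocular depth images, it quickly produces controllable human avatars with pose-dependent clothing deformation.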

ABSTRACT

In this paper, we aim to create generalizable and controllable neural signed distance fields (SDFs) that represent clothed humans from monocular depth observations. Recent advances in deep learning, especially neural implicit representations, have enabled human shape reconstruction and controllable avatar generation from different sensor inputs. However, to generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs. Furthermore, due to the difficulty of effectively modeling pose-dependent cloth deformations for diverse body shapes and cloth types, existing approaches resort to per-subject/cloth-type optimization from scratch, which is computationally expensive. In contrast, we propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images. We achieve this by using meta-learning to learn an initialization of a hypernetwork that predicts the parameters of neural SDFs. The hypernetwork is conditioned on human poses and represents a clothed neural avatar that deforms non-rigidly according to the input poses. Meanwhile, it is meta-learned to effectively incorporate priors of diverse body shapes and cloth types and thus can be much faster to fine-tune, compared to models trained from scratch. We qualitatively and quantitatively show that our approach outperforms state-of-the-art approaches that require complete meshes as inputs while our approach requires only depth frames as inputs and runs orders of magnitudes faster. Furthermore, we demonstrate that our meta-learned hypernetwork is very robust, being the first to generate avatars with realistic dynamic cloth deformations given as few as 8 monocular depth frames.
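
To make the hypernetwork idea concrete, here is a minimal sketch in plain Python; it is not the authors' implementation. A linear map (the hypothetical `hypernetwork`) turns a pose vector into the flat parameter list of a tiny MLP (the hypothetical `sdf_mlp`), which returns a signed distance for a 3D point. The layer sizes, the linear form of the hypernetwork, and the pose encoding are all assumptions chosen for brevity.

```python
import math


def sdf_mlp(point, params, hidden=4):
    """Tiny one-hidden-layer MLP mapping a 3D point to a scalar signed distance.

    `params` is a flat list laid out as:
    W1 (hidden x 3), b1 (hidden), w2 (hidden), b2 (1).
    The layout is an assumption made for this sketch.
    """
    w1 = params[: hidden * 3]
    b1 = params[hidden * 3 : hidden * 4]
    w2 = params[hidden * 4 : hidden * 5]
    b2 = params[hidden * 5]
    # Hidden activations: tanh(W1 @ point + b1)
    h = [
        math.tanh(sum(w1[i * 3 + j] * point[j] for j in range(3)) + b1[i])
        for i in range(hidden)
    ]
    # Output: w2 . h + b2
    return sum(w2[i] * h[i] for i in range(hidden)) + b2


def hypernetwork(pose, weight, bias):
    """Linear hypernetwork: pose vector -> flat parameter list for sdf_mlp.

    `weight` has one row per SDF parameter, `bias` one entry per parameter.
    A real hypernetwork would be a deeper network; this is only illustrative.
    """
    return [
        sum(row[k] * pose[k] for k in range(len(pose))) + b
        for row, b in zip(weight, bias)
    ]
```

Changing the pose changes the predicted parameters, and therefore the surface the SDF describes, while the code of the SDF network itself stays fixed.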

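Motivation and Objectives

Enable generalizable modeling of clothed humans with pose-dependent cloth deformation.
Develop a meta-learning framework that initializes a hypernetwork whose output is a dynamic neural SDF conditioned on body pose.
Enable fast fine-tuning from monocular depth images to create subject-specific animatable avatars.
Compare against baselines that take meshes as input, and demonstrate robustness with very little data.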

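Proposed Method

Use a meta-learned hypernetwork g_ψ to predict the parameters of a neural SDF of the clothed human in canonical space.
Meta-learn a static neural SDF f_φ with few-shot meta-learning (IGR loss) to capture priors across body shapes and clothing.
Train a hypernetwork that, conditioned on the bone transformations {B_b}, predicts residuals to the parameters of the static meta-learned SDF.
Canonicalize the input depth points with an inverse skinning network, then animate them with a forward skinning network.
At test time, fine-tune the hypernetwork on a few depth frames and a set of SMPL fits.
Encode the bone transformations fed to g_ψ with a hierarchical MLP encoder.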
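
The canonicalize-then-animate step can be illustrated with linear blend skinning. The sketch below is a simplified 2D version in plain Python, offered as an illustration under stated assumptions rather than the paper's code: the skinning weights are passed in by hand, whereas the method described above predicts them with the inverse and forward skinning networks, and all function names (`rigid_2d`, `blend`, `forward_skin`, `inverse_skin`) are hypothetical.

```python
import math


def rigid_2d(angle, tx, ty):
    """Hypothetical helper: a 2D rigid bone transform as a 3x3 homogeneous matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def blend(weights, bones):
    """Linear blend skinning: weighted sum of bone matrices (weights sum to 1)."""
    return [
        [sum(w * B[i][j] for w, B in zip(weights, bones)) for j in range(3)]
        for i in range(3)
    ]


def forward_skin(point, weights, bones):
    """Canonical space -> posed space: apply the blended transform."""
    T = blend(weights, bones)
    x, y = point
    return (
        T[0][0] * x + T[0][1] * y + T[0][2],
        T[1][0] * x + T[1][1] * y + T[1][2],
    )


def inverse_skin(point, weights, bones):
    """Posed space -> canonical space: apply the inverse of the blended transform."""
    T = blend(weights, bones)
    a, b, tx = T[0]
    c, d, ty = T[1]
    det = a * d - b * c  # assumed non-zero for the mild poses used here
    px, py = point[0] - tx, point[1] - ty
    # Closed-form inverse of the 2x2 linear part applied to the shifted point
    return ((d * px - b * py) / det, (-c * px + a * py) / det)
```

With the same weights in both directions, `inverse_skin` applied to the output of `forward_skin` recovers the original point. With weights predicted separately for posed and canonical points, the round trip is only approximate.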

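Experimental Results

Research Questions

RQ1 Can a meta-learned prior quickly produce animatable clothed human avatars from depth images in a few-shot setting?
RQ2 How well does a hypernetwork-based dynamic SDF model generalize across subjects, clothing types, and poses when input data is limited?
RQ3 How does a meta-learned static SDF prior compare with a meta-learned dynamic hypernetwork for pose-dependent clothing deformation, and what are the advantages?
RQ4 When only depth frames are available, how does MetaAvatar perform relative to baselines that use meshes or full inputs?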

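Key Findings

With as few as 8 depth frames, MetaAvatar produces a controllable dynamic neural SDF avatar in about 2 minutes of fine-tuning.
The method outperforms baselines that require complete meshes (NASA, SCANimate), comes close to LEAP even with depth-only input, and renders clothing detail better.
The two-stage meta-learning strategy (static meta-SDF initialization, followed by hypernetwork residuals) stabilizes learning and enables fast adaptation.
In the CAPE experiments, the learned deformation prior is robust to unseen subjects, unseen poses, and at least one unseen clothing type.
Even when the fine-tuning data is reduced to <1%, MetaAvatar retains competitive perceptual quality and reasonable interpolation/extrapolation performance.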
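
Why a meta-learned initialization adapts quickly can be seen on a toy problem. The summary does not say which meta-learning algorithm is used, so the sketch below assumes a first-order, Reptile-style update purely for illustration. Each hypothetical task is fitting a line y = slope * x with a single weight w; nothing here models SDFs or clothing.

```python
import random

# Fixed sample locations shared by all toy tasks (an assumption of this sketch).
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def loss(w, slope):
    """Mean squared error of the prediction w * x against the target slope * x."""
    return sum((w * x - slope * x) ** 2 for x in XS) / len(XS)


def grad(w, slope):
    """Derivative of `loss` with respect to w."""
    return sum(2.0 * (w * x - slope * x) * x for x in XS) / len(XS)


def adapt(w, slope, steps, lr=0.5):
    """Fine-tune w on one task with a few gradient-descent steps."""
    for _ in range(steps):
        w -= lr * grad(w, slope)
    return w


def reptile(task_slopes, meta_steps=200, inner_steps=5, meta_lr=0.1, seed=0):
    """Reptile-style meta-training: nudge the initialization toward each
    task-adapted weight, one randomly sampled task at a time."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)
        w_task = adapt(w, slope, inner_steps)
        w += meta_lr * (w_task - w)  # first-order meta-update
    return w
```

After meta-training on slopes such as 2.5, 3.0 and 3.5, the initialization sits near the typical task, so two gradient steps on a new slope of 3.2 reach a lower loss than two steps starting from zero. The same intuition, at a much larger scale, is what fast fine-tuning from few frames relies on.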

This review was generated by AI and reviewed by a human editor.