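[Paper Explained] Learning Disentangled Avatars with Hybrid 3D Representations
DELTA learns disentangled avatars by combining an explicit mesh representation for the body and face with an implicit NeRF representation for hair and clothing, trained end to end from monocular video.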
Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input, and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications for DELTA. For the first one, we consider the disentanglement of the human body and clothing, and in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing can be fully disentangled yet jointly rendered. Such a disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-source pipeline for the study of hybrid human avatar modeling.
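Motivation and Goals
- Motivate the learning of realistic avatars by choosing a representation per body part, which improves both representation efficacy and realism.
- Disentangle the body from clothing and the face from hair, so that each layer can be edited flexibly.
- Show that a hybrid mesh-NeRF renderer can be trained end to end from monocular video without 3D supervision.
- Demonstrate disentangled avatars whose hair and clothing can be transferred to arbitrary body shapes.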
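Proposed Method
- Represent the face and body with an explicit mesh model (SMPL-X) for controllable geometry and pose.
- Represent hair and clothing with an implicit NeRF, which renders complex topology flexibly and with high fidelity.
- Introduce mesh-integrated volumetric rendering, which combines mesh rasterization with NeRF shading.
- Model pose-dependent deformation of the outer layer with a combination of SMPL-X skinning and a learned non-rigid deformation.
- Train the whole pipeline end to end from monocular RGB video, using differentiable rendering and no 3D supervision.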
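To make the mesh-integrated volumetric rendering step more concrete, here is a minimal single-ray sketch in NumPy. It is not the authors' implementation. It assumes the mesh is fully opaque, so NeRF samples behind the mesh hit are discarded and the mesh color is composited with whatever transmittance remains. The function name `render_ray_with_mesh` and all of its parameters are hypothetical.

```python
import numpy as np

def render_ray_with_mesh(sigmas, colors, t_vals, mesh_depth=None,
                         mesh_color=None, background=(1.0, 1.0, 1.0)):
    """Composite NeRF samples along one ray, stopping at an opaque mesh.

    sigmas:     (N,) non-negative densities at the sample points
    colors:     (N, 3) RGB values at the sample points
    t_vals:     (N,) sorted sample depths along the ray
    mesh_depth: depth at which the ray hits the rasterized mesh, None on a miss
    mesh_color: (3,) RGB of the mesh at the hit point (required on a hit)
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float).reshape(-1, 3)
    t_vals = np.asarray(t_vals, dtype=float)

    # Assumption: the mesh is opaque, so samples behind it never contribute.
    if mesh_depth is not None:
        keep = t_vals < mesh_depth
        sigmas, colors, t_vals = sigmas[keep], colors[keep], t_vals[keep]

    # Interval lengths; the last interval ends at the mesh hit, or is
    # effectively infinite when the ray misses the mesh.
    if t_vals.size:
        end = mesh_depth if mesh_depth is not None else t_vals[-1] + 1e10
        deltas = np.diff(t_vals, append=end)
    else:
        deltas = np.zeros(0)

    # Standard NeRF quadrature: per-sample opacity and accumulated transmittance.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))
    weights = trans[:-1] * alphas

    # Light that passes through every NeRF sample lands on the mesh (or background).
    tail = mesh_color if mesh_depth is not None else background
    rgb = (weights[:, None] * colors).sum(axis=0) + trans[-1] * np.asarray(tail, dtype=float)
    return rgb, weights
```

Because every operation above is differentiable with respect to the densities and colors, the same structure written in an autodiff framework would let a photometric loss on the rendered pixel update both the NeRF layer and the mesh appearance, which is the property the end-to-end training relies on.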

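Experimental Results
Research Questions
- RQ1: Can the body and clothing be disentangled by using different 3D representations (a mesh for the body, a NeRF for clothing)?
- RQ2: Can the face and hair be disentangled by assigning a mesh to the face and a NeRF to the hair?
- RQ3: Is end-to-end learning from monocular video feasible, while maintaining high fidelity, without relying on auxiliary 3D supervision?
- RQ4: Can hair and clothing learned in this framework be transferred to new body shapes and poses?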
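Key Findings
- DELTA achieves disentangled reconstruction of both the head (face and hair) and the body (body and clothing) within a single unified framework.
- Hair and clothing rendered with NeRF capture high-fidelity detail for non-rigid geometry, while the body and face remain easy to animate through SMPL-X.
- A novel differentiable mesh-integrated volumetric renderer makes end-to-end learning directly from monocular video possible.
- The disentangled representation allows hair and clothing to be transferred to arbitrary body shapes and poses.
- The method shows promising performance on disentangled reconstruction, virtual clothing try-on, and hairstyle transfer.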
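The animation and transfer findings rest on warping the outer layer together with the body. The sketch below illustrates one common way to do this: linear blend skinning of canonical-space points, with an optional non-rigid offset added first. It is a simplified illustration and not DELTA's actual deformation module. Borrowing skinning weights from the nearest body vertex, applying the offset in canonical space, and the names `skin_points` and `nearest_vertex_weights` are all assumptions made for this example.

```python
import numpy as np

def nearest_vertex_weights(points, body_vertices, vertex_weights):
    """Borrow skinning weights from the closest body vertex (assumed heuristic).

    points:         (P, 3) canonical-space points, e.g. clothing samples
    body_vertices:  (V, 3) canonical body mesh vertices
    vertex_weights: (V, J) skinning weights of the body mesh
    """
    points = np.asarray(points, dtype=float)
    body_vertices = np.asarray(body_vertices, dtype=float)
    # Squared distance from every point to every body vertex: (P, V)
    d2 = ((points[:, None, :] - body_vertices[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(vertex_weights, dtype=float)[d2.argmin(axis=1)]

def skin_points(points, weights, bone_transforms, offsets=None):
    """Warp canonical points to posed space with linear blend skinning.

    points:          (P, 3) canonical-space points
    weights:         (P, J) skinning weights, each row summing to 1
    bone_transforms: (J, 4, 4) per-joint rigid transforms for the target pose
    offsets:         optional (P, 3) non-rigid displacement; in a learned
                     system this would come from a small network (hypothetical)
    """
    points = np.asarray(points, dtype=float)
    if offsets is not None:
        points = points + np.asarray(offsets, dtype=float)
    # Blend the per-joint transforms into one 4x4 transform per point.
    blended = np.einsum("pj,jab->pab", weights, bone_transforms)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    posed = np.einsum("pab,pb->pa", blended, homo)
    return posed[:, :3]
```

Under these assumptions, transferring clothing to another subject amounts to reusing the same canonical points with the skinning weights and joint transforms of the new body, since the outer layer never stores body-specific geometry itself.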

This explainer was generated by AI and reviewed by a human editor.