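[Paper Summary] Animatable Neural Radiance Fields from Monocular RGB Videos
This paper combines SMPL-based pose-guided deformation with NeRF to reconstruct and render animatable 3D human avatars from monocular RGB video. It jointly optimizes the SMPL parameters and the NeRF to recover finer detail and to enable animation in novel poses.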
We present animatable neural radiance fields (animatable NeRF) for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields (NeRF) to the dynamic scenes with human movements via introducing explicit pose-guided deformation while learning the scene representation network. In particular, we estimate the human pose for each frame and learn a constant canonical space for the detailed human template, which enables natural shape deformation from the observation space to the canonical space under the explicit control of the pose parameters. To compensate for inaccurate pose estimation, we introduce the pose refinement strategy that updates the initial pose during the learning process, which not only helps to learn more accurate human reconstruction but also accelerates the convergence. In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from novel views, and 3) animation of the human with novel poses.
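The observation-to-canonical deformation described in the abstract is a form of inverse linear blend skinning. The following is a minimal pure-Python sketch of that idea, not the authors' implementation, and it rests on several assumptions: each bone k has a rigid transform (R_k, t_k) that maps canonical space to the posed (observation) space; a query point's skinning weights are an inverse-distance blend of the weights of its k nearest posed SMPL vertices (the paper's exact weighting may differ); and the point is pulled back by inverting the blended transform. All function names are hypothetical.

```python
import math


def skinning_weights(x, verts, vert_weights, k=4, eps=1e-8):
    """Inverse-distance blend of the skinning weights of the k nearest posed vertices.

    Assumed weighting scheme, for illustration only.
    """
    nearest = sorted((math.dist(x, v), i) for i, v in enumerate(verts))[:k]
    inv = [(1.0 / (d + eps), i) for d, i in nearest]
    total = sum(w for w, _ in inv)
    out = [0.0] * len(vert_weights[0])
    for w, i in inv:
        for b, wb in enumerate(vert_weights[i]):
            out[b] += (w / total) * wb
    return out


def blend_transform(weights, bone_rts):
    """Return (A, t) = sum_k w_k * (R_k, t_k); each (R_k, t_k) maps canonical -> posed."""
    A = [[0.0] * 3 for _ in range(3)]
    t = [0.0] * 3
    for w, (R, tk) in zip(weights, bone_rts):
        for r in range(3):
            t[r] += w * tk[r]
            for c in range(3):
                A[r][c] += w * R[r][c]
    return A, t


def inv3(M):
    """Invert a 3x3 matrix via its adjugate (the blended A is not rigid in general)."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][col] / det for col in range(3)] for r in range(3)]


def observation_to_canonical(x_obs, verts, vert_weights, bone_rts, k=4):
    """Map a posed-space point to canonical space: x_c = A^{-1} (x_obs - t)."""
    w = skinning_weights(x_obs, verts, vert_weights, k)
    A, t = blend_transform(w, bone_rts)
    A_inv = inv3(A)
    d = [x_obs[r] - t[r] for r in range(3)]
    return [sum(A_inv[r][c] * d[c] for c in range(3)) for r in range(3)]
```

For example, if every vertex is bound entirely to a bone that rotates 90 degrees about z, the posed point (0, 1, 0) maps back to the canonical point (1, 0, 0). Blending the transforms before inverting keeps the sketch short; it is an approximation, not an exact inverse of forward skinning.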
Motivation and Goals
- Enable high-quality 3D human reconstruction from monocular RGB video without expensive capture equipment.
- Propose an explicit SMPL-guided deformation to learn a canonical NeRF space for a detailed human template.
- Jointly optimize NeRF and SMPL parameters to improve convergence and reconstruction accuracy.
- Enable novel view rendering and novel pose animation of the reconstructed human.
Proposed Method
- Introduce an animatable NeRF that maps 3D position, SMPL shape, and pose to color and density in a canonical space.
- Use pose-guided deformation to map observation-space points to the canonical space, using skinning weights derived from nearby SMPL vertices (blend skinning).
- Render images from the neural radiance field with volume rendering, using a 3D mask as a geometric prior.
- Jointly optimize NeRF parameters and SMPL parameters via analysis-by-synthesis, with pose refinement to correct SMPL estimates during training.
- Incorporate background regularization and pose regularization to stabilize optimization.
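The volume-rendering step with a 3D mask can be sketched per ray as standard NeRF alpha compositing. This is an illustrative sketch, not the paper's code: we assume the mask arrives as a per-sample boolean (for example, whether a sample lies near the posed SMPL mesh) and that density is forced to zero outside it. `render_ray` and its parameters are hypothetical names.

```python
import math


def render_ray(sigmas, colors, t_vals, inside_mask, far_delta=1e10):
    """Alpha-composite samples along one ray with the standard NeRF quadrature.

    inside_mask[i] is True if sample i lies inside the assumed 3D mask;
    density outside the mask is treated as zero.
    Returns (rgb, accumulated_opacity).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    acc = 0.0
    n = len(t_vals)
    for i in range(n):
        # Distance to the next sample; the last sample uses a large sentinel.
        delta = t_vals[i + 1] - t_vals[i] if i + 1 < n else far_delta
        sigma = sigmas[i] if inside_mask[i] else 0.0
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * colors[i][c]
        acc += weight
        transmittance *= 1.0 - alpha
    return rgb, acc
```

The accumulated opacity `acc` lets a caller composite a background color as `rgb + (1 - acc) * background`. That quantity is also what a background regularizer would act on, although the paper's exact regularization term is not given in this summary.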
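The idea behind pose refinement with pose regularization can be shown on a toy problem. This is a deliberately simplified stand-in, not the paper's procedure: the "pose" is a single 2D rotation angle, a point-matching error replaces the photometric rendering loss, and a penalty `lam * (theta - theta_init)**2` plays the role of pose regularization toward the initial SMPL estimate. All names are hypothetical.

```python
import math


def rotate(theta, p):
    """Rotate a 2D point by theta (used here to synthesize observations)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def refine_pose(canonical, observed, theta_init, lam=0.0, lr=0.1, steps=200):
    """Gradient descent on a 1D 'pose' starting from a noisy initial estimate.

    Loss = mean ||R(theta) p - o||^2 + lam * (theta - theta_init)^2.
    """
    theta = theta_init
    n = len(canonical)
    for _ in range(steps):
        c, s = math.cos(theta), math.sin(theta)
        grad = 0.0
        for (x, y), (ox, oy) in zip(canonical, observed):
            rx, ry = c * x - s * y, s * x + c * y   # R(theta) p
            dx, dy = -s * x - c * y, c * x - s * y  # dR/dtheta p
            grad += 2.0 * ((rx - ox) * dx + (ry - oy) * dy)
        grad = grad / n + 2.0 * lam * (theta - theta_init)
        theta -= lr * grad
    return theta


canonical = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
observed = [rotate(0.5, p) for p in canonical]  # "true" pose is 0.5 rad
refined = refine_pose(canonical, observed, theta_init=0.3)
```

Starting from the wrong estimate 0.3, the data term pulls the angle toward the true value 0.5. A positive `lam` keeps the result between the initial estimate and the data-driven optimum, which is the stabilizing trade-off a pose regularizer introduces.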
Experimental Results
Research Questions
- RQ1: Can SMPL-guided deformation enable a controllable, animatable NeRF learned from monocular video?
- RQ2: Does joint optimization of NeRF and SMPL parameters improve 3D geometry and appearance quality compared to baselines?
- RQ3: Is pose refinement necessary to achieve robust reconstruction and animation under monocular inputs?
- RQ4: What is the impact of canonical pose choice and background regularization on reconstruction and novel pose synthesis?
- RQ5: How well does the proposed method support novel view synthesis and novel pose synthesis of the reconstructed human?
Key Findings
| Subject ID | NeRF PSNR | SMPLpix PSNR | NB PSNR | NeRF+U PSNR | OURS PSNR | NeRF SSIM | SMPLpix SSIM | NB SSIM | NeRF+U SSIM | OURS SSIM | NeRF LPIPS | SMPLpix LPIPS | NB LPIPS | NeRF+U LPIPS | OURS LPIPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| male-3-casual | 20.64 | 23.74 | 24.94 | 23.88 | 29.37 | .8993 | .9229 | .9428 | .9329 | .9703 | .1008 | .0222 | .0326 | .0438 | .0168 |
| male-4-casual | 20.29 | 22.43 | 24.71 | 23.13 | 28.37 | .8803 | .9095 | .9469 | .9276 | .9605 | .1445 | .0305 | .0423 | .0554 | .0268 |
| female-3-casual | 17.43 | 22.33 | 23.87 | 22.45 | 28.91 | .8605 | .9288 | .9504 | .9413 | .9743 | .1696 | .0270 | .0346 | .0498 | .0215 |
| female-4-casual | 17.63 | 23.35 | 24.37 | 23.13 | 28.90 | .8578 | .9258 | .9451 | .9276 | .9678 | .1827 | .0239 | .0382 | .0556 | .0174 |
| iper-009-4-1 | 19.54 | 20.25 | 25.46 | 21.56 | 30.23 | .7870 | .9018 | .9378 | .8667 | .9466 | .2641 | .0293 | .0558 | .1197 | .0335 |
| iper-023-1-1 | 17.41 | 19.48 | 25.44 | 20.25 | 27.26 | .7623 | .8945 | .9330 | .8656 | .9457 | .2769 | .0442 | .0493 | .1109 | .0285 |
| iper-002-1-1 | 16.01 | 19.64 | 23.06 | 18.75 | 26.99 | .7500 | .8886 | .9394 | .8708 | .9502 | .3363 | .0392 | .0476 | .1205 | .0285 |
| iper-026-1-1 | 17.09 | 19.03 | 23.77 | 18.48 | 26.85 | .7580 | .8574 | .9351 | .8623 | .9542 | .2928 | .0494 | .0550 | .1282 | .0315 |
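To make the PSNR column easier to interpret, here is the standard PSNR definition together with a small calculation over the OURS and NB columns of the table. The evaluation protocol behind the reported numbers (for example, whether pixels are masked to the person) is not described in this summary, so `psnr` below is only the generic formula. The function names are ours.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equally sized flat lists of pixel values in [0, max_val]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio(psnr_gap_db):
    """How many times lower the MSE is for a method that scores psnr_gap_db higher."""
    return 10.0 ** (psnr_gap_db / 10.0)


# OURS and NB PSNR columns, copied from the table above (same row order).
ours = [29.37, 28.37, 28.91, 28.90, 30.23, 27.26, 26.99, 26.85]
nb = [24.94, 24.71, 23.87, 24.37, 25.46, 25.44, 23.06, 23.77]
gap = sum(o - n for o, n in zip(ours, nb)) / len(ours)  # mean per-subject gap in dB
```

Across the eight subjects the mean gap over NB is roughly 3.9 dB. Because PSNR is logarithmic, a per-image gap of that size corresponds to an MSE about 2.5 times lower. Averaging per-subject dB gaps is not the same as comparing average MSEs, so treat this only as a rough reading aid.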
- Achieves high-quality implicit geometry and appearance for humans from monocular video, with visible details such as clothing wrinkles and hair.
- Enables photo-realistic novel-view rendering of the animated human by leveraging the canonical NeRF space.
- Achieves better 3D reconstruction metrics (lower point-to-surface (P2S) and Chamfer distances) than NeRF variants without explicit pose guidance.
- Outperforms Neural Body (NB) and SMPLpix in novel-pose synthesis on the iPER and People-Snapshot datasets.
- Pose refinement during training substantially improves rendering quality when SMPL estimates are imperfect.
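The P2S and Chamfer metrics mentioned above can be sketched with brute-force nearest-neighbour search. Conventions vary between papers, so this follows one common choice and is an assumption, not necessarily the paper's protocol: P2S is the mean distance from reconstructed points to the ground-truth surface (approximated here by dense surface samples), and Chamfer averages the two directional mean nearest-neighbour distances. Function names are hypothetical.

```python
import math


def mean_nn_dist(src, dst):
    """Mean distance from each point in src to its nearest neighbour in dst (brute force)."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer(a, b):
    """Symmetric Chamfer distance: average of the two directional mean NN distances."""
    return 0.5 * (mean_nn_dist(a, b) + mean_nn_dist(b, a))


def p2s(recon_pts, gt_surface_samples):
    """Point-to-surface distance, approximating the GT surface by dense point samples."""
    return mean_nn_dist(recon_pts, gt_surface_samples)
```

P2S is one-directional: it penalizes reconstructed points that float away from the true surface but not missing regions, which is why it is usually reported alongside the symmetric Chamfer distance. For real meshes, a KD-tree or exact point-to-triangle distances would replace the O(N·M) loops used here.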
This summary was generated by AI and reviewed by human editors.