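[Paper Explained] DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes
DeVRF introduces a deformable voxel radiance field that learns a 3D canonical space together with a 4D deformation field, training on dynamic scenes roughly 100× faster than prior methods at comparable quality. It adopts a static→dynamic learning paradigm with coarse-to-fine optimization and adds several regularizers.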
Modeling dynamic scenes is important for many applications such as virtual reality and telepresence. Despite achieving unprecedented fidelity for novel view synthesis in dynamic scenes, existing methods based on Neural Radiance Fields (NeRF) suffer from slow convergence (i.e., model training time measured in days). In this paper, we present DeVRF, a novel representation to accelerate learning dynamic radiance fields. The core of DeVRF is to model both the 3D canonical space and 4D deformation field of a dynamic, non-rigid scene with explicit and discrete voxel-based representations. However, it is quite challenging to train such a representation which has a large number of model parameters, often resulting in overfitting issues. To overcome this challenge, we devise a novel static-to-dynamic learning paradigm together with a new data capture setup that is convenient to deploy in practice. This paradigm unlocks efficient learning of deformable radiance fields via utilizing the 3D volumetric canonical space learnt from multi-view static images to ease the learning of 4D voxel deformation field with only few-view dynamic sequences. To further improve the efficiency of our DeVRF and its synthesized novel view's quality, we conduct thorough explorations and identify a set of strategies. We evaluate DeVRF on both synthetic and real-world dynamic scenes with different types of deformation. Experiments demonstrate that DeVRF achieves two orders of magnitude speedup (100x faster) with on-par high-fidelity results compared to the previous state-of-the-art approaches. The code and dataset will be released in https://github.com/showlab/DeVRF.
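Research motivation and goals
- Enable fast, photorealistic novel view synthesis for non-rigid dynamic scenes.
- Propose voxel-based representations for the 3D canonical space and the 4D deformation field.
- Show that the static→dynamic learning paradigm improves training efficiency when only few-view dynamic sequences are available.
- Develop optimization strategies that prevent overfitting and improve the reconstruction fidelity of dynamic radiance fields.
- Demonstrate substantial training speedups on synthetic and real-world dynamic scenes under a practical capture setup.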
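Proposed method
- Model the scene with a 3D volumetric canonical space (density and color voxels) learned from multi-view static images.
- Represent motion with a 4D voxel deformation field, using quadruple interpolation to map dynamic points into the canonical space.
- Use the static→dynamic learning paradigm to transfer the 3D canonical prior to the 4D deformation field.
- Train the 4D deformation field coarse-to-fine to improve optimization efficiency.
- Enforce deformation cycle consistency, optical-flow supervision, and total variation regularization to improve fidelity and stability.
- Combine a photometric rendering loss with auxiliary losses to regularize motion and keep it smooth.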
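The mapping from a dynamic point to the canonical space can be illustrated with a small NumPy sketch. It assumes the deformation field is stored as a dense array of shape (T, X, Y, Z, 3) holding per-voxel displacements in voxel-index units, and that the canonical position is the query point plus the interpolated displacement. The function names `quadrilinear_interp` and `warp_to_canonical` are hypothetical and are not taken from the DeVRF code.

```python
import numpy as np


def quadrilinear_interp(grid, p, t):
    """Interpolate a (T, X, Y, Z, C) grid at continuous index (t, x, y, z).

    Assumption: every one of the four grid dimensions has size >= 2.
    """
    assert min(grid.shape[:4]) >= 2, "each grid dimension needs at least 2 samples"
    coords = np.array([t, p[0], p[1], p[2]], dtype=np.float64)
    upper = np.array(grid.shape[:4]) - 1
    coords = np.clip(coords, 0.0, upper)          # stay inside the grid
    lo = np.minimum(np.floor(coords).astype(int), upper - 1)  # keep lo + 1 in bounds
    frac = coords - lo                            # fractional position in the cell

    out = np.zeros(grid.shape[4:], dtype=np.float64)
    for corner in range(16):                      # 2^4 corners of the 4D cell
        offs = [(corner >> k) & 1 for k in range(4)]
        weight = 1.0
        for k in range(4):
            weight *= frac[k] if offs[k] else 1.0 - frac[k]
        idx = tuple(lo + np.array(offs))
        out += weight * grid[idx]
    return out


def warp_to_canonical(deform_grid, p, t):
    """Map a point observed at time index t to the canonical space.

    Assumption: the grid stores displacements that are added to the point.
    """
    return np.asarray(p, dtype=np.float64) + quadrilinear_interp(deform_grid, p, t)
```

With an all-zero deformation grid, `warp_to_canonical` returns the query point unchanged, which is a quick sanity check before plugging in learned displacements. A practical implementation would vectorize over many sample points and run on the GPU; the loop over 16 corners here is only meant to make the weighting explicit.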
Experimental results
Research questions
- RQ1: Can a static canonical prior accelerate learning of deformable radiance fields for dynamic scenes?
- RQ2: How can a 4D voxel deformation field be learned efficiently, without overfitting, given its large number of parameters?
- RQ3: What optimization strategies best balance training speed and reconstruction fidelity for dynamic NeRFs?
Key findings
- DeVRF achieves about 100× faster training compared to state-of-the-art approaches while delivering on-par high-fidelity results.
- Training can be completed in roughly 10 minutes on a single RTX 3090 GPU using four-camera capture setups.
- A 3D volumetric canonical space learned from static multi-view data is effective as a prior to learn 4D voxel deformations.
- Coarse-to-fine optimization plus deformation cycle consistency, optical-flow supervision, and TV regularization significantly improve efficiency and quality.
- DeVRF demonstrates strong performance on synthetic inward-facing scenes and real-world deformable scenes with various deformation types.
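Two of the regularizers named above, total variation and deformation cycle consistency, can be sketched in a few lines of NumPy. This is one common formulation written under stated assumptions, not necessarily the exact losses used in the paper: the TV term is taken as the mean squared difference between spatially adjacent voxels, and the cycle term assumes that a forward displacement and the backward displacement sampled at the warped point should cancel. The function names and array layouts are hypothetical.

```python
import numpy as np


def total_variation(grid):
    """Smoothness penalty on a (T, X, Y, Z, C) voxel grid.

    Assumption: squared differences between neighbours along the three
    spatial axes (1, 2, 3), averaged per axis and then summed.
    """
    tv = 0.0
    for axis in (1, 2, 3):
        diff = np.diff(grid, axis=axis)   # neighbour-to-neighbour change
        tv += np.mean(diff ** 2)
    return tv


def cycle_consistency(forward_disp, backward_disp):
    """Penalty on forward and backward displacements that fail to cancel.

    forward_disp:  (N, 3) displacements from canonical space to time t.
    backward_disp: (N, 3) displacements from time t back to canonical space,
                   sampled at the forward-warped points (assumed given).
    """
    residual = forward_disp + backward_disp
    return np.mean(np.sum(residual ** 2, axis=-1))
```

A constant grid has zero total variation, and a backward field that exactly negates the forward field has zero cycle loss; both terms grow as the deformation becomes less smooth or less invertible.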
This explainer was generated by AI and reviewed by a human editor.