QUICK REVIEW

[Paper Review] Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation

Lingjie Liu, Weipeng Xu|arXiv (Cornell University)|Jan 14, 2020

Advanced Vision and Imaging | References: 90 | Cited by: 49

One-Sentence Summary

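A three-stage neural pipeline disentangles pose-dependent fine-scale detail, learned in texture space, from the embedding of the human in 2D screen space, in order to synthesize temporally coherent, high-fidelity human video. TexNet generates dynamic textures in UV space, and RefNet renders and refines the final video.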

ABSTRACT

Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that approaches these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.

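Motivation and Goals

Improve the realism of neural human video synthesis beyond what 2D image translation achieves, avoiding artifacts such as blurred contours and temporal instability.
Disentangle temporally consistent fine-scale detail from the 2D pose embedding by learning in texture space.
Develop a two-network system (TexNet and RefNet) that generates dynamic textures and refines the rendered output.
Enable applications such as motion transfer, interactive reenactment, and novel view synthesis from monocular video.
Provide a pipeline that keeps clothing and body appearance spatially, temporally, and geometrically consistent.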

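Proposed Method

A three-stage pipeline that combines texture-space learning with image-space refinement.
TexNet learns pose-dependent, temporally consistent high-frequency texture detail from partial normal maps in UV space.
Partial dynamic textures are back-projected from monocular video frames using a performance-capture mesh.
A second network completes and renders the textured mesh to produce a consistent texture-space synthesis.
RefNet refines the rendered, textured mesh output into the final photorealistic video, including shadows and foreground/background interaction.
Training uses a conditional adversarial (GAN) objective that combines a frame loss, a video loss, and optical-flow consistency.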
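
The relationship between screen space and texture space described above can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a per-pixel UV map is already available (in the paper this would come from the tracked mesh), uses nearest-texel lookup, and every name in it (`back_project`, `render`, `uv_map`) is hypothetical. It shows how visible frame pixels are scattered into a partial UV texture, and how that texture can be re-rendered under a different UV layout standing in for a new pose.

```python
# Illustrative sketch only: nearest-texel back-projection and rendering.
# All function and variable names are hypothetical, not taken from the paper.

def back_project(frame, uv_map, tex_size):
    """Scatter visible frame pixels into a square UV texture.

    frame:  H x W grid of pixel values.
    uv_map: H x W grid of (u, v) pairs in [0, 1), or None for background.
    Returns (texture, mask); mask marks the texels that were observed,
    so the texture is only partial.
    """
    texture = [[0.0] * tex_size for _ in range(tex_size)]
    mask = [[False] * tex_size for _ in range(tex_size)]
    for y, row in enumerate(uv_map):
        for x, uv in enumerate(row):
            if uv is None:  # background pixel, nothing to project
                continue
            tx, ty = int(uv[0] * tex_size), int(uv[1] * tex_size)
            texture[ty][tx] = frame[y][x]
            mask[ty][tx] = True
    return texture, mask


def render(texture, uv_map, background=0.0):
    """Gather texels back into screen space for a given UV map."""
    tex_size = len(texture)
    image = []
    for row in uv_map:
        out_row = []
        for uv in row:
            if uv is None:
                out_row.append(background)
            else:
                tx, ty = int(uv[0] * tex_size), int(uv[1] * tex_size)
                out_row.append(texture[ty][tx])
        image.append(out_row)
    return image


# A 2 x 2 frame in which three pixels see the surface and one is background.
frame = [[0.1, 0.9], [0.5, 0.0]]
uv_map = [[(0.0, 0.0), (0.5, 0.0)], [(0.0, 0.5), None]]
texture, mask = back_project(frame, uv_map, tex_size=2)

# The same texture rendered under a different UV layout (a stand-in for a new pose).
new_pose_uv = [[None, (0.0, 0.0)], [(0.5, 0.0), (0.0, 0.5)]]
new_frame = render(texture, new_pose_uv)
```

Because the detail lives in the texture rather than in screen coordinates, the same texel values follow the surface when the UV layout changes. The unobserved texel recorded in `mask` is the kind of gap that a learned texture network would need to fill.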

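Experimental Results

Research Questions

RQ1: Does disentangling texture-space dynamics from the screen-space embedding improve the temporal coherence and detail of neural human video synthesis?
RQ2: How can dynamic textures in UV space be learned from monocular data and applied to pose-driven rendering?
RQ3: Does two-network refinement (TexNet + RefNet) outperform previous 2D image-to-image translation methods on motion transfer and novel view synthesis?
RQ4: How does using partial normal maps as the pose encoding affect the quality and stability of texture synthesis?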

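Key Findings

The three-stage approach produces temporally coherent high-frequency details, such as wrinkles, that move with the clothing.
TexNet generates pose-dependent textures in UV space, which yields precise detail without re-synthesizing it frame by frame.
RefNet effectively blends foreground and background, captures shadows, and corrects geometric errors, improving realism.
The method gives better qualitative and quantitative results than state-of-the-art methods on motion transfer, interactive reenactment, and monocular novel view synthesis.
Texture-space learning reduces 2D synthesis artifacts common in earlier frame-based image translation, such as missing limbs and contour errors.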

This review was generated by AI and reviewed by a human editor.