QUICK REVIEW

[Paper Review] One-shot Face Reenactment

Yunxuan Zhang, Siwei Zhang|arXiv (Cornell University)|Aug 5, 2019

Face recognition and analysis | References: 40 | Cited by: 28

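One-Sentence Summary

This paper proposes a one-shot face reenactment framework that disentangles appearance and shape with separate encoders and recombines them through a shared decoder, achieving high-fidelity identity preservation and realistic shape transfer from a single target image. The method reports state-of-the-art identity preservation (98.2% on an in-the-wild dataset) and pose/action-unit consistency, and although it uses only one reference image per subject, it is reported to match or outperform multi-image baselines.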

ABSTRACT

To enable realistic shape (e.g. pose and expression) transfer, existing face reenactment methods rely on a set of target faces for learning subject-specific traits. However, in real-world scenario end-users often only have one target face at hand, rendering existing methods inapplicable. In this work, we bridge this gap by proposing a novel one-shot face reenactment learning framework. Our key insight is that the one-shot learner should be able to disentangle and compose appearance and shape information for effective modeling. Specifically, the target face appearance and the source face shape are first projected into latent spaces with their corresponding encoders. Then these two latent spaces are associated by learning a shared decoder that aggregates multi-level features to produce the final reenactment results. To further improve the synthesizing quality on mustache and hair regions, we additionally propose FusionNet which combines the strengths of our learned decoder and the traditional warping method. Extensive experiments show that our one-shot face reenactment system achieves superior transfer fidelity as well as identity preserving capability than alternatives. More remarkably, our approach trained with only one target image per subject achieves competitive results to those using a set of target images, demonstrating the practical merit of this work. Code, models and an additional set of reenacted faces have been publicly released at the project page.

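Motivation and Objectives

Address the challenge of realistic face reenactment when only a single reference image of the target identity is available.
Overcome the limitation of existing methods, which require multiple images or long video sequences for training.
Achieve identity-preserving face reenactment by disentangling appearance and shape representations in latent space.
Improve synthesis quality in difficult regions such as mustache and hair through a hybrid warping-decoder fusion mechanism.
Reach performance comparable to methods trained on a full set of target images while training with only one image per subject.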

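Proposed Method

Embed the target face's appearance and the source face's shape into separate latent spaces using dedicated encoders.
Train a shared decoder that aggregates multi-level features from the appearance and shape latent representations to produce the reenactment result.
Supervise the appearance encoder jointly through an auto-encoding (reconstruction) branch and a reenactment branch, so that identity and texture are preserved.
Introduce FusionNet, which combines the learned decoder's output with the output of traditional warping to improve realism in fine-grained regions such as hair and mustache.
Use a weighted loss that combines appearance reconstruction and reenactment supervision, with a hyperparameter λ controlling the emphasis on the reconstruction task.
Apply spatially-adaptive normalization at inference to align features across identities, enabling cross-identity reenactment.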
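
The weighted objective and the FusionNet combination step listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `total_loss` and `fuse`, the additive form (reenactment term plus λ times reconstruction term), and the per-pixel mask blend are all hypothetical choices made for this example, and images are modeled as plain nested lists of floats.

```python
from typing import List

# Hypothetical image type: H x W grayscale values in [0, 1].
Image = List[List[float]]


def total_loss(reenact_loss: float, recon_loss: float, lam: float) -> float:
    """Assumed objective: reenactment term plus lam times reconstruction term.

    A larger lam puts more emphasis on the reconstruction (auto-encoding) branch.
    """
    return reenact_loss + lam * recon_loss


def fuse(decoder_out: Image, warped_out: Image, mask: Image) -> Image:
    """Assumed fusion rule: a per-pixel convex blend of the two candidates.

    mask = 1 keeps the decoder pixel, mask = 0 keeps the warped pixel,
    and values in between mix the two.
    """
    return [
        [m * d + (1.0 - m) * w for d, w, m in zip(d_row, w_row, m_row)]
        for d_row, w_row, m_row in zip(decoder_out, warped_out, mask)
    ]
```

With a mask of this kind, a fusion step could favor the warped image in hair and mustache regions, where warping tends to keep fine texture, and the decoder output elsewhere. How the mask is actually predicted is not specified in this summary.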

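Experimental Results

Research Questions

RQ1: Can a face reenactment model achieve high-fidelity identity preservation when trained with only one image per subject?
RQ2: In the one-shot setting, how can appearance and shape representations be effectively disentangled and composed so that facial identity and expression transfer are maintained?
RQ3: How does combining deep generative methods with traditional warping affect quality in texture and hair regions?
RQ4: How does performance change as the number of images increases (e.g., one-shot vs. few-shot)?
RQ5: Can a model trained with only one image reach performance comparable to models trained with multiple reference images?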

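Key Findings

The proposed method achieves a 98.2% identity preservation rate on in-the-wild test data, outperforming state-of-the-art one-shot methods and matching GANimation while using only a single reference image.
On in-the-wild data, the model reaches 71.1% action-unit (AU) consistency and a pose consistency of 2.63, indicating high fidelity in expression and pose transfer.
Compared with the baseline without fusion, FusionNet raises identity preservation by 8.1% on average, although AU consistency drops slightly because of the stronger focus on texture quality.
Ablation experiments show that concatenating the appearance encoder's features with those of the spatially-adaptive decoder raises identity preservation by 11.4 percentage points (from 77.7% to 89.1%).
With 3 and 5 images, identity preservation rises to 99.3% and 99.4% respectively, showing that performance improves with more data while the one-shot result remains highly competitive.
On cross-source data, the model trained with only one image reaches 89.1% identity preservation, significantly outperforming other methods that rely on multi-image supervision.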
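
Because the findings mix gains stated in percentage points with gains stated simply as a percentage, it can help to keep the two notions apart. The sketch below uses the ablation figures quoted above (77.7% and 89.1%); the helper names are hypothetical and exist only for this illustration.

```python
def percentage_point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two rates that are already in percent."""
    return after_pct - before_pct


def relative_gain_pct(before_pct: float, after_pct: float) -> float:
    """Improvement expressed as a percentage of the starting rate."""
    return 100.0 * (after_pct - before_pct) / before_pct


# Ablation figures quoted in the findings: 77.7% -> 89.1% identity preservation.
points = round(percentage_point_gain(77.7, 89.1), 1)   # 11.4 percentage points
relative = round(relative_gain_pct(77.7, 89.1), 1)     # 14.7 percent relative
```

The same change is therefore 11.4 percentage points in absolute terms and roughly 14.7% in relative terms. The summary does not say which convention the 8.1% FusionNet gain follows.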

This review was generated by AI and reviewed by a human editor.