QUICK REVIEW

[Paper Review] Self-Learning Transformations for Improving Gaze and Head Redirection

Yufeng Zheng, Seonwook Park|arXiv (Cornell University)|Oct 23, 2020

Face recognition and analysis | References: 58 | Cited by: 24

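One-Sentence Summary

The paper proposes a self-transforming encoder-decoder architecture that uses self-predicted pseudo-conditions in the latent bottleneck to disentangle and control multiple facial factors, including gaze direction, head orientation, lighting, and hue. The method achieves state-of-the-art fidelity in gaze and head redirection, and improves semi-supervised cross-dataset gaze estimation by augmenting real data with redirected images.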

ABSTRACT

Many computer vision tasks rely on labeled data. Rapid progress in generative modeling has led to the ability to synthesize photorealistic images. However, controlling specific aspects of the generation process such that the data can be used for supervision of downstream tasks remains challenging. In this paper we propose a novel generative model for images of faces, that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles. This requires the disentangling of many appearance related factors including gaze and head orientation but also lighting, hue etc. We propose a novel architecture which learns to discover, disentangle and encode these extraneous variations in a self-learned manner. We further show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation. A novel evaluation scheme shows that our method improves upon the state-of-the-art in redirection accuracy and disentanglement between gaze direction and head orientation changes. Furthermore, we show that in the presence of limited amounts of real-world training data, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation. Please check our project page at: https://ait.ethz.ch/projects/2020/STED-gaze/

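Motivation and Objectives

Address the challenge of controlling fine-grained facial attributes, such as gaze and head orientation, in in-the-wild images without paired data.
Disentangle task-relevant factors (gaze, head pose) from task-irrelevant factors (lighting, hue, etc.) in a self-supervised manner.
Design a systematic evaluation scheme that measures redirection accuracy and disentanglement fidelity.
Improve semi-supervised cross-dataset gaze estimation by using the proposed redirection framework to augment limited real-world training data.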

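Proposed Method

Propose a Self-Transforming Encoder-Decoder (ST-ED) architecture with multiple transformable latent factors, each consisting of a latent embedding and a self-predicted pseudo-condition.
Use self-predicted pseudo-conditions to reduce reliance on noisy or imperfect labels during conditional image translation.
Apply novel constraints that enforce disentanglement between the individual factors while retaining precise control over the target gaze and head orientation.
Introduce a redirection error metric that quantifies how accurately the target gaze and head orientation are reproduced in the generated images.
Introduce a task disentanglement error metric that measures how much gaze or head orientation changes when unrelated factors are altered.
Train the gaze redirection model in a semi-supervised manner on limited real data, then use it to augment training data for the downstream gaze estimation task.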
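
The two evaluation metrics listed above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that gaze and head orientation are each given as (pitch, yaw) angles in radians, that the angles for generated images come from some external estimator not shown here, and that both errors are reported as mean angular differences in degrees. The function names and the pitch/yaw-to-vector convention are hypothetical choices made for this example.

```python
import math


def pitch_yaw_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a 3D unit direction vector.

    The axis convention is an assumption for this sketch; any convention
    works as long as both directions being compared use the same one.
    """
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(a, b):
    """Angle in degrees between two directions given as (pitch, yaw)."""
    va = pitch_yaw_to_vector(*a)
    vb = pitch_yaw_to_vector(*b)
    dot = sum(x * y for x, y in zip(va, vb))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def redirection_error(targets, estimates):
    """Mean angular error between the requested target angles and the
    angles an external estimator reads off the generated images."""
    errors = [angular_error_deg(t, e) for t, e in zip(targets, estimates)]
    return sum(errors) / len(errors)


def disentanglement_error(before, after):
    """Mean angular change of a factor that should have stayed fixed
    (e.g. gaze) after a different factor (e.g. head pose) was redirected."""
    return redirection_error(before, after)


if __name__ == "__main__":
    # Toy values only: two redirected samples, one of them off by 10 degrees in yaw.
    targets = [(0.0, 0.0), (0.0, 0.0)]
    estimates = [(0.0, 0.0), (0.0, math.radians(10.0))]
    print(redirection_error(targets, estimates))
```

Under these assumptions, a lower redirection error means the generated image matches the requested angles more closely, and a lower disentanglement error means that changing one factor leaves the other factor closer to unchanged.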

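Experimental Results

Research Questions

RQ1: Can a self-supervised generative model effectively disentangle and control multiple facial factors, including gaze, head orientation, lighting, and hue, without paired supervision?
RQ2: To what extent does explicitly disentangling task-irrelevant factors improve the accuracy and fidelity of gaze and head redirection?
RQ3: To what extent can redirected images generated by this method improve semi-supervised cross-dataset gaze estimation?
RQ4: How does the proposed evaluation scheme compare with existing metrics in measuring redirection fidelity and disentanglement?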

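Key Findings

The proposed method outperforms He et al. [17] and StarGAN [21] in both qualitative and quantitative evaluations on the GazeCapture dataset, achieving state-of-the-art redirection accuracy.
Compared with the next-best baseline, the method reduces redirection error by 25%, demonstrating its advantage in controlling the target gaze and head orientation.
The task disentanglement error metric shows that gaze and head orientation remain stable when lighting and hue change, confirming that the disentanglement is effective.
In semi-supervised cross-dataset gaze estimation, augmenting real training data with redirected images reduces mean absolute error (MAE) by up to 15% across four benchmark datasets.
The model also generalizes well to challenging cases such as large head poses, eyeglasses, and blurry inputs, producing realistic image outputs.
Ablation studies confirm that disentangling irrelevant factors yields more accurate and robust redirection, validating the core design principle.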


This review was generated by AI and reviewed by a human editor.