QUICK REVIEW

[Paper Digest] ExprGAN: Facial Expression Editing with Controllable Expression Intensity

Hui Ding, Kumar Sricharan | arXiv (Cornell University) | Sep 12, 2017

Face recognition and analysis | Cited by 82

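ONE-SENTENCE SUMMARY

ExprGAN edits a face into a target expression with continuously controllable intensity, produces photo-realistic results, and learns a disentangled identity/expression representation. It also supports expression transfer and data augmentation for expression recognition.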

ABSTRACT

Facial expression editing is a challenging task as it needs a high-level semantic understanding of the input face image. In conventional methods, either paired training data is required or the synthetic face resolution is low. Moreover, only the categories of facial expression can be changed. To address these limitations, we propose an Expression Generative Adversarial Network (ExprGAN) for photo-realistic facial expression editing with controllable expression intensity. An expression controller module is specially designed to learn an expressive and compact expression code in addition to the encoder-decoder network. This novel architecture enables the expression intensity to be continuously adjusted from low to high. We further show that our ExprGAN can be applied for other tasks, such as expression transfer, image retrieval, and data augmentation for training improved face expression recognition models. To tackle the small size of the training database, an effective incremental learning scheme is proposed. Quantitative and qualitative evaluations on the widely used Oulu-CASIA dataset demonstrate the effectiveness of ExprGAN.

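Research Motivation and Goals

Advance facial expression editing beyond category-only changes and without requiring paired training data.
Develop an encoder-decoder GAN with an expression controller that produces a continuous, controllable expression code.
Disentangle the identity and expression representations to support applications such as expression transfer and retrieval.
Improve realism with two discriminators and a perceptual loss, and cope with the small dataset through incremental training.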

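Proposed Method

Use an encoder to map the input face to an identity-preserving latent code g(x).
Introduce an expression controller module F_ctrl that converts a one-hot expression label y into a continuous expression code c.
Use a regularizer Q to increase the mutual information between the generated image and the expression code, encouraging each dimension of c to capture a different intensity factor.
Generate the image with the decoder G_dec conditioned on g(x) and c; enforce realism with the image discriminator D_img and preserve identity with a feature loss L_id computed from a pretrained face model.
Impose a prior on g(x) in latent space with a second discriminator D_z, so that the identity representation covers the latent manifold.
Train with a composite objective L_ExprGAN that combines pixel, identity, Q, adversarial, and total variation losses, following a three-stage incremental learning schedule.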
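The composite objective in the last step can be illustrated with a minimal NumPy sketch. It only shows how a pixel term, a total variation term, and other precomputed terms might be combined as a weighted sum. The function names, the use of an L1 pixel loss, the anisotropic form of total variation, and all weight values are assumptions made for illustration; the digest does not state the paper's exact formulas or weights.

```python
import numpy as np


def pixel_loss(x: np.ndarray, x_rec: np.ndarray) -> float:
    """Mean absolute difference between input and reconstruction (assumed L1)."""
    return float(np.abs(x - x_rec).mean())


def total_variation(img: np.ndarray) -> float:
    """Anisotropic total variation of an H x W x C image.

    Sums absolute differences between vertically and horizontally
    adjacent pixels; smoother images give smaller values.
    """
    dh = np.abs(img[1:, :, :] - img[:-1, :, :]).sum()
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :]).sum()
    return float(dh + dw)


def composite_loss(terms: dict, weights: dict) -> float:
    """Weighted sum of named loss terms; every term needs a weight."""
    return float(sum(weights[name] * value for name, value in terms.items()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((8, 8, 3))
    x_rec = np.clip(x + 0.05 * rng.standard_normal(x.shape), 0.0, 1.0)

    terms = {
        "pixel": pixel_loss(x, x_rec),
        "tv": total_variation(x_rec),
        # Identity, Q, and adversarial terms would come from trained
        # networks; placeholder constants stand in for them here.
        "identity": 0.2,
        "q": 0.1,
        "adversarial": 0.7,
    }
    # Hypothetical weights, not taken from the paper.
    weights = {"pixel": 1.0, "tv": 1e-3, "identity": 1.0, "q": 1.0, "adversarial": 0.01}
    print(composite_loss(terms, weights))
```

Keeping the terms in a dictionary makes it easy to switch individual losses on or off per training stage, which is one plausible way to organize a staged schedule.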

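Experimental Results

Research Questions

RQ1: Can expression intensity be controlled continuously in facial expression editing without explicit intensity labels?
RQ2: Can the model disentangle identity from expression, so that expression editing or transfer preserves identity?
RQ3: How well does ExprGAN perform at high-quality image synthesis and at data augmentation for expression recognition?
RQ4: Can the method generate diverse expression styles within each expression category?
RQ5: Is incremental training necessary for effective learning on a small dataset?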

Key Findings

Number of synthetic images	Accuracy (%)
0	77.78
3K	78.47
6K	81.94
30K	84.72
60K	84.72
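As a quick check on the table above, the following sketch computes the accuracy gain over the no-augmentation baseline for each row. The numbers are copied from the table; the variable names are illustrative only.

```python
# Recognition accuracy (%) keyed by number of synthetic images added.
results = {0: 77.78, 3_000: 78.47, 6_000: 81.94, 30_000: 84.72, 60_000: 84.72}

baseline = results[0]
# Gain in percentage points relative to training without synthetic images.
gains = {n: round(acc - baseline, 2) for n, acc in results.items()}

for n, gain in gains.items():
    print(f"{n:>6} synthetic images: +{gain:.2f} points")
```

The gain grows to 6.94 percentage points at 30K synthetic images and does not increase further at 60K.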

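ExprGAN can edit a face into multiple expressions with continuously adjustable intensity, including expressions for which the training data contains no neutral examples.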
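The model preserves identity when applying a new expression, with realistic texture and detail.
Expression transfer across identities is feasible, producing the source identity with the target expression.
The generated images can be used for data augmentation and improve expression recognition accuracy (for example, adding 30K synthetic images raises accuracy from 77.78% to 84.72%).
The identity representations g(x) are well separated in latent space, and the expression code c makes it possible to retrieve similar expressions in feature space.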
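The retrieval finding can be sketched as a nearest-neighbor search over expression codes. The example below ranks a small gallery of codes by cosine similarity to a query code. The choice of cosine similarity, the function name retrieve_similar, and the toy two-dimensional codes are all assumptions for illustration; the digest does not say which distance measure the paper uses.

```python
import numpy as np


def retrieve_similar(query_code: np.ndarray, gallery_codes: np.ndarray, top_k: int = 3):
    """Return indices and cosine similarities of the top_k gallery codes.

    query_code has shape (d,), gallery_codes has shape (n, d).
    """
    q = query_code / np.linalg.norm(query_code)
    g = gallery_codes / np.linalg.norm(gallery_codes, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery row
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order]


if __name__ == "__main__":
    # Toy expression codes; real codes would come from the trained model.
    gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
    query = np.array([1.0, 0.0])
    indices, scores = retrieve_similar(query, gallery, top_k=2)
    print(indices, scores)
```

Normalizing both sides first means the ranking depends only on the direction of each code, not its magnitude, which may or may not be desirable if magnitude encodes intensity.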


This digest was generated by AI and reviewed by a human editor.