[Paper Explained] A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions.
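This paper proposes a multi-task learning and generation framework that uses a large-scale in-the-wild dataset to jointly predict valence-arousal (VA), facial action units (AUs), and primary facial expressions. The study adds new annotations to parts of the Aff-Wild dataset and combines a deep neural network with shared layers with a GAN-based generator and discriminator. With jointly optimized task-specific loss functions, it reports good performance on the newly annotated parts of Aff-Wild.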
Over the past few years many research efforts have been devoted to the field of affect analysis. Various approaches have been proposed for: i) discrete emotion recognition in terms of the primary facial expressions; ii) emotion analysis in terms of facial Action Units (AUs), assuming a fixed expression intensity; iii) dimensional emotion analysis, in terms of valence and arousal (VA). These approaches can only be effective, if they are developed using large, appropriately annotated databases, showing behaviors of people in-the-wild, i.e., in uncontrolled environments. Aff-Wild has been the first, large-scale, in-the-wild database (including around 1,200,000 frames of 300 videos), annotated in terms of VA. In the vast majority of existing emotion databases, their annotation is limited to either primary expressions, or valence-arousal, or action units. In this paper, we first annotate a part (around $234,000$ frames) of the Aff-Wild database in terms of $8$ AUs and another part (around $288,000$ frames) in terms of the $7$ basic emotion categories, so that parts of this database are annotated in terms of VA, as well as AUs, or primary expressions. Then, we set up and tackle multi-task learning for emotion recognition, as well as for facial image generation. Multi-task learning is performed using: i) a deep neural network with shared hidden layers, which learns emotional attributes by exploiting their inter-dependencies; ii) a discriminator of a generative adversarial network (GAN). On the other hand, image generation is implemented through the generator of the GAN. For these two tasks, we carefully design loss functions that fit the examined set-up. Experiments are presented which illustrate the good performance of the proposed approach when applied to the new annotated parts of the Aff-Wild database.
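Research motivation and goals
- Address the lack of large-scale in-the-wild datasets with multiple annotation types in affect analysis.
- Jointly learn valence-arousal, facial action units, and primary facial expressions through a shared representation.
- Develop a generative adversarial network (GAN) framework to increase data diversity and improve model generalization.
- Improve affect recognition performance by exploiting the inter-dependencies among the different emotion representation tasks.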
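Proposed method
- Annotated around 234,000 frames of the Aff-Wild dataset with 8 action units, and around 288,000 frames with the 7 basic facial expressions.
- Designed a deep neural network with shared hidden layers to jointly learn VA, AUs, and primary expressions.
- Integrated a GAN framework in which the generator produces realistic facial images and the discriminator performs multi-task classification and regression.
- Built a composite loss function that combines a loss based on the Concordance Correlation Coefficient (CCC) for VA, cross-entropy for AU and expression classification, and mean squared error (MSE) for regression.
- Used the GAN discriminator as a multi-task classifier and regressor, which brings the benefits of semi-supervised learning.
- Balanced the contribution of each task by tuning hyperparameters such as the learning rate and the loss weighting coefficients (α, β).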
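The composite loss described in the list above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`ccc`, `ccc_loss`, `cross_entropy`, `multitask_loss`), the `eps` stabilizer, and the way α and β are applied (α scaling the VA term, β the expression term) are assumptions made here, since the summary does not spell out the exact weighting. The AU and MSE terms are left out for brevity.

```python
import numpy as np


def ccc(pred, target, eps=1e-8):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mean_p, mean_t = pred.mean(), target.mean()
    covariance = np.mean((pred - mean_p) * (target - mean_t))
    # eps (an assumption) avoids division by zero for constant inputs.
    denom = pred.var() + target.var() + (mean_p - mean_t) ** 2 + eps
    return 2.0 * covariance / denom


def ccc_loss(pred, target):
    """1 - CCC, so perfect agreement gives a loss close to 0."""
    return 1.0 - ccc(pred, target)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; `labels` holds integer class indices."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def multitask_loss(va_pred, va_target, expr_logits, expr_labels,
                   alpha=0.5, beta=0.5):
    """Weighted sum of a VA term and an expression term.

    Assumed weighting: alpha scales the VA (CCC) term and beta the
    expression (cross-entropy) term. Column 0 is valence, column 1 arousal.
    """
    va_pred = np.asarray(va_pred, dtype=float)
    va_target = np.asarray(va_target, dtype=float)
    va_term = 0.5 * (ccc_loss(va_pred[:, 0], va_target[:, 0]) +
                     ccc_loss(va_pred[:, 1], va_target[:, 1]))
    expr_term = cross_entropy(expr_logits, expr_labels)
    return alpha * va_term + beta * expr_term
```

Using 1 - CCC as the loss ties the training objective directly to the evaluation metric for VA, which is one plausible reason a CCC-based loss would be preferred over plain MSE for this task.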
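Experimental results
Research questions
- RQ1: Does joint multi-task learning improve the prediction of valence-arousal, action units, and primary expressions compared with single-task learning?
- RQ2: How does the GAN-based generator improve data quality and model generalization in affect recognition?
- RQ3: With a shared representation, which combination of loss functions works best for multi-task affect recognition?
- RQ4: How useful is the GAN discriminator as a multi-task predictor of VA, AUs, and primary expressions?
- RQ5: How do different loss-function combinations and hyperparameters affect the final performance on all tasks?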
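Key findings
- The best model reaches a Concordance Correlation Coefficient (CCC) of 0.616 for valence and 0.510 for arousal, with a weighted F1 score of 0.643 and a total accuracy of 0.645.
- With α=β=0.5, the multi-task model outperforms the single-task baselines (VA only: CCC=0.579; expressions only: F1=0.488), demonstrating the benefit of joint learning.
- Using a CCC-based loss for VA and cross-entropy for expression classification, with a learning rate of $10^{-3}$, gives the highest performance on all metrics.
- When jointly regressing VA and classifying AUs, the GAN discriminator reaches a total accuracy of 0.667, better than the single-task configurations.
- The generator learned in-the-wild characteristics (such as pose variation, lighting conditions, and occlusion) and produced realistic images that enrich the training data.
- Compared with single-task learning, the model with α=β=0.5 improves the F1 score for expression prediction by 6.7% and the total accuracy by 10.5%.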
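For readers unfamiliar with the expression metrics quoted above, the following is a small sketch of how a weighted F1 score and a total accuracy are commonly computed. It assumes "weighted F1" means the per-class F1 averaged with weights proportional to class support, which is the usual convention. The summary does not confirm that the paper uses exactly this definition, and the helper names `weighted_f1` and `accuracy` are hypothetical.

```python
import numpy as np


def weighted_f1(y_true, y_pred, num_classes):
    """Per-class F1 averaged with weights proportional to class support."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        support = tp + fn  # number of true samples of class c
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        total += support * f1
    return float(total / len(y_true))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Toy example with 3 classes (made-up labels, not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(weighted_f1(y_true, y_pred, num_classes=3))
print(accuracy(y_true, y_pred))
```

Weighting by support matters on in-the-wild data, where some expression classes are typically much rarer than others.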
This explanation was generated by AI and reviewed by a human editor.