QUICK REVIEW

[Paper Review] A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions.

Dimitrios Kollias, Stefanos Zafeiriou|arXiv (Cornell University)|Nov 11, 2018

Emotion and Mood Recognition|References 39|Cited by 46

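Summary in brief

This paper proposes a multi-task learning and generation framework for jointly predicting valence-arousal (VA), facial Action Units (AUs), and primary facial expressions, using a large-scale dataset collected in the wild. The authors add new annotations to part of the Aff-Wild database and adopt a shared deep neural network together with a GAN-based generator and discriminator; joint optimization with task-specific loss functions achieved state-of-the-art performance.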

ABSTRACT

Over the past few years many research efforts have been devoted to the field of affect analysis. Various approaches have been proposed for: i) discrete emotion recognition in terms of the primary facial expressions; ii) emotion analysis in terms of facial Action Units (AUs), assuming a fixed expression intensity; iii) dimensional emotion analysis, in terms of valence and arousal (VA). These approaches can only be effective, if they are developed using large, appropriately annotated databases, showing behaviors of people in-the-wild, i.e., in uncontrolled environments. Aff-Wild has been the first, large-scale, in-the-wild database (including around 1,200,000 frames of 300 videos), annotated in terms of VA. In the vast majority of existing emotion databases, their annotation is limited to either primary expressions, or valence-arousal, or action units. In this paper, we first annotate a part (around 234,000 frames) of the Aff-Wild database in terms of 8 AUs and another part (around 288,000 frames) in terms of the 7 basic emotion categories, so that parts of this database are annotated in terms of VA, as well as AUs, or primary expressions. Then, we set up and tackle multi-task learning for emotion recognition, as well as for facial image generation. Multi-task learning is performed using: i) a deep neural network with shared hidden layers, which learns emotional attributes by exploiting their inter-dependencies; ii) a discriminator of a generative adversarial network (GAN). On the other hand, image generation is implemented through the generator of the GAN. For these two tasks, we carefully design loss functions that fit the examined set-up. Experiments are presented which illustrate the good performance of the proposed approach when applied to the new annotated parts of the Aff-Wild database.

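Motivation and objectives

Address the shortage of large-scale, multi-annotated in-the-wild datasets in affect analysis.
Enable joint learning of valence-arousal, facial Action Units, and primary facial expressions through a shared representation.
Develop a generative adversarial network (GAN) framework to improve data diversity and model generalization.
Improve emotion recognition performance by exploiting the correlations among multiple emotion representation tasks.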

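Proposed method

Annotated about 234,000 frames of Aff-Wild in terms of 8 Action Units and about 288,000 frames in terms of the 7 basic emotion categories.
Designed a deep neural network with shared hidden layers that learns VA, AUs, and primary facial expressions simultaneously.
Integrated a GAN framework in which the generator produces realistic facial images and the discriminator performs multi-task classification and regression.
Formulated a composite loss function combining a loss based on the concordance correlation coefficient (CCC) for VA, cross-entropy for AU and expression classification, and MSE for regression.
Used the GAN discriminator as a multi-task classifier and regressor, exploiting the benefits of semi-supervised learning.
Tuned hyperparameters, including the learning rate and the loss weighting coefficients (α, β), to balance the contributions of the tasks.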
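
The composite loss described above can be illustrated with a small NumPy sketch. This is only a sketch under stated assumptions, not the authors' implementation: the review does not give the exact formula, so the form `alpha * VA term + beta * expression term`, the `1 - CCC` loss, the averaging over valence and arousal, and all function names (`ccc`, `ccc_loss`, `cross_entropy`, `multitask_loss`) are hypothetical choices made for illustration.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D arrays.

    Undefined (division by zero) if both inputs are constant and equal.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)

def ccc_loss(y_true, y_pred):
    """Assumed loss form: 1 - CCC, so perfect agreement gives 0."""
    return 1.0 - ccc(y_true, y_pred)

def cross_entropy(labels, logits):
    """Mean softmax cross-entropy for integer class labels."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multitask_loss(va_true, va_pred, expr_labels, expr_logits,
                   alpha=0.5, beta=0.5):
    """Hypothetical weighted sum of a VA term and an expression term.

    va_true, va_pred: arrays of shape (N, 2), columns = (valence, arousal).
    expr_labels: integer labels of shape (N,); expr_logits: shape (N, C).
    """
    va_true = np.asarray(va_true, dtype=float)
    va_pred = np.asarray(va_pred, dtype=float)
    # Assumption: valence and arousal contribute equally to the VA term.
    va_term = 0.5 * (ccc_loss(va_true[:, 0], va_pred[:, 0])
                     + ccc_loss(va_true[:, 1], va_pred[:, 1]))
    expr_term = cross_entropy(expr_labels, expr_logits)
    return alpha * va_term + beta * expr_term
```

With `alpha = beta = 0.5` both tasks contribute equally; raising one weight shifts the optimization toward that task, which is the balancing role the review attributes to α and β.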

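Experimental results

Research questions

RQ1: Does multi-task learning with a shared representation improve valence-arousal, Action Unit, and primary expression prediction compared with single-task learning?
RQ2: How does introducing a GAN-based generator improve data quality and model generalization in emotion recognition?
RQ3: In the context of a shared representation, what is the best combination of loss functions for multi-task emotion recognition?
RQ4: Can the GAN discriminator be used effectively as a multi-task classifier for VA, AUs, and primary facial expressions?
RQ5: How do different loss-function combinations and hyperparameters affect final performance across all tasks?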

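Key findings

The best-performing model achieved a CCC of 0.616 for valence, a CCC of 0.510 for arousal, a weighted F1 score of 0.643, and a total accuracy of 0.645.
The multi-task model with α=β=0.5 outperformed the single-task baselines (VA only: CCC=0.579; expressions only: F1=0.488), demonstrating the benefit of joint learning.
The best performance on all metrics was obtained with a CCC-based loss for VA, cross-entropy for expression classification, and a learning rate of 10^-3.
The GAN discriminator achieved a total accuracy of 0.667 when performing VA regression and AU classification simultaneously, outperforming the single-task setting.
The generator effectively learned in-the-wild characteristics such as pose variation, illumination, and occlusion, producing realistic images that enrich the training data.
For expression classification, the α=β=0.5 model improved the F1 score by 6.7% and the total accuracy by 10.5%.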
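
For readers who want to reproduce the classification metrics quoted above, the following is a minimal sketch of a weighted F1 score and total accuracy. It assumes that "weighted F1" means the support-weighted average of per-class F1 scores (the convention used by scikit-learn's `f1_score` with `average="weighted"`); the review does not define the metric, and the function names here are hypothetical.

```python
import numpy as np

def weighted_f1(y_true, y_pred, num_classes):
    """Support-weighted mean of per-class F1 (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples gets F1 = 0.
        f1 = 2 * tp / denom if denom > 0 else 0.0
        # Weight each class by its number of true samples (support).
        total += (tp + fn) * f1
    return total / len(y_true)

def total_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Because each class is weighted by its support, frequent expression categories dominate the weighted F1, so it can sit close to total accuracy on imbalanced data.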

This review was generated by AI and checked by a human editor.