QUICK REVIEW

[Paper Review] End-to-end Multimodal Emotion and Gender Recognition with Dynamic Weights of Joint Loss

Myungsu Chae, Taeho Kim|arXiv (Cornell University)|Sep 4, 2018

Emotion and Mood Recognition|Cited by 3

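One-sentence summary

This paper proposes a dynamic weighting strategy for the joint loss in end-to-end multimodal emotion and gender recognition from audio and video data. By adaptively balancing the task-specific losses during training, it improves overall model performance. Compared with static weighting, the method achieves a lower joint loss and better generalization.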

ABSTRACT

Multi-task learning is a method for improving the generalizability of multiple tasks. In order to perform multiple classification tasks with one neural network model, the losses of each task should be combined. Previous studies have mostly focused on multiple prediction tasks using joint loss with static weights for training models, choosing the weights between tasks without making sufficient considerations by setting them uniformly or empirically. In this study, we propose a method to calculate joint loss using dynamic weights to improve the total performance, instead of the individual performance, of tasks. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task when the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, which is computed as the negative log-likelihood, than using static weights for joint loss. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights for joint loss for maximizing overall performance in emotion and gender recognition tasks.

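Motivation and objectives

Address the limitations of static loss weighting in multi-task learning for emotion and gender recognition.
Improve overall model performance by adjusting the loss weights dynamically during training.
Improve generalization on multimodal (audio and video) emotion and gender recognition tasks.
Demonstrate the effectiveness of dynamic loss weighting in a joint-learning setting.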

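Proposed method

The method uses a single neural network architecture that predicts emotion and gender simultaneously from audio and video inputs.
A dynamic loss-weighting mechanism adjusts each task's contribution to the loss as training progresses.
The joint loss is computed as a weighted sum of the individual task losses, and the weights are updated dynamically to balance optimization across tasks.
The dynamic weights are derived so as to minimize the total joint loss, defined as the negative log-likelihood of the joint prediction.
The model is trained end to end on multimodal data, so feature learning and loss optimization happen within one unified framework.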
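
The weighted-sum joint loss described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the review does not give the paper's weight-update rule, so the `dynamic_weights` function below uses a hypothetical rule (weights proportional to each task's current loss, normalized to sum to 1), and the class probabilities and labels are made-up toy values.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted class probabilities."""
    return -math.log(probs[label])


def dynamic_weights(task_losses):
    """Hypothetical rule (an assumption, not the paper's): weight each task
    in proportion to its current loss, so the weights sum to 1."""
    total = sum(task_losses)
    if total == 0:
        # All tasks are already perfect: fall back to uniform weights.
        return [1.0 / len(task_losses)] * len(task_losses)
    return [loss / total for loss in task_losses]


def joint_loss(task_losses, weights):
    """Joint loss as a weighted sum of the per-task losses."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Toy predictions for a single sample (made-up numbers).
emotion_probs = [0.1, 0.6, 0.3]  # three hypothetical emotion classes
gender_probs = [0.9, 0.1]        # two gender classes
emotion_label, gender_label = 1, 0

losses = [
    cross_entropy(emotion_probs, emotion_label),
    cross_entropy(gender_probs, gender_label),
]

static = joint_loss(losses, [0.5, 0.5])   # static, uniform weights
weights = dynamic_weights(losses)         # recomputed from the current losses
dynamic = joint_loss(losses, weights)

print("task losses:", losses)
print("dynamic weights:", weights)
print("static joint loss:", static)
print("dynamic joint loss:", dynamic)
```

In an actual training loop the weights would presumably be recomputed every step or epoch and treated as constants when backpropagating, so that only the task losses carry gradients. The numbers printed here only show the mechanics and say nothing about the results reported in the paper.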

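Experimental results

Research questions

RQ1: Can dynamic loss weighting improve the overall performance of a multimodal emotion and gender recognition model?
RQ2: How does dynamic loss weighting differ from static weighting in terms of joint loss and generalization?
RQ3: Does the proposed method improve model robustness and performance on both the emotion classification and gender classification tasks?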

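Key findings

The proposed method achieved a lower joint loss than models trained with static loss weighting.
The proposed method generalized better than the baseline models trained with static weights.
Dynamic loss weighting balanced optimization across tasks effectively, improving overall performance rather than the performance of any individual task.
To the authors' knowledge, this is the first study to apply dynamic loss weighting to joint optimization in emotion and gender recognition.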

This review was generated by AI and checked by a human editor.