QUICK REVIEW

[Paper Review] Decoupled Multimodal Distilling for Emotion Recognition

Yong Li, Yuanzhi Wang|arXiv (Cornell University)|Mar 24, 2023

Emotion and Mood Recognition|Cited by 9

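ONE-LINE SUMMARY

DMD decouples multimodal features into a modality-irrelevant space and a modality-exclusive space, then uses two graph distillation units to adaptively transfer knowledge among the language, visual, and acoustic modalities, achieving state-of-the-art emotion recognition results on MOSI and MOSEI.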

ABSTRACT

Human multimodal emotion recognition (MER) aims to perceive human emotions via language, visual and acoustic modalities. Despite the impressive performance of previous MER approaches, the inherent multimodal heterogeneities still haunt and the contribution of different modalities varies significantly. In this work, we mitigate this issue by proposing a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation, aiming to enhance the discriminative features of each modality. Specially, the representation of each modality is decoupled into two parts, i.e., modality-irrelevant/-exclusive spaces, in a self-regression manner. DMD utilizes a graph distillation unit (GD-Unit) for each decoupled part so that each GD can be performed in a more specialized and effective manner. A GD-Unit consists of a dynamic graph where each vertice represents a modality and each edge indicates a dynamic knowledge distillation. Such GD paradigm provides a flexible knowledge transfer manner where the distillation weights can be automatically learned, thus enabling diverse crossmodal knowledge transfer patterns. Experimental results show DMD consistently obtains superior performance than state-of-the-art MER methods. Visualization results show the graph edges in DMD exhibit meaningful distributional patterns w.r.t. the modality-irrelevant/-exclusive feature spaces. Codes are released at https://github.com/mdswyz/DMD.

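MOTIVATION AND OBJECTIVES

Motivates multimodal emotion recognition (MER) that remains robust under strong modality heterogeneity.
Proposes a decoupled feature framework that splits each modality into a common (modality-irrelevant) component and a private (modality-exclusive) component.
Develops two graph distillation units (HomoGD and HeteroGD) that enable adaptive cross-modal knowledge transfer.
Leverages self-regression-based decoupling with margin and orthogonality constraints to strengthen the separation.
Demonstrates strong MER performance on public datasets and visualizes the learned cross-modal interactions.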

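PROPOSED METHOD

Decouples each modality into a common space (fully or largely modality-independent) and a private space (modality-exclusive) using a shared encoder and modality-specific encoders.
Uses self-regression to reconstruct the combined features, enforcing the decoupling with a reconstruction loss and a cycle loss.
Applies a margin loss that encourages a meaningful separation of homogeneous features across emotions and modalities.
Introduces graph distillation units (GD-Units) that perform adaptive cross-modal knowledge distillation within each decoupled space.
HomoGD distills knowledge among homogeneous features through a learned distillation graph with dynamic edge weights.
HeteroGD bridges the distribution gap with a multimodal transformer that aligns heterogeneous features before distillation.
Fuses the refined homogeneous features and the enhanced heterogeneous features for the final MER prediction.
The objective combines the task, decoupling, and distillation losses: L_total = L_task + λ1 L_dec + λ2 L_dtl.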
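The decoupling constraints and the combined objective can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The element-wise sum used as a decoder, the absolute cosine similarity used as the orthogonality penalty, and all function names are assumptions made for illustration. The cycle and margin losses are omitted, and only the final weighted sum follows the formula stated above.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def reconstruction_loss(original, rebuilt):
    """Mean squared error between a feature and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, rebuilt)) / len(original)

def orthogonality_penalty(common, private):
    """Assumed soft constraint: penalize overlap between the two parts."""
    return abs(cosine(common, private))

def decoupling_loss(original, common, private):
    """Toy L_dec: reconstruction plus orthogonality (cycle/margin terms omitted)."""
    # Hypothetical decoder: rebuild the feature as an element-wise sum.
    rebuilt = [c + p for c, p in zip(common, private)]
    return reconstruction_loss(original, rebuilt) + orthogonality_penalty(common, private)

def total_loss(l_task, l_dec, l_dtl, lambda1, lambda2):
    """L_total = L_task + lambda1 * L_dec + lambda2 * L_dtl."""
    return l_task + lambda1 * l_dec + lambda2 * l_dtl

# Toy feature split into a common part and a private part that do not overlap.
x = [1.0, 2.0, 3.0]
x_com = [1.0, 0.0, 0.0]
x_prt = [0.0, 2.0, 3.0]
l_dec = decoupling_loss(x, x_com, x_prt)
print(l_dec, total_loss(1.0, l_dec, 0.25, 0.5, 2.0))
```

In this toy case the two parts are orthogonal and sum back to the input, so the decoupling term vanishes and only the task and distillation terms contribute to the total.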

Figure 1: (a) illustrates the significant emotion recognition discrepancies using unimodality, adapted from MulT [28]. (b) shows the conventional cross-modal distillation. (c) shows our proposed decoupled multimodal distillation (DMD) method. DMD consists of two graph distillation (GD) units: ho

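EXPERIMENTAL RESULTS

Research Questions

RQ1: Does decoupling modality representations into modality-irrelevant and modality-exclusive spaces improve MER performance under cross-modal heterogeneity?
RQ2: Do dynamically learned distillation graphs (HomoGD/HeteroGD) improve cross-modal knowledge transfer over fixed or naive fusion strategies?
RQ3: How do the decoupled features evolve across modalities and emotions, and what patterns do the learned graph edges exhibit?
RQ4: How do graph distillation and cross-modal attention affect MER performance with unimodal and multimodal fusion?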

Key Findings

Model	Setting	ACC7 (%)	ACC2 (%)	F1 (%)
EF-LSTM	Aligned	33.7	75.3	75.2
LF-LSTM	Aligned	35.3	76.8	76.7
TFN	Aligned	32.1	73.9	73.4
LMF	Aligned	32.8	76.4	75.7
MFM	Aligned	36.2	78.1	78.1
RAVEN	Aligned	33.2	78.0	76.6
MCTN	Aligned	35.6	79.3	79.1
MulT	Aligned	40.0	83.0	82.8
PMR	Aligned	40.6	83.6	83.4
MISA	Aligned	42.3	83.4	83.6
FDMER	Aligned	44.1	84.6	84.7
DMD (Ours)	Aligned	45.6	86.0	86.0
EF-LSTM	Unaligned	31.0	73.6	74.5
LF-LSTM	Unaligned	33.7	77.6	77.8
RAVEN	Unaligned	31.7	72.7	73.1
MCTN	Unaligned	32.7	75.9	76.4
MulT	Unaligned	39.1	81.1	81.0
PMR	Unaligned	40.6	82.4	82.1
MICA	Unaligned	40.8	82.6	82.7
DMD (Ours)	Unaligned	41.9	83.5	83.5
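
To make the size of the gains concrete, the short Python sketch below computes, for each metric, DMD's margin over the strongest baseline in each setting. The numbers are copied from the table above; the function name and data layout are illustrative only.

```python
# (ACC7, ACC2, F1) in percent, copied from the table above.
aligned = {
    "EF-LSTM": (33.7, 75.3, 75.2), "LF-LSTM": (35.3, 76.8, 76.7),
    "TFN": (32.1, 73.9, 73.4), "LMF": (32.8, 76.4, 75.7),
    "MFM": (36.2, 78.1, 78.1), "RAVEN": (33.2, 78.0, 76.6),
    "MCTN": (35.6, 79.3, 79.1), "MulT": (40.0, 83.0, 82.8),
    "PMR": (40.6, 83.6, 83.4), "MISA": (42.3, 83.4, 83.6),
    "FDMER": (44.1, 84.6, 84.7), "DMD (Ours)": (45.6, 86.0, 86.0),
}
unaligned = {
    "EF-LSTM": (31.0, 73.6, 74.5), "LF-LSTM": (33.7, 77.6, 77.8),
    "RAVEN": (31.7, 72.7, 73.1), "MCTN": (32.7, 75.9, 76.4),
    "MulT": (39.1, 81.1, 81.0), "PMR": (40.6, 82.4, 82.1),
    "MICA": (40.8, 82.6, 82.7), "DMD (Ours)": (41.9, 83.5, 83.5),
}

def margin_over_best_baseline(rows, ours="DMD (Ours)"):
    """Per-metric gap (percentage points) between `ours` and the best other row."""
    baselines = [scores for name, scores in rows.items() if name != ours]
    margins = []
    for i, our_score in enumerate(rows[ours]):
        best = max(scores[i] for scores in baselines)
        margins.append(round(our_score - best, 1))
    return tuple(margins)

print("aligned:", margin_over_best_baseline(aligned))      # (1.5, 1.4, 1.3)
print("unaligned:", margin_over_best_baseline(unaligned))  # (1.1, 0.9, 0.8)
```

In the aligned setting the strongest baseline on every metric is FDMER, and in the unaligned setting it is MICA, so the margins range from about 0.8 to 1.5 percentage points.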

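DMD achieves higher or competitive accuracy relative to state-of-the-art methods on CMU-MOSI and CMU-MOSEI in both the aligned and unaligned settings.
Combining decoupling into homogeneous and heterogeneous spaces with graph distillation consistently improves MER performance over the baselines.
HomoGD relies mainly on language as the strongest contributor: the L→A and L→V edges dominate in many cases, reflecting the relative contributions of the modalities.
HeteroGD bridges the distribution gap through cross-modal attention and reveals meaningful inter-modality interactions (e.g., gains along V→A).
In the ablation study, combining feature decoupling (FD) with both GD units gives the best results; removing cross-modal attention (CA) or either GD unit degrades performance.
Visualizations show that the decoupled spaces improve class separability, with the homogeneous space clustering by emotion and the heterogeneous space clustering by modality.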
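
The edge patterns described above can be pictured with a toy graph distillation unit. The sketch below is a simplified illustration rather than the released DMD code. It assumes each directed edge carries a raw score, that scores are softmax-normalized over the incoming edges of each target modality, and that the distillation loss is the weighted L1 distance between source and target logits. The raw scores and logits are made-up values.

```python
import math

MODALITIES = ("L", "V", "A")  # language, visual, acoustic

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_edges(raw_scores):
    """Normalize raw edge scores over the incoming edges of each target."""
    weights = {}
    for dst in MODALITIES:
        sources = [m for m in MODALITIES if m != dst]
        probs = softmax([raw_scores[(src, dst)] for src in sources])
        for src, p in zip(sources, probs):
            weights[(src, dst)] = p
    return weights

def distillation_loss(weights, logits):
    """Sum over edges of weight * L1 distance between source and target logits."""
    loss = 0.0
    for (src, dst), w in weights.items():
        l1 = sum(abs(a - b) for a, b in zip(logits[src], logits[dst]))
        loss += w * l1
    return loss

# Made-up raw scores in which language is the dominant teacher.
raw = {("L", "V"): 2.0, ("A", "V"): 0.0,
       ("L", "A"): 2.0, ("V", "A"): 0.0,
       ("V", "L"): 0.0, ("A", "L"): 0.0}
edges = normalize_edges(raw)
logits = {"L": [0.9, 0.1], "V": [0.6, 0.4], "A": [0.5, 0.5]}
print(edges[("L", "A")], edges[("V", "A")], distillation_loss(edges, logits))
```

With these scores the L→A and L→V edges receive most of the incoming weight, mirroring the language-dominated pattern reported for HomoGD. In a trained model the scores would be learned, not fixed by hand.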

Figure 2: The framework of DMD. Given the input multimodal data, DMD encodes their respective shallow features $\widetilde{\mathbf{X}}_{m}$, where $m\in\{L,V,A\}$. In feature decoupling, DMD exploits the decoupled homo-/heterogeneous multimodal features $\mathbf{X}^{\text{com}}_{m}$ / $\mathbf{X


This review was generated by AI and checked by a human editor.