QUICK REVIEW

[Paper Review] Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Yansen Wang, Ying Shen|arXiv (Cornell University)|Nov 23, 2018

Sentiment Analysis and Opinion Mining | References: 34 | Cited by: 44

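One-Sentence Summary

Introduces RAVEN, a model that dynamically shifts word embeddings based on fine-grained nonverbal subword sequences, and achieves competitive results on the CMU-MOSI (sentiment) and IEMOCAP (emotion) datasets.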

ABSTRACT

Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.

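Motivation and Objectives

Argue that multimodal language modeling must capture dynamic word meanings that depend on nonverbal context.
Propose a subword-level nonverbal modeling framework that produces multimodally shifted word representations.
Develop an end-to-end architecture (RAVEN) that fuses visual and acoustic cues with word embeddings to improve prediction on sentiment and emotion tasks.
Visualize and analyze the learned shifted word representations to understand patterns of multimodal variation.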

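Proposed Method

Model the nonverbal subword sequences with modality-specific LSTMs to produce a visual embedding and an acoustic embedding for each word.
Use a gated modality-mixing network to compute the nonverbal shift vector from the visual embedding, the acoustic embedding, and the original word embedding, with modality-specific gates w_v and w_a.
Combine the gated visual and acoustic embeddings (plus a bias term) to obtain the nonverbal shift h_m, which captures how the word's meaning shifts in context.
Produce the multimodal-shifted word representation e_m = e + alpha h_m, where alpha scales the shift so that its magnitude is comparable to that of the original embedding.
Encode the sequence of shifted word representations with a word-level LSTM to produce a sentence-level multimodal representation for downstream tasks.
Train the whole architecture end to end with gradient-based optimization on the multimodal sentiment (CMU-MOSI) and emotion recognition (IEMOCAP) datasets.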
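
The gating and shifting steps above can be illustrated with a minimal sketch for a single word. This is not the authors' implementation. It assumes the visual and acoustic embeddings have already been produced by the modality-specific LSTMs and projected to the word-embedding dimension, that each gate is a single scalar, and that alpha is capped with a hypothetical rule min(beta * ||e|| / ||h_m||, 1); the summary only states that alpha keeps the shift comparable in size to the original embedding. All function and parameter names (gate, shifted_embedding, g_v, g_a, b_h, beta) are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def gate(weights, bias, nonverbal, word):
    """Scalar gate in (0, 1) computed from the concatenation [nonverbal; word]."""
    return sigmoid(dot(weights, nonverbal + word) + bias)


def shifted_embedding(e, h_v, h_a, params, beta=1.0):
    """Return (e_m, alpha) for one word.

    e   : original word embedding
    h_v : visual embedding for the word (assumed same length as e)
    h_a : acoustic embedding for the word (assumed same length as e)
    """
    # Modality-specific gates w_v and w_a.
    w_v = gate(params["g_v"], params["b_gv"], h_v, e)
    w_a = gate(params["g_a"], params["b_ga"], h_a, e)

    # Nonverbal shift: h_m = w_v * h_v + w_a * h_a + b_h.
    h_m = [w_v * v + w_a * a + b for v, a, b in zip(h_v, h_a, params["b_h"])]

    # Assumed scaling rule: never let the shift exceed beta times the norm of e.
    h_norm = norm(h_m)
    alpha = 1.0 if h_norm == 0.0 else min(beta * norm(e) / h_norm, 1.0)

    # Multimodal-shifted word representation: e_m = e + alpha * h_m.
    e_m = [x + alpha * h for x, h in zip(e, h_m)]
    return e_m, alpha
```

In a full model the gate weights and bias would be learned jointly with the LSTMs, and the resulting e_m vectors for a sentence would be fed to the word-level LSTM.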

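Experimental Results

Research Questions

RQ1: How can word representations be shifted dynamically, at subword granularity, by the accompanying nonverbal cues?
RQ2: Compared with text-only or coarse-grained fusion methods, do nonverbal subword patterns and dynamic shifting improve performance on multimodal sentiment analysis and emotion recognition?
RQ3: What typical patterns do shifted word representations show in different nonverbal contexts?
RQ4: Are subword-level nonverbal modeling and dynamic shifting necessary to reach state-of-the-art multimodal prediction?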

Key Findings

Model	MAE	Corr	Acc-2
SVM	—	—	—
DF	1.143	0.518	—
BC-LSTM	1.079	0.581	73.9
MV-LSTM	1.019	0.601	73.9
MARN	0.968	0.625	77.1
MFN	0.965	0.632	77.4
RMFN	0.922	0.681	78.4
LMF	0.912	0.668	76.4
RAVEN	0.915	0.691	78.0
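
For readers unfamiliar with the three columns, the following sketch shows how such metrics are commonly computed from real-valued sentiment predictions. It is an illustration only: the source does not describe its evaluation code, and the binarization rule for Acc-2 (scores greater than or equal to 0 count as positive) is an assumption. The function names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation between labels and predictions (higher is better)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def acc2(y_true, y_pred):
    """Binary accuracy in percent, assuming score >= 0 means positive."""
    hits = sum((t >= 0) == (p >= 0) for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)
```

For example, with labels [-2.0, -1.0, 1.0, 2.0] and predictions [-1.0, 0.5, 0.5, 3.0], the MAE is 1.0 and Acc-2 is 75.0, because one of the four predictions falls on the wrong side of zero.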

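On the CMU-MOSI sentiment task, RAVEN is competitive with the baselines (MAE 0.915, Corr 0.691, Acc-2 78.0): it has the highest correlation in the table, while LMF reaches a slightly lower MAE (0.912) and RMFN a higher Acc-2 (78.4).
On IEMOCAP, RAVEN reports the following accuracy and F1 scores per emotion: Happy Acc-2 87.3 and F1 85.8; Sad Acc-2 83.4 and F1 83.1; Angry Acc-2 87.3 and F1 86.7; Neutral Acc-2 69.7 and F1 69.3.
The ablation study shows that removing either the Nonverbal Sub-networks or the Multimodal Shifting component lowers performance, and the full RAVEN model outperforms every ablated variant.
Visualizing the shifted embeddings reveals three interpretable patterns: (1) words that carry sentiment polarity shift substantially in opposing contexts; (2) nouns whose polarity depends on usage shift noticeably with context; (3) function words shift very little. Together these indicate meaningful, context-driven variation.
Overall, the results support the claim that subword-level nonverbal modeling combined with dynamic word shifting improves multimodal prediction beyond early-fusion baselines.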


This review was generated by AI and reviewed by human editors.