QUICK REVIEW

[Paper Review] Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis

Feiyang Chen, Ziqian Luo|arXiv (Cornell University)|2019. 04. 17.

Sentiment Analysis and Opinion Mining | References: 35 | Citations: 53

One-Line Summary

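This paper introduces DFF-ATMF, a two-branch audio-text multimodal model that combines multi-feature acoustic fusion with multimodal attention. It improves sentiment analysis on CMU-MOSI and CMU-MOSEI and emotion recognition on IEMOCAP, achieving competitive or state-of-the-art results and showing that the learned features are complementary and robust.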

ABSTRACT

Sentiment analysis, mostly based on text, has been rapidly developing in the last decade and has attracted widespread attention in both academia and industry. However, the information in the real world usually comes from multiple modalities, such as audio and text. Therefore, in this paper, based on audio and text, we consider the task of multimodal sentiment analysis and propose a novel fusion strategy including both multi-feature fusion and multi-modality fusion to improve the accuracy of audio-text sentiment analysis. We call it the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, which consists of two parallel branches, the audio modality based branch and the text modality based branch. Its core mechanisms are the fusion of multiple feature vectors and multiple modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show the very competitive results of our DFF-ATMF model. Furthermore, by virtue of attention weight distribution heatmaps, we also demonstrate the deep features learned by using DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also has a good generalization ability for multimodal emotion recognition.

Motivation and Objectives

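Extend multimodal sentiment analysis by using audio in addition to text.
Propose a fusion strategy that combines multi-feature fusion with multimodal fusion.
Develop the DFF-ATMF model and evaluate it on the CMU-MOSI, CMU-MOSEI, and IEMOCAP datasets.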

Proposed Method

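Two parallel branches, one for the audio modality and one for the text modality, each using Bi-LSTM-based feature extraction.
Multi-feature fusion within each modality (combining raw waveforms with acoustic features).
Text representation: BERT embeddings followed by a Bi-LSTM and attention, producing the TSV (Text Sentiment Vector).
Multimodal attention fusion that combines the ASV (Audio Sentiment Vector) and the TSV, together with their multi-feature vectors, for the final prediction.
Training uses cross-entropy loss, dropout, and the Adam optimizer; evaluation uses weighted accuracy and Macro F1.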
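The attention-fusion and training objective described above can be illustrated with a minimal pure-Python sketch. It assumes each branch has already produced fixed-size vectors of equal length (ASV, TSV, and additional feature vectors) and uses plain dot-product attention with a hypothetical learned query vector, followed by a hypothetical linear classifier and cross-entropy loss. The review does not give the paper's exact attention formulation or layer sizes, so `attention_fuse`, `classify`, and all numeric values here are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    vectors: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[Vector, Vector]:
    """Hypothetical dot-product attention fusion (assumed, not the paper's exact form).

    Each candidate vector (e.g. ASV, TSV, extra feature vectors) is scored
    against a query, the scores are softmax-normalized, and the fused
    representation is the attention-weighted sum of the candidates.
    """
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    fused = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return fused, weights


def classify(fused: Sequence[float], weight_matrix: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear layer (one row per class) followed by softmax."""
    return softmax([dot(row, fused) for row in weight_matrix])


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross-entropy loss for a single example with an integer class label."""
    return -math.log(probs[label])


if __name__ == "__main__":
    # Illustrative values only; real vectors would come from the Bi-LSTM branches.
    asv = [0.2, 0.7, -0.1]   # audio sentiment vector (assumed size 3)
    tsv = [0.9, -0.3, 0.4]   # text sentiment vector (assumed size 3)
    extra = [0.1, 0.1, 0.5]  # one additional feature vector (assumed)
    query = [0.5, 0.5, 0.5]  # hypothetical learned attention query
    fused, weights = attention_fuse([asv, tsv, extra], query)
    probs = classify(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # 2 classes
    print("attention weights:", [round(w, 3) for w in weights])
    print("class probabilities:", [round(p, 3) for p in probs])
    print("loss for label 1:", round(cross_entropy(probs, 1), 3))
```

The `weights` list returned by `attention_fuse` is the kind of per-input attention distribution that heatmaps such as the ones mentioned in the results visualize; in a trained model the query and classifier weights would be learned with Adam rather than fixed by hand.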

Experimental Results

Research Questions

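RQ1: Does a fusion strategy that combines multi-feature fusion with multimodal attention fusion improve multimodal sentiment analysis over single-feature or single-modality baselines?
RQ2: Do the features learned by the proposed DFF-ATMF model show complementarity and robustness across datasets and tasks (sentiment analysis and emotion recognition)?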

Key Results

모델	CMU-MOSI Acc	CMU-MOSI F1	CMU-MOSEI Acc	CMU-MOSEI F1	IEMOCAP Overall Acc	IEMOCAP Macro F1
( ? )	79.30	80.12	-	-	75.60	76.31
( ? )	80.10	80.62	-	-	-	-
( ? )	74.93	75.42	76.24	77.03	-	-
( ? )	76.60	76.93	-	-	78.20	78.79
( ? )	80.58	80.96	79.74	80.15	-	-
DFF-ATMF	80.98	81.26	77.15	78.33	81.37	82.29

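In the proposed setup, DFF-ATMF achieves 80.98% accuracy and 81.26% F1 on CMU-MOSI.
On CMU-MOSEI, DFF-ATMF reaches 77.15% accuracy and 78.33% F1.
On IEMOCAP, DFF-ATMF achieves 81.37% overall accuracy and 82.29% Macro F1.
DFF-ATMF outperforms several state-of-the-art models on CMU-MOSI and IEMOCAP and remains competitive on CMU-MOSEI, although one baseline in the table scores higher there (79.74% accuracy, 80.15% F1).
Attention heatmaps suggest that the learned features are complementary and robust across datasets.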
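The accuracy and Macro F1 figures above can be computed from model predictions with standard definitions. This sketch assumes accuracy is the fraction of correct predictions and Macro F1 is the unweighted mean of per-class F1 scores; the review does not spell out how the paper weights its "weighted accuracy", so `accuracy` and `macro_f1` are illustrative helpers, not the authors' evaluation code.

```python
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the gold label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    _check(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy binary sentiment labels (hypothetical data, not from the paper).
    gold = ["neg", "neg", "pos", "pos"]
    pred = ["neg", "pos", "pos", "pos"]
    print("accuracy:", accuracy(gold, pred))
    print("macro F1:", round(macro_f1(gold, pred), 4))
```

Multiplying either score by 100 gives a percentage in the form reported in the table. Macro F1 weights every class equally, so it penalizes a model that does well only on the majority class, which is why it is often reported alongside accuracy.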

This review was generated by AI and checked by a human editor.