QUICK REVIEW

[Paper Review] HiGRU: Hierarchical Gated Recurrent Units for Utterance-level Emotion Recognition

Wenxiang Jiao, Haiqin Yang | arXiv (Cornell University) | 2019-04-09

Sentiment Analysis and Opinion Mining | References: 29 | Citations: 70

One-line summary

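HiGRU is a two-level hierarchical GRU model for utterance-level emotion recognition in dialogue: it captures word-level features within each utterance and utterance-level context across the dialogue. It comes in two variants: HiGRU-f, which adds feature fusion, and HiGRU-sf, which adds self-attention for long-range context on top of feature fusion.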

ABSTRACT

In this paper, we address three challenges in utterance-level emotion recognition in dialogue systems: (1) the same word can deliver different emotions in different contexts; (2) some emotions are rarely seen in general dialogues; (3) long-range contextual information is hard to be effectively captured. We therefore propose a hierarchical Gated Recurrent Unit (HiGRU) framework with a lower-level GRU to model the word-level inputs and an upper-level GRU to capture the contexts of utterance-level embeddings. Moreover, we promote the framework to two variants, HiGRU with individual features fusion (HiGRU-f) and HiGRU with self-attention and features fusion (HiGRU-sf), so that the word/utterance-level individual inputs and the long-range contextual information can be sufficiently utilized. Experiments on three dialogue emotion datasets, IEMOCAP, Friends, and EmotionPush demonstrate that our proposed HiGRU models attain at least 8.7%, 7.5%, 6.0% improvement over the state-of-the-art methods on each dataset, respectively. Particularly, by utilizing only the textual feature in IEMOCAP, our HiGRU models gain at least 3.8% improvement over the state-of-the-art conversational memory network (CMN) with the trimodal features of text, video, and audio.

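Motivation and goals

Make utterance-level emotion recognition in dialogue robust to context-dependent word meaning, class imbalance, and long-range dependencies.
Use a hierarchical GRU structure to model both word-level and utterance-level information as well as the context between utterances.
Develop two variants, HiGRU-f and HiGRU-sf, that perform feature fusion and capture long-range context effectively.
Demonstrate improvements over state-of-the-art models on three dialogue emotion datasets (IEMOCAP, Friends, EmotionPush).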

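Proposed method

A two-level bidirectional GRU: the lower level models the word sequence within each utterance to produce an utterance embedding, and the upper level models the sequence of utterances to produce contextual utterance embeddings.
HiGRU-f fuses the individual word/utterance embeddings with the GRU hidden states to strengthen the contextual representation.
HiGRU-sf adds a self-attention layer over the GRU hidden states to capture long-range global context, and fuses the attention output with the embeddings and hidden states.
The contextualized utterance embeddings are passed to a fully connected layer with softmax to predict the emotion of each utterance.
Training uses a weighted categorical cross-entropy loss to handle class imbalance; class weights are set to the inverse of the class frequency, tempered by an exponent alpha.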
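The fusion, self-attention, and weighted-loss steps above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes plain dot-product attention, fusion by simple concatenation (with no learned projection), and class weights proportional to (1 / count) ** alpha rescaled to average 1. All function names are hypothetical, and the GRU layers themselves are left out; `hidden` stands in for their outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(hidden):
    """Dot-product self-attention (assumed form): each state becomes a
    softmax-weighted sum of all states in the sequence."""
    attended = []
    for query in hidden:
        scores = [sum(q * k for q, k in zip(query, key)) for key in hidden]
        weights = softmax(scores)
        dim = len(query)
        attended.append(
            [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
        )
    return attended

def fuse(embeddings, hidden, attended=None):
    """Feature fusion by concatenation (assumed): [embedding; hidden; attention].
    Without `attended` this mimics HiGRU-f, with it HiGRU-sf."""
    fused = []
    for i, (e, h) in enumerate(zip(embeddings, hidden)):
        vec = list(e) + list(h)
        if attended is not None:
            vec += list(attended[i])
        fused.append(vec)
    return fused

def class_weights(counts, alpha=0.5):
    """Weights proportional to (1 / count) ** alpha, rescaled to average 1.
    The rescaling is an assumption made for readability."""
    raw = [(1.0 / c) ** alpha for c in counts]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

def weighted_cross_entropy(probs, labels, weights):
    """Mean over utterances of -weights[y] * log(probs[y])."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With hypothetical class counts of 900 and 100, `class_weights([900, 100], alpha=1.0)` gives roughly `[0.2, 1.8]`, so an error on the rare class costs nine times as much; `alpha=0.0` gives equal weights and recovers the unweighted loss.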

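Experimental results

Research questions

RQ1: Can a hierarchical GRU effectively learn both fine-grained word-level signals and long-range utterance-level context for utterance-level emotion recognition?
RQ2: Do feature fusion (HiGRU-f) and self-attention with feature fusion (HiGRU-sf) provide measurable gains over the plain HiGRU and other baselines on textual dialogue data?
RQ3: How do the HiGRU variants perform across datasets (IEMOCAP, Friends, EmotionPush) and under class imbalance among emotions?

Main results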

Model	Ang	Hap/Joy	Sad	Neu	WA	UWA
bcLSTM* (T)	75.29	79.40	78.07	76.53	77.7	77.3
bcGRU (T)	77.20	80.99	76.26	72.50	76.9	76.7
HiGRU (T)	75.41	91.64	79.79	70.74	80.6	79.4
HiGRU-f (T)	76.69	88.91	80.25	75.92	81.5	80.4
HiGRU-sf (T)	74.78	89.65	80.50	77.58	82.1	80.6
HiGRU (F+E)	55.41	81.20	51.40	64.40	65.8	63.1
HiGRU-f (F+E)	54.90	78.30	55.50	68.70	68.5	64.3
HiGRU-sf (F+E)	56.80	81.40	52.20	68.70	69.0	64.8

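The HiGRU variants outperform state-of-the-art methods on all three datasets.
On IEMOCAP, using only textual features, the HiGRU variants achieve at least a 3.8% improvement over CMN, which uses trimodal features.
HiGRU-f and HiGRU-sf provide additional gains over the base HiGRU in both WA (weighted accuracy) and UWA (unweighted accuracy).
The HiGRU models are well balanced across emotions and show notable gains even on minority emotions such as anger and sadness.
Mixing training datasets does not always improve performance; the characteristics of each dataset affect the results.
The self-attention variant, HiGRU-sf, achieves the best overall performance among the proposed models in several settings.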
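As a quick consistency check on the table, UWA can be read as the plain mean of the per-class accuracies, and WA as the same mean weighted by class size. The sketch below copies the per-class numbers from the table and recomputes UWA; the results agree with the reported UWA column to within rounding. The function names are illustrative, and WA cannot be recomputed here because the review does not list class sizes.

```python
# Per-class accuracies (Ang, Hap/Joy, Sad, Neu) copied from the results table.
per_class = {
    "bcLSTM* (T)": [75.29, 79.40, 78.07, 76.53],
    "bcGRU (T)": [77.20, 80.99, 76.26, 72.50],
    "HiGRU (T)": [75.41, 91.64, 79.79, 70.74],
    "HiGRU-f (T)": [76.69, 88.91, 80.25, 75.92],
    "HiGRU-sf (T)": [74.78, 89.65, 80.50, 77.58],
    "HiGRU (F+E)": [55.41, 81.20, 51.40, 64.40],
    "HiGRU-f (F+E)": [54.90, 78.30, 55.50, 68.70],
    "HiGRU-sf (F+E)": [56.80, 81.40, 52.20, 68.70],
}

# UWA column as reported in the same table.
reported_uwa = {
    "bcLSTM* (T)": 77.3,
    "bcGRU (T)": 76.7,
    "HiGRU (T)": 79.4,
    "HiGRU-f (T)": 80.4,
    "HiGRU-sf (T)": 80.6,
    "HiGRU (F+E)": 63.1,
    "HiGRU-f (F+E)": 64.3,
    "HiGRU-sf (F+E)": 64.8,
}

def unweighted_accuracy(accs):
    """UWA: every emotion class counts equally, regardless of its size."""
    return sum(accs) / len(accs)

def weighted_accuracy(accs, counts):
    """WA: per-class accuracies weighted by class size (counts must be supplied;
    they are not given in this review)."""
    return sum(a * c for a, c in zip(accs, counts)) / sum(counts)

if __name__ == "__main__":
    for name, accs in per_class.items():
        print(f"{name}: computed {unweighted_accuracy(accs):.2f}, "
              f"reported {reported_uwa[name]}")
```

Because UWA ignores class size, a model that does well only on the majority class scores a high WA but a low UWA, which is why both columns are reported for imbalanced emotion data.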


This review was generated by AI and checked by a human editor.