QUICK REVIEW

[Paper Review] Personalized Dialogue Generation with Diversified Traits

Yinhe Zheng, Guanyi Chen | arXiv (Cornell University) | 2019-01-28

Topic Modeling | References: 45 | Citations: 89

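One-Line Summary

This paper introduces a large-scale dataset in which speakers carry explicit personality traits, and proposes trait-aware Seq2Seq models (PAA and PAB) that generate personalized responses conditioned on diversified traits.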

ABSTRACT

Endowing a dialogue system with particular personality traits is essential to deliver more human-like conversations. However, due to the challenge of embodying personality via language expression and the lack of large-scale persona-labeled dialogue data, this research problem is still far from well-studied. In this paper, we investigate the problem of incorporating explicit personality traits in dialogue generation to deliver personalized dialogues. To this end, firstly, we construct PersonalDialog, a large-scale multi-turn dialogue dataset containing various traits from a large number of speakers. The dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers. Each utterance is associated with a speaker who is marked with traits like Age, Gender, Location, Interest Tags, etc. Several anonymization schemes are designed to protect the privacy of each speaker. This large-scale dataset will facilitate not only the study of personalized dialogue generation, but also other researches on sociolinguistics or social science. Secondly, to study how personality traits can be captured and addressed in dialogue generation, we propose persona-aware dialogue generation models within the sequence to sequence learning framework. Explicit personality traits (structured by key-value pairs) are embedded using a trait fusion module. During the decoding process, two techniques, namely persona-aware attention and persona-aware bias, are devised to capture and address trait-related information. Experiments demonstrate that our model is able to address proper traits in different contexts. Case studies also show interesting results for this challenging research problem.

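Research Motivation and Goals

Motivate and define the task of incorporating explicit personality traits into dialogue generation.
Provide a large-scale dataset of real social conversations with diversified traits to enable scalable training.
Develop persona-aware generation models that use trait fusion and integrate the fused representation into decoding.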

Proposed Method

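Construct PersonalDialog, a large-scale Chinese dialogue corpus covering 8.47M speakers and 20.83M sessions, with traits such as gender, age, location, and interest tags.
Encode each trait as an embedding and combine the embeddings with a trait fusion module to form the persona representation v_p.
Implement two ways of integrating v_p into decoding: (i) Persona-Aware Attention (PAA), which computes attention weights conditioned on v_p, and (ii) Persona-Aware Bias (PAB), which adds a persona bias to the generation distribution through a gating mechanism.
Explore three trait fusion strategies: Traits Attention, Traits Average, and Traits Concatenation.
Use a Seq2Seq framework with Bahdanau-style attention conditioned on v_p, built from a 2-layer BiGRU encoder and a 2-layer GRU decoder.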
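
The trait fusion and Persona-Aware Bias steps listed above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the dot-product scoring inside `fuse_attention`, the sigmoid gate parameterized by `gate_w`, and all dimensions and random weights are assumptions chosen only to show the shape of the computation in plain Python.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, v):
    return [dot(row, v) for row in W]

def fuse_average(traits):
    """Traits Average: element-wise mean of the trait embeddings."""
    n = len(traits)
    return [sum(col) / n for col in zip(*traits)]

def fuse_concat(traits):
    """Traits Concatenation: join the trait embeddings end to end."""
    return [x for t in traits for x in t]

def fuse_attention(traits, query):
    """Traits Attention: weight each trait by its relevance to a query
    vector (assumed here to summarize the dialogue context).
    Dot-product scoring is a simplification chosen for this sketch."""
    weights = softmax([dot(t, query) for t in traits])
    dim = len(traits[0])
    fused = [sum(w * t[i] for w, t in zip(weights, traits)) for i in range(dim)]
    return fused, weights

def persona_aware_bias(state, v_p, W_o, W_p, gate_w):
    """PAB-style output layer: a scalar gate mixes logits computed from
    the decoder state with logits computed from the persona vector v_p."""
    gate = 1.0 / (1.0 + math.exp(-dot(gate_w, state)))
    state_logits = matvec(W_o, state)
    persona_logits = matvec(W_p, v_p)
    logits = [gate * s + (1.0 - gate) * p
              for s, p in zip(state_logits, persona_logits)]
    return softmax(logits), gate

def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

# Toy setup: 3 traits (say gender, age, location), embedding size 4, vocabulary of 6.
random.seed(0)
dim, vocab = 4, 6
traits = [rand_vec(dim) for _ in range(3)]
query = rand_vec(dim)   # hypothetical context summary
state = rand_vec(dim)   # hypothetical decoder state at one time step

v_p, trait_weights = fuse_attention(traits, query)
W_o = [rand_vec(dim) for _ in range(vocab)]
W_p = [rand_vec(dim) for _ in range(vocab)]
gate_w = rand_vec(dim)
probs, gate = persona_aware_bias(state, v_p, W_o, W_p, gate_w)

print("trait weights:", trait_weights)
print("gate:", gate)
print("word distribution:", probs)
```

In this sketch a gate near 1 means the next-word distribution is driven mostly by the decoder state, while a gate near 0 means it is driven mostly by the persona vector. Persona-Aware Attention would instead feed v_p into the attention scoring over encoder states, which is not shown here.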

Experimental Results

Research Questions

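RQ1: Can explicit personality traits be effectively learned from large-scale social data and expressed in generated dialogue?
RQ2: How do different trait fusion methods affect the integration of personality information into the decoder?
RQ3: Which decoding strategy (PAA vs. PAB) makes the best use of the persona representation to generate trait-consistent responses?
RQ4: How do the trait fusion variants (attention, average, concatenation) affect the expression of diversified traits across contexts?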

Key Results

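The model can address appropriate, diversified traits across different contexts.
In the experiments, Persona-Aware Bias (PAB) generally outperforms Persona-Aware Attention (PAA).
PersonalDialog, a large-scale dataset of real social conversations with diversified traits, supports training for personalized dialogue generation.
With trait fusion, the model can generate responses that reflect explicit trait information even when the exact trait value does not appear in the generated text.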


This review was generated by AI and reviewed by a human editor.