QUICK REVIEW

[Paper Review] Variational Transformers for Diverse Response Generation

Zhaojiang Lin, Genta Indra Winata|arXiv (Cornell University)|Mar 28, 2020

Speech Recognition and Synthesis | References: 27 | Citations: 46

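One-line summary

This paper proposes Variational Transformer (VT) models, namely the Global Variational Transformer (GVT) and the Sequential Variational Transformer (SVT), which combine the efficiency of the Transformer with CVAE-style latent variables to produce diverse and coherent dialogue responses. It shows that they outperform the baselines on both automatic metrics and human evaluation.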

ABSTRACT

Despite the great promise of Transformers in many sequence modeling tasks (e.g., machine translation), their deterministic nature hinders them from generalizing to high entropy tasks such as dialogue response generation. Previous work proposes to capture the variability of dialogue responses with a recurrent neural network (RNN)-based conditional variational autoencoder (CVAE). However, the autoregressive computation of the RNN limits the training efficiency. Therefore, we propose the Variational Transformer (VT), a variational self-attentive feed-forward sequence model. The VT combines the parallelizability and global receptive field of the Transformer with the variational nature of the CVAE by incorporating stochastic latent variables into Transformers. We explore two types of the VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables. Then, the proposed models are evaluated on three conversational datasets with both automatic metric and human evaluation. The experimental results show that our models improve standard Transformers and other baselines in terms of diversity, semantic relevance, and human judgment.

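Motivation and objectives

Address the blandness and limited generalization of deterministic Transformer-based dialogue generation.
Incorporate stochastic latent variables into the Transformer to capture diverse, contextually appropriate responses.
Compare global (discourse-level) and sequential latent variable designs for dialogue modeling.
Evaluate on multiple dialogue datasets using automatic metrics and human evaluation.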

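Proposed method

Introduces two VT variants: the Global Variational Transformer (GVT), which adds a global latent variable to the decoder input, and the Sequential Variational Transformer (SVT), which provides a sequence of latent variables, one per decoding position.
Uses non-causal attention in SVT and CVAE-inspired prior and posterior latent variable modeling within the Transformer framework.
Incorporates KL annealing and a bag-of-words auxiliary loss to mitigate latent variable vanishing and encourage informative latent representations.
Adds an SBOW auxiliary loss to the ELBO objective to encourage latent variables that plan future generation at each position.
Uses a 4-layer Transformer base with 300 hidden units, 4 attention heads, and 300-dimensional latent variables, reuses MLE pretraining, and optimizes with Adam.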
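The training objective described above (an ELBO with an annealed KL term plus an auxiliary loss) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the linear annealing schedule, the diagonal-Gaussian form of the prior and posterior, and all function names (`kl_annealing_weight`, `gaussian_kl`, `sample_latent`, `annealed_loss`) are assumptions made for the example.

```python
import math
import random


def kl_annealing_weight(step, warmup_steps):
    """KL weight ramping linearly from 0 to 1 (assumed schedule, not from the paper)."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given means and log-variances."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def sample_latent(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def annealed_loss(recon_nll, kl, aux_loss, step, warmup_steps):
    """Negative ELBO with an annealed KL term plus an auxiliary loss (e.g. bag-of-words)."""
    return recon_nll + kl_annealing_weight(step, warmup_steps) * kl + aux_loss


if __name__ == "__main__":
    rng = random.Random(0)
    mu_q, logvar_q = [0.5, -0.2], [0.0, 0.0]   # hypothetical posterior parameters
    mu_p, logvar_p = [0.0, 0.0], [0.0, 0.0]    # hypothetical prior parameters
    z = sample_latent(mu_q, logvar_q, rng)
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    print(len(z), annealed_loss(recon_nll=3.2, kl=kl, aux_loss=1.1, step=100, warmup_steps=1000))
```

Early in training the KL weight is near zero, so the decoder is pushed to use the latent variable before the posterior is pulled toward the prior. The auxiliary loss stays at full weight throughout.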

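Experimental results

Research questions

RQ1: Can incorporating latent variables into a Transformer-based dialogue model improve response diversity without sacrificing semantic relevance?
RQ2: How do global (discourse-level) and sequential (per-token) latent variables affect generation quality and human evaluation?
RQ3: Do KL annealing and auxiliary losses stabilize the training of VT models and preserve useful latent information?
RQ4: How do GVT and SVT compare in their effect on automatic metrics and human evaluation across different datasets?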

Key findings

Model	PPL	KLD	Dist-1	Dist-2	Dist-3	EMB_FT	EMB_BERT	Coherence	Emotion/Engagement
Seq2Seq	130.75	-	0.0055	0.0187	0.0347	0.738	0.594	20.67	20.67
CVAE	35.33	27.55	0.0189	0.1340	0.3640	0.751	0.613	18.33	18
Transformer	72.66	-	0.0040	0.0161	0.0324	0.741	0.596	19.67	23.33
GVT	19.71	18.15	0.0207	0.1524	0.4064	0.753	0.609	23	22.67
SVT	18.96	32.27	0.0079	0.1053	0.3654	0.762	0.619	26	27.67
Human	-	-	-	-	-	-	-	-	-
CVAE	31.32	10.01	0.0186	0.1102	0.295	0.917	0.666	20.67	21.33
Transformer	48.03	-	0.0058	0.0237	0.0524	0.915	0.672	24.67	24.67
GVT	18.34	19.13	0.0204	0.1406	0.3995	0.917	0.675	20	21.33
SVT	17.75	24.67	0.0213	0.1521	0.3936	0.906	0.665	38.67	36.67
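The embedding-similarity columns (EMB_FT, EMB_BERT) compare a generated response with a reference response in an embedding space. As a rough sketch of how such a score might be computed, the example below mean-pools word vectors and takes the cosine similarity. The pooling strategy, the toy vectors, and the function names are assumptions for illustration; the review does not specify the exact procedure.

```python
import math


def mean_pool(vectors):
    """Average a non-empty list of equal-length word vectors into one sentence vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical 2-dimensional word vectors standing in for fastText or BERT embeddings.
    generated = mean_pool([[1.0, 0.0], [0.5, 0.5]])
    reference = mean_pool([[1.0, 0.2], [0.4, 0.6]])
    print(cosine_similarity(generated, reference))
```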

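GVT and SVT outperform the standard Transformer and CVAE baselines on diversity and human evaluation.
On MojiTalk, SVT achieves higher semantic relevance as measured by embedding similarity (EMB_FT and EMB_BERT), while its results on Persona+ED are more mixed.
GVT generally lowers reconstruction perplexity (PPL), suggesting that its latent variable carries rich information; SVT improves PPL further with its sequential latent variables.
GVT and SVT improve Dist-1/Dist-2/Dist-3 over the baselines, indicating more diverse outputs.
Human evaluation favors SVT on coherence, emotion, and engagement; on certain datasets, SVT's per-token latent variable modeling increases informativeness.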
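For readers unfamiliar with the Dist-n diversity metric cited in the findings, it is commonly computed as the ratio of unique n-grams to total n-grams across the generated responses. The sketch below illustrates that common definition. Whitespace tokenization and corpus-level aggregation are assumptions here, and the paper's exact setup may differ.

```python
def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams over whitespace-tokenized responses."""
    total = 0
    unique = set()
    for response in responses:
        tokens = response.split()  # assumed tokenization: split on whitespace
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    samples = ["i am fine", "i am here"]  # toy responses for illustration
    print(distinct_n(samples, 1), distinct_n(samples, 2))
```

A model that repeats the same generic reply scores close to 0, while a model whose responses rarely share n-grams scores close to 1.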


This review was generated by AI and checked by a human editor.