QUICK REVIEW

[Paper Review] UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2

Yunyi Yang, Yunhao Li | arXiv (Cornell University) | Dec 7, 2020

Topic Modeling | References: 40 | Cited by: 32

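ONE-SENTENCE SUMMARY

UBAR fine-tunes GPT-2 on entire dialog sessions (user utterances, belief states, DB results, system acts, and system responses) to build a fully end-to-end task-oriented dialog system, and achieves state-of-the-art results on MultiWOZ for response generation, policy optimization, and end-to-end modeling. It is evaluated with generated dialog context to mimic real-world use, and shows strong transfer to new domains with limited data.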

ABSTRACT

This paper presents our task-oriented dialog system UBAR which models task-oriented dialogs on a dialog session level. Specifically, UBAR is acquired by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of the entire dialog session which is composed of user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, where its dialog context has access to user utterances and all content it generated such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performances in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life. We also examine the transfer ability of UBAR to new domains with limited data and provide visualization and a case study to illustrate the advantages of UBAR in modeling on a dialog session level.
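The "combined score" gains quoted in the abstract refer to the aggregate metric commonly reported on MultiWOZ: the mean of the Inform and Success rates plus BLEU. A minimal sketch of that arithmetic follows; the function name is illustrative, and the numbers in the usage line are made-up placeholders, not results from the paper.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """MultiWOZ combined score: mean of Inform and Success, plus BLEU.

    All three inputs are on the usual 0-100 reporting scale.
    """
    return 0.5 * (inform + success) + bleu


# Placeholder inputs for illustration only (not UBAR's reported numbers).
print(combined_score(inform=90.0, success=80.0, bleu=17.0))  # 102.0
```

Because Inform and Success are averaged but BLEU is added directly, a 1-point BLEU gain moves the combined score as much as a 2-point gain in either task-completion rate.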

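MOTIVATION AND GOALS

Move task-oriented dialog modeling from the turn level to the session level to better reflect real-world usage.
Propose a GPT-2-based model (UBAR) trained on entire dialog sessions, including belief states and system acts.
Evaluate the end-to-end, response-generation, and policy-optimization settings with the model's own generated content kept in the dialog context.
Analyze transferability to new domains with limited data, and provide insights through visualization and a case study.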

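PROPOSED METHOD

Fine-tune DistilGPT-2 (a distilled version of GPT-2) on a sequence that concatenates the entire dialog session: U, B, D, A, R (user utterance, belief state, database result, system act, system response) for every turn.
Delexicalize responses with domain-adaptive preprocessing, and keep the belief-state and system-act spans decoupled to improve generalization.
Represent belief states as domain-slot-value tokens and system acts as domain-act-slot tokens to ground generation.
Train on the session-level sequence with the standard language-modeling objective (no auxiliary supervised objectives).
Evaluate in three settings: response generation with ground-truth belief states in the context, policy optimization with ground-truth belief states, and end-to-end modeling in which all intermediate content is generated.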
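To make the session-level formulation concrete, here is a minimal sketch that flattens a dialog session into one training string. The `<sos_x>`/`<eos_x>` delimiters, the `[domain]`/`[act]` bracket tokens, the `[db_2]` and `[value_choice]` placeholders, and the names `Turn`, `serialize_belief`, `serialize_act`, and `session_sequence` are illustrative assumptions, not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    user: str                              # U: user utterance
    belief: Dict[str, Dict[str, str]]      # B: {domain: {slot: value}}
    db: str                                # D: DB-result token (hypothetical, e.g. "[db_2]")
    act: Dict[str, Dict[str, List[str]]]   # A: {domain: {act: [slots]}}
    resp: str                              # R: delexicalized response


def span(tag: str, text: str) -> str:
    # Wrap one segment in assumed <sos_tag> ... <eos_tag> delimiters.
    return f"<sos_{tag}> {text} <eos_{tag}>"


def serialize_belief(belief: Dict[str, Dict[str, str]]) -> str:
    # Domain-slot-value layout: "[domain] slot value slot value ..."
    parts: List[str] = []
    for domain, slots in belief.items():
        parts.append(f"[{domain}]")
        for slot, value in slots.items():
            parts.append(f"{slot} {value}")
    return " ".join(parts)


def serialize_act(act: Dict[str, Dict[str, List[str]]]) -> str:
    # Domain-act-slot layout: "[domain] [act] slot slot [act] slot ..."
    parts: List[str] = []
    for domain, acts in act.items():
        parts.append(f"[{domain}]")
        for act_name, slots in acts.items():
            parts.append(f"[{act_name}]")
            parts.extend(slots)
    return " ".join(parts)


def session_sequence(turns: List[Turn]) -> str:
    # Concatenate U, B, D, A, R for every turn into one flat sequence.
    segments: List[str] = []
    for t in turns:
        segments += [
            span("u", t.user),
            span("b", serialize_belief(t.belief)),
            span("db", t.db),
            span("a", serialize_act(t.act)),
            span("r", t.resp),
        ]
    return " ".join(segments)


example = [
    Turn(
        user="i want an italian restaurant in the centre",
        belief={"restaurant": {"food": "italian", "area": "centre"}},
        db="[db_2]",
        act={"restaurant": {"inform": ["choice"], "request": ["price"]}},
        resp="there are [value_choice] options . what price range ?",
    ),
]
print(session_sequence(example))
```

Because every turn's B, D, A, and R live in the same string, the plain next-token loss over the whole sequence supervises belief tracking, policy, and response generation together, and at turn t the model conditions on everything from turns 1 to t-1.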

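EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does session-level training with intermediate information (belief states, system acts) improve end-to-end TOD performance?
RQ2: Does evaluating with generated dialog context (rather than ground truth) better reflect real deployment?
RQ3: How well does the model transfer to new domains with limited data?
RQ4: How do the length and content (ground truth vs. generated) of the dialog context affect end-to-end TOD performance?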

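KEY FINDINGS

UBAR achieves state-of-the-art results on MultiWOZ 2.0/2.1 for response generation, policy optimization, and end-to-end modeling.
In end-to-end modeling, UBAR substantially improves the combined score over baselines while using a fully generated context.
Session-level sequence training and the use of generated dialog context are essential for real-world end-to-end TOD performance.
UBAR transfers to new domains with limited data, especially when fine-tuned on a small number of examples, though its dependence on in-domain data remains evident.
Ablation studies indicate that belief states and system acts in the context matter more than user utterances and responses for learning an effective policy and for grounding.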
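The contrast between generated and ground-truth context can be sketched as a decoding loop with two modes. This is a minimal sketch under stated assumptions: `generate` and `query_db` are hypothetical callables standing in for a fine-tuned GPT-2 decoder and the MultiWOZ database, and the `<sos_x>`/`<eos_x>` delimiters are illustrative, not the authors' format.

```python
from typing import Callable, Dict, List, Optional

Generate = Callable[[str, str], str]  # hypothetical: (context, tag) -> decoded segment
QueryDB = Callable[[str], str]        # hypothetical: belief span -> DB-result token


def span(tag: str, text: str) -> str:
    return f"<sos_{tag}> {text} <eos_{tag}>"


def run_session(user_turns: List[str], generate: Generate, query_db: QueryDB,
                oracle: Optional[List[Dict[str, str]]] = None) -> List[Dict[str, str]]:
    """Decode a session turn by turn.

    oracle=None -> the context keeps the model's own B, D, A, R (realistic setting).
    oracle=list -> the context is rebuilt from ground-truth B, D, A, R after each turn.
    """
    history = ""
    predictions: List[Dict[str, str]] = []
    for i, user in enumerate(user_turns):
        ctx = history + span("u", user)
        belief = generate(ctx, "b")
        ctx += " " + span("b", belief)
        db = query_db(belief)              # DB is queried with the generated belief state
        ctx += " " + span("db", db)
        act = generate(ctx, "a")
        ctx += " " + span("a", act)
        resp = generate(ctx, "r")
        ctx += " " + span("r", resp)
        predictions.append({"b": belief, "db": db, "a": act, "r": resp})
        if oracle is None:
            history = ctx + " "            # carry generated content forward
        else:
            gold = oracle[i]               # carry ground-truth content forward instead
            segs = [span("u", user)] + [span(k, gold[k]) for k in ("b", "db", "a", "r")]
            history += " ".join(segs) + " "
    return predictions
```

In the generated mode, an early belief-state error stays in the context of every later turn, which is the situation a deployed system faces; the oracle mode hides that error propagation.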

This review was generated by AI and checked by human editors.