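[Paper Summary] Deep Learning Based Chatbot Models
This paper surveys recent deep learning methods applied to chatbots, analyzes encoder-decoder and Transformer-based approaches, and proposes incorporating priors such as mood and persona to improve open-domain dialogue generation.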
A conversational agent (chatbot) is a piece of software that is able to communicate with humans using natural language. Modeling conversation is an important task in natural language processing and artificial intelligence. While chatbots can be used for various tasks, in general they have to understand users' utterances and provide responses that are relevant to the problem at hand. In my work, I conduct an in-depth survey of recent literature, examining over 70 publications related to chatbots published in the last 3 years. Then, I proceed to make the argument that the very nature of the general conversation domain demands approaches that are different from current state-of-the-art architectures. Based on several examples from the literature I show why current chatbot models fail to take into account enough priors when generating responses and how this affects the quality of the conversation. In the case of chatbots, these priors can be outside sources of information that the conversation is conditioned on like the persona or mood of the conversers. In addition to presenting the reasons behind this problem, I propose several ideas on how it could be remedied. The next section focuses on adapting the very recent Transformer model to the chatbot domain, which is currently state-of-the-art in neural machine translation. I first present experiments with the vanilla model, using conversations extracted from the Cornell Movie-Dialog Corpus. Secondly, I augment the model with some of my ideas regarding the issues of encoder-decoder architectures. More specifically, I feed additional features into the model like mood or persona together with the raw conversation data. Finally, I conduct a detailed analysis of how the vanilla model performs on conversational data by comparing it to previous chatbot models and how the additional features affect the quality of the generated responses.
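Research Motivation and Objectives
- Survey and synthesize more than 70 chatbot publications from the past three years.
- Argue that open-domain conversation requires prior information beyond what standard architectures take into account.
- Experiment with a Transformer-based chatbot on dialogue datasets to evaluate its performance.
- Propose ideas for incorporating priors such as mood and persona to improve response quality.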
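Proposed Method
- Review historical and contemporary chatbot literature, including encoder-decoder and Transformer models.
- Describe data preprocessing, word embeddings, and vocabulary handling in the seq2seq framework.
- Train Transformer-based chatbots experimentally on the Cornell Movie-Dialog Corpus and the OpenSubtitles corpus.
- Add extra inputs, such as mood and persona, to the encoder-decoder model.
- Compare the vanilla Transformer's performance with previous chatbot models and analyze the effect of the priors.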
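The vocabulary handling and extra-input steps above can be illustrated with a small sketch. It assumes one possible way of feeding priors to an encoder-decoder model, namely prepending special tokens to the source sequence. The summary does not state how the paper encodes mood or persona, so the token format (`<mood:...>`, `<persona:...>`) and the helper names `build_vocab` and `encode_with_priors` are hypothetical.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def build_vocab(utterances, max_size=10000, extra_tokens=()):
    """Map the most frequent whitespace tokens to integer ids.

    Tokens outside the vocabulary are later mapped to UNK.
    extra_tokens reserves ids for the (hypothetical) prior tokens.
    """
    counts = Counter(tok for utt in utterances for tok in utt.lower().split())
    itos = [PAD, UNK, *extra_tokens]
    budget = max(max_size - len(itos), 0)
    itos += [tok for tok, _ in counts.most_common(budget) if tok not in itos]
    return {tok: i for i, tok in enumerate(itos)}

def encode_with_priors(utterance, vocab, mood=None, persona=None):
    """Turn an utterance into ids, prefixed by persona and mood tokens.

    Assumption: priors are injected as extra source tokens. Other designs
    (for example, adding a learned prior embedding to every position)
    are equally plausible.
    """
    prefix = []
    if persona is not None:
        prefix.append(f"<persona:{persona}>")
    if mood is not None:
        prefix.append(f"<mood:{mood}>")
    tokens = prefix + utterance.lower().split()
    unk = vocab[UNK]
    return [vocab.get(tok, unk) for tok in tokens]

# Toy usage with made-up data.
utterances = ["how are you", "i am fine", "how is the weather"]
priors = ["<persona:joker>", "<mood:happy>"]
vocab = build_vocab(utterances, extra_tokens=priors)
ids = encode_with_priors("how are you today", vocab, mood="happy", persona="joker")
print(ids)
```

The resulting id sequence can be fed to any sequence-to-sequence encoder unchanged, which is why this variant is convenient for a first experiment. The word "today" is not in the toy vocabulary, so it maps to the `<unk>` id.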
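Experimental Results
Research Questions
- RQ1: What are the limitations of current neural chatbot architectures in open-domain conversation?
- RQ2: Can Transformer-based architectures be adapted effectively to the chatbot setting?
- RQ3: Do priors such as mood and persona improve the quality and relevance of generated responses?
- RQ4: How do context and dialogue history affect Transformer-based chatbots?
- RQ5: What are effective strategies for integrating knowledge bases and contextual information into chatbots?
Key Findings
- Transformer-based chatbots can be trained on dialogue datasets such as the Cornell and OpenSubtitles corpora.
- Extra inputs (mood, persona) can be integrated and may improve the relevance and naturalness of responses.
- Context and dialogue history are challenging to encode and call for hierarchical or memory-focused strategies.
- Evaluating dialogue models remains difficult; traditional metrics (BLEU, perplexity) do not always agree with human judgments.
- The study discusses training setups and qualitative comparisons with previous chatbot models.
- Future work outlines ways to address loss function issues, temporal conditioning, and memory in dialogue systems.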
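To make the evaluation finding concrete, here is a minimal sketch of how perplexity is usually computed from the probabilities a model assigns to the reference tokens. The function name `perplexity` and the input format (a plain list of per-token probabilities) are assumptions for illustration, not the paper's evaluation code.

```python
import math

def perplexity(token_probs):
    """Perplexity of one reference response.

    token_probs holds the probability the model assigned to each
    reference token; perplexity is exp of the mean negative log-probability.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# A model that spreads probability uniformly over four choices at every step.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A low value only says the model found the reference tokens likely. It does not measure whether a response is relevant or engaging, which is one plausible reason such metrics can diverge from human ratings.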
This summary was generated by AI and reviewed by a human editor.