QUICK REVIEW

[Paper Review] Beyond Goldfish Memory: Long-Term Open-Domain Conversation

Jing Xu, Arthur Szlam|arXiv (Cornell University)|Jul 15, 2021

Topic Modeling | Cited by 40

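One-sentence summary

This paper introduces Multi-Session Chat (MSC), a long-term open-domain dialogue dataset, and shows that retrieval-augmented models and memory-based summarization models outperform standard encoder-decoder Transformers on long-context conversation, with the gains confirmed by both automatic and human evaluation.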

ABSTRACT

Despite recent improvements in open-domain dialogue models, state of the art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions. We show how existing models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations, and we study long-context models that can perform much better. In particular, we find retrieval-augmented methods and methods with an ability to summarize and recall previous conversations outperform the standard encoder-decoder architectures currently considered state of the art.

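Motivation and goals

Advance research on long-term open-domain conversation beyond short-conversation datasets.
Provide a public dataset (MSC) that captures multi-session conversations together with summaries of past interactions.
Evaluate long-context modeling approaches and identify which architectures handle memory and retrieval effectively.
Offer insights and baselines to drive progress in long-term dialogue research.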

Proposed approach

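Introduce the Multi-Session Chat (MSC) dataset, in which each conversation spans up to 5 sessions and the participants get to know each other and refer back to earlier sessions.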
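Compare encoder-decoder Transformers with longer context windows against retrieval-augmented methods (RAG, FiD, FiD-RAG) and a read/write memory model that summarizes the conversation as it generates.
Propose a memory-augmented generation framework (SumMem-MSC) that stores summarized knowledge in long-term memory and retrieves it at generation time.
Train and evaluate models on MSC, including variants with different context lengths and variants that use either gold or predicted summaries.
Run human evaluations of engagingness and of how well responses are grounded in references to previous sessions.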
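
The summarize, store, and retrieve loop described above can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the SumMem-MSC implementation: `SummaryMemory`, `toy_summarize`, and `build_context` are hypothetical names, the summarizer is a rule-based placeholder where the paper uses a trained model, and bag-of-words cosine similarity stands in for a learned retriever.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a learned dense encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SummaryMemory:
    """Hypothetical long-term store of summary sentences."""

    def __init__(self):
        self.entries = []

    def write(self, summary):
        # Skip empty summaries and exact duplicates.
        if summary and summary not in self.entries:
            self.entries.append(summary)

    def read(self, query, k=2):
        # Return the k stored summaries most similar to the query.
        q = bag_of_words(query)
        ranked = sorted(
            self.entries,
            key=lambda e: cosine(q, bag_of_words(e)),
            reverse=True,
        )
        return ranked[:k]


def toy_summarize(utterance):
    """Placeholder summarizer (assumption): keep first-person statements only."""
    return utterance if utterance.lower().startswith(("i ", "my ")) else ""


def build_context(memory, current_turns, k=2):
    """Prepend retrieved summaries to the current session's turns."""
    retrieved = memory.read(current_turns[-1], k=k)
    return "\n".join(retrieved + current_turns)


# Session 1: write summaries of what the partner said into memory.
memory = SummaryMemory()
for turn in [
    "I have two dogs named Max and Bella",
    "How are you today?",
    "My favorite sport is tennis",
]:
    memory.write(toy_summarize(turn))

# Session 2: retrieve relevant memories before generating a reply.
context = build_context(memory, ["Do you still play tennis"], k=1)
```

In this toy, the generator would condition on `context` instead of on the full transcript of every earlier session, which is the design idea the summary-memory approach relies on.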

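Experimental results

Research questions

RQ1: Can long-context open-domain dialogue be modeled effectively with multi-session data that refers back to past interactions?
RQ2: In long-term contexts, do retrieval-augmented and memory-augmented models outperform standard encoder-decoder Transformers?
RQ3: How does using session summaries instead of the raw dialogue history affect generation quality and engagingness?
RQ4: How do context length and the availability of summaries affect model performance in long-term conversation?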

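Key findings

On MSC, retrieval-augmented methods and summary-based memory models outperform standard encoder-decoder baselines on both automatic metrics and human judgments.
SumMem-MSC 2.7B (FiD-RAG) performs strongly on human engagingness and final rating, for example 62.1% engaging responses and a final rating of 3.65.
In human evaluation, models trained on MSC exceed BlenderBot (BST 2.7B) in engagingness and final rating.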
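Using session-level summaries as context improves over using the raw dialogue history; gold summaries give the strongest gains, and predicted summaries perform comparably.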
Longer context lengths (512–1024 tokens) generally improve (lower) perplexity and raise engagingness when past sessions or summaries are included.
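
Because perplexity is a lower-is-better metric, an improvement means a smaller value. The sketch below shows the standard definition (the exponential of the mean negative log-likelihood per token). The function name `perplexity` and the probabilities used are illustrative assumptions, not values from the paper.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the mean negative natural-log likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up example: a model that assigns higher probability to each
# reference token gets a lower (better) perplexity.
weaker = perplexity([math.log(0.25)] * 4)
stronger = perplexity([math.log(0.5)] * 4)
```

Here `stronger` is smaller than `weaker`, matching the reading that added context helps when it lowers perplexity.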

This review was generated by AI and reviewed by human editors.