QUICK REVIEW

[Paper Review] Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

Qingyue Wang, Fu, Yanhe|arXiv (Cornell University)|Aug 29, 2023

Topic Modeling | Cited by 10

One-Sentence Summary

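The paper proposes using large language models to recursively generate memory summaries that strengthen long-term dialogue memory. Evaluated on MSC with ChatGPT and text-davinci-003, the method shows improved consistency in later sessions.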

ABSTRACT

Recently, large language models (LLMs), such as GPT-4, stand out for their remarkable conversational abilities, enabling them to engage in dynamic and contextually relevant dialogues across a wide range of topics. However, given a long conversation, these chatbots fail to recall past information and tend to generate inconsistent responses. To address this, we propose to recursively generate summaries/memory using LLMs to enhance long-term memory ability. Specifically, our method first stimulates LLMs to memorize small dialogue contexts and then recursively produce new memory using previous memory and following contexts. Finally, the chatbot can easily generate a highly consistent response with the help of the latest memory. We evaluate our method on both open and closed LLMs, and the experiments on the widely-used public dataset show that our method can generate more consistent responses in a long-context conversation. Also, we show that our strategy could nicely complement both long-context (e.g., 8K and 16K) and retrieval-enhanced LLMs, further improving long-term dialogue performance. Notably, our method is a potential solution to enable the LLM to model the extremely long context. The code and scripts are released.

Research Motivation and Objectives

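Address forgetting in open-domain long-term dialogue without labeled data or additional tools.
Propose a memory management scheme that recursively updates a summary (the memory) from short contexts.
Enable the generator to use the latest memory to produce coherent responses in long-context conversations.
Demonstrate effectiveness and robustness across different LLMs, and analyze the potential gains from few-shot prompting.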

Proposed Method

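Treat the LLM as both memory manager and response generator.
Memory update: M_s = LLM(C_{t-1}, M_{s-1}, P_m), where C_{t-1} is the short context, M_{s-1} is the previous memory, and P_m is the memory-management prompt.
Response generation: r_t = LLM(C_t, M_s, P_r), where P_r is the response prompt.
Recursively update the memory by combining the previous memory with new utterances, producing a coherent long-term memory.
Evaluate on the MSC dataset with frozen LLMs (ChatGPT, text-davinci-003).
Compare against baselines including All Context, Part Context, and Gold Memory.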
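
The memory-update and response-generation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' released code: the `llm` callable, the prompt wording in `MEMORY_PROMPT` and `RESPONSE_PROMPT`, and all function names are hypothetical, and each past session is assumed to serve as one short context.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

# Assumed prompt wording; stands in for P_m (memory-management prompt).
MEMORY_PROMPT = (
    "Previous memory:\n{memory}\n\n"
    "New dialogue context:\n{context}\n\n"
    "Update the memory so that it summarizes everything known so far."
)

# Assumed prompt wording; stands in for P_r (response prompt).
RESPONSE_PROMPT = (
    "Memory:\n{memory}\n\n"
    "Current dialogue context:\n{context}\n\n"
    "Reply to the last utterance, staying consistent with the memory."
)


def update_memory(llm: LLM, prev_memory: str, context: List[str]) -> str:
    """One recursive step: new memory from the previous memory and a short context."""
    prompt = MEMORY_PROMPT.format(
        memory=prev_memory or "(empty)",
        context="\n".join(context),
    )
    return llm(prompt)


def generate_response(llm: LLM, memory: str, context: List[str]) -> str:
    """Generate a reply from the current context and the latest memory."""
    prompt = RESPONSE_PROMPT.format(memory=memory, context="\n".join(context))
    return llm(prompt)


def run_dialogue(
    llm: LLM, past_sessions: List[List[str]], current_context: List[str]
) -> Tuple[str, str]:
    """Fold the past sessions into one memory, then answer the current context."""
    memory = ""
    for session in past_sessions:
        # The memory from the previous step is fed back in: this is the recursion.
        memory = update_memory(llm, memory, session)
    return memory, generate_response(llm, memory, current_context)
```

Because the model weights stay fixed, the only state carried between sessions in this sketch is the memory string, so the prompt length depends on the memory size rather than on the full dialogue history.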

Experimental Results

Research Questions

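RQ1: Without labeled data or additional tools, can LLMs form long-term dialogue memory by recursively summarizing past interactions?
RQ2: Does the predicted (recursively generated) memory yield more consistent and coherent responses in long-term dialogue than using the raw full context or a partial context?
RQ3: Is the method robust across different LLMs, and does it benefit from few-shot learning?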

Key Findings

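Predicted memory generally achieves the best performance, especially in the later MSC sessions (Session 4 and Session 5).
Generated memory yields notable F1 and BLEU-2 gains over the baselines and, on some metrics, even outperforms Gold Memory.
Memory prediction shows higher coherence and better integration of long-term information than using the full or partial context.
The method is robust across different LLMs (e.g., ChatGPT and text-davinci-003).
Few-shot prompting (a single labeled example) further improves memory quality and response performance.
The method can hallucinate in the memory (for example, stated causal relations may be incorrect), which future work needs to mitigate.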
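
For readers unfamiliar with the F1 metric cited in the findings, a common formulation in dialogue evaluation is token-overlap F1 between a generated response and a reference. The sketch below assumes lowercasing and whitespace tokenization; this review does not state the paper's exact tokenization or normalization, so `token_f1` is a hypothetical helper and should not be read as the authors' evaluation script.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (assumed: lowercase, whitespace split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("i have a dog", "i have a cat")` shares three of four tokens on each side, giving precision and recall of 0.75 and therefore an F1 of 0.75.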


This review was generated by AI and reviewed by a human editor.