QUICK REVIEW

[Paper Review] Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory

Sahil Sen, Elias Lumer | arXiv (Cornell University) | Mar 17, 2026

Topic Modeling | Cited by 0

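One-Sentence Summary

Chronos introduces a dual-calendar memory system that selectively extracts temporally grounded events and uses dynamic prompting for efficient, time-aware long-term memory retrieval in LLM-based conversational agents, reaching state-of-the-art accuracy on LongMemEvalS.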

ABSTRACT

Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems struggle to reason over temporally grounded facts and preferences that evolve across months of interaction and lack effective retrieval strategies for multi-hop, time-sensitive queries over long dialogue histories. We introduce Chronos, a novel temporal-aware memory framework that decomposes raw dialogue into subject-verb-object event tuples with resolved datetime ranges and entity aliases, indexing them in a structured event calendar alongside a turn calendar that preserves full conversational context. At query time, Chronos applies dynamic prompting to generate tailored retrieval guidance for each question, directing the agent on what to retrieve, how to filter across time ranges, and how to approach multi-hop reasoning through an iterative tool-calling loop over both calendars. We evaluate Chronos with 8 LLMs, both open-source and closed-source, on the LongMemEvalS benchmark comprising 500 questions spanning six categories of dialogue history tasks. Chronos Low achieves 92.60% and Chronos High scores 95.60% accuracy, setting a new state of the art with an improvement of 7.67% over the best prior system. Ablation results reveal the events calendar accounts for a 58.9% gain on the baseline while all other components yield improvements between 15.5% and 22.3%. Notably, Chronos Low alone surpasses prior approaches evaluated under their strongest model configurations.

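Motivation and Objectives

Address the challenge of temporally grounded long-term memory in conversational AI when interactions span weeks or months.
Propose a memory framework that selectively extracts temporal events while preserving the raw dialogue for semantic retrieval.
Develop dynamic prompting that gives tailored retrieval guidance for each question, and enable iterative tool calling over both calendars.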

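Proposed Method

Extract timestamped events as subject-verb-object tuples and resolve their datetime ranges.
Maintain two calendars: an event calendar for structured temporal events and a turn calendar for the raw dialogue.
On the turn calendar, run a three-stage initial retrieval: dense vector search, reranking, and context expansion.
Apply dynamic prompting to generate per-question retrieval guidance for memory queries.
Implement a tool-calling Chronos Agent that retrieves iteratively over both calendars.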
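
To make the event-calendar idea concrete, here is a minimal sketch of how subject-verb-object tuples with resolved datetime ranges and entity aliases might be stored and filtered by time range. It is an illustration under stated assumptions, not the authors' implementation: the `Event` and `EventCalendar` classes, the `turn_id` back-reference, and the sample events are all hypothetical, and a real system would likely use a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One subject-verb-object tuple with a resolved datetime range (hypothetical schema)."""
    subject: str
    verb: str
    obj: str
    start: datetime
    end: datetime
    turn_id: int                     # assumed link back to the turn calendar
    aliases: tuple[str, ...] = ()    # other surface forms of the subject


class EventCalendar:
    """Toy in-memory event index, for illustration only."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def add(self, event: Event) -> None:
        self._events.append(event)

    def search(self, entity: str, start: datetime, end: datetime) -> list[Event]:
        """Return events about `entity` whose range overlaps [start, end], oldest first."""
        name = entity.lower()
        hits = [
            e for e in self._events
            if name in {e.subject.lower(), *(a.lower() for a in e.aliases)}
            and e.start <= end and e.end >= start  # interval-overlap test
        ]
        return sorted(hits, key=lambda e: e.start)


# Made-up sample events, not taken from the paper.
calendar = EventCalendar()
calendar.add(Event("user", "adopted", "a puppy",
                   datetime(2025, 3, 2), datetime(2025, 3, 2, 23, 59),
                   turn_id=14, aliases=("I",)))
calendar.add(Event("user", "moved to", "Berlin",
                   datetime(2025, 6, 1), datetime(2025, 6, 30),
                   turn_id=87, aliases=("I",)))

march = calendar.search("user", datetime(2025, 3, 1), datetime(2025, 3, 31))
print([(e.verb, e.obj) for e in march])
```

Keeping a `turn_id` on each event is one plausible way to let an agent jump from a matched event to the full conversational context in the turn calendar; the source does not specify how the two calendars are linked.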

Figure 1: An overview of the Chronos Architecture. Event Extraction, Dual Indexing, and Query Processing result in a generated answer.
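
The three-stage initial retrieval over the turn calendar (vector search, reranking, context expansion) can be sketched as below. This is a toy under loud assumptions: bag-of-words cosine similarity stands in for a dense encoder, token overlap stands in for a learned reranker, and the function names and the parameters `k`, `keep`, and `window` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (a real system would use a dense encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_score(query: str, turn: str) -> float:
    """Stand-in reranker: fraction of query tokens that appear in the turn."""
    q = set(query.lower().split())
    return len(q & set(turn.lower().split())) / len(q) if q else 0.0


def retrieve(query: str, turns: list[str], k: int = 3, keep: int = 1, window: int = 1) -> list[int]:
    """Return sorted turn indices after vector search, reranking, and context expansion."""
    # Stage 1: vector search, keep the top-k turns by cosine similarity.
    q_vec = embed(query)
    candidates = sorted(range(len(turns)),
                        key=lambda i: cosine(q_vec, embed(turns[i])),
                        reverse=True)[:k]
    # Stage 2: rerank the candidates and keep the best `keep`.
    best = sorted(candidates, key=lambda i: rerank_score(query, turns[i]), reverse=True)[:keep]
    # Stage 3: context expansion, add neighbouring turns within `window`.
    expanded: set[int] = set()
    for i in best:
        expanded.update(range(max(0, i - window), min(len(turns), i + window + 1)))
    return sorted(expanded)


# Made-up dialogue turns for illustration.
turns = [
    "hello how are you today",
    "i am planning a trip",
    "i booked the flight to tokyo for may",
    "great i will remind you to pack",
    "what is the weather like",
]
print(retrieve("when is the flight to tokyo", turns))
```

Context expansion is what lets a single matched turn bring its neighbours along, so the answer generator sees the surrounding exchange instead of an isolated sentence.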

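Experimental Results

Research Questions

RQ1: Compared with purely turn-level or fully structured knowledge-base approaches, how does query-conditioned selective extraction of temporal events improve long-term memory retrieval?
RQ2: Can dynamic prompting tailor the retrieval strategy to different types of long-term memory queries (temporal reasoning, knowledge updates, multi-session aggregation)?
RQ3: Can dual-calendar memory with event-oriented indexing support accurate cross-session temporal reasoning at scale?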

Key Findings

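Accuracy (%) on LongMemEvalS by question category. KU: knowledge update; MS: multi-session; SSA: single-session assistant; SSP: single-session preference; SSU: single-session user; TR: temporal reasoning.
Method	Overall	KU	MS	SSA	SSP	SSU	TR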
Chronos Low (Ours)	92.60	96.15	91.73	100.00	80.00	94.29	90.23
Honcho †	90.40	94.87	84.96	96.43	90.00	94.29	88.72
EmergenceMem Internal	86.00	83.33	81.20	100.00	60.00	98.57	85.71
Mastra	84.80	85.90	79.70	82.14	73.33	98.57	85.71
Supermemory	81.60	88.50	71.40	96.40	70.00	97.10	76.70
Hindsight ‡	83.60	84.60	79.70	94.60	66.70	95.70	79.70
Zep	71.20	83.30	57.90	80.40	56.70	92.90	62.40
Full-context	60.20	78.20	44.30	94.60	20.00	81.40	45.10
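
As a quick reading aid for the table, the snippet below computes the per-category margin of Chronos Low over the next-best overall system, Honcho. The numbers are copied from the two table rows; the variable names are illustrative.

```python
categories = ["Overall", "KU", "MS", "SSA", "SSP", "SSU", "TR"]
chronos_low = [92.60, 96.15, 91.73, 100.00, 80.00, 94.29, 90.23]
honcho = [90.40, 94.87, 84.96, 96.43, 90.00, 94.29, 88.72]

# Positive margin: Chronos Low is ahead; negative: Honcho is ahead.
margins = {c: round(a - b, 2) for c, a, b in zip(categories, chronos_low, honcho)}

for category, margin in margins.items():
    print(f"{category:8s} {margin:+.2f}")
```

The largest margin is in the multi-session column (MS), the two systems tie on SSU, and SSP is the one column where Honcho leads Chronos Low.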

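Chronos Low reaches 92.60% accuracy on LongMemEvalS, a new state of the art among systems evaluated with GPT-4o or similar models.
Chronos High reaches 95.60% accuracy on LongMemEvalS, the highest reported score on the benchmark, obtained with stronger frontier models.
In the ablation study, the event calendar accounts for a 58.9% gain over the baseline, while each of the other components contributes between 15.5% and 22.3%.
Chronos outperforms the baselines on knowledge-update tracking and multi-session aggregation, and reaches perfect accuracy in some single-session categories (Chronos Low scores 100.00 on SSA).
Dynamic prompting supplies per-question retrieval guidance, and its benefit is most pronounced in the less capable model configuration (Chronos Low).
The ablations also show that removing event indexing nearly halves Chronos Low's accuracy, underscoring the value of temporal structure.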

Figure 2: Overall Benchmark Accuracy on both High and Low Configurations. Note: High configurations refer to evaluations with advanced frontier models, such as Opus 4.6 and Gemini 3 Pro. Standard configurations refer to the traditional evaluated model, GPT-4o, or similar models.


This review was generated by AI and reviewed by human editors.