QUICK REVIEW

[Paper Review] The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Xiaoyuan Liu, Tian Liang|arXiv (Cornell University)|Feb 12, 2026
Multimodal Machine Learning Applications | Cited by 0
ONE-SENTENCE SUMMARY

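StateLM introduces a self-context-engineering loop backed by a learned toolbox of memory tools, which lets the model manage its own context and outperform baselines on long-document QA, chat memory, and deep research tasks.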

ABSTRACT

In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve (mature databases and retrieval systems), our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop to manage their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to actively manage these tools. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed window. Experiments across various model sizes demonstrate StateLM's effectiveness across diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs across all model scales; on the chat memory task, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep research task BrowseComp-Plus, the performance gap becomes even more pronounced: StateLM achieves up to 52% accuracy, whereas standard LLM counterparts struggle around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents where reasoning becomes a stateful and manageable process.

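RESEARCH MOTIVATION AND GOALS

  • Drive the shift from stateless LLMs to state-aware agents that manage their own memory and context.
  • Propose a general-purpose toolkit for memory and context management that enables self-engineered context.
  • Demonstrate cross-domain gains on long-document QA, multi-turn chat memory, and deep research tasks.
  • Show that learned context management scales across model sizes and outperforms external, human-driven context engineering.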

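PROPOSED METHOD

  • Introduces StateLM, a class of foundation models with an internal reasoning loop and a Pensieve-style toolbox of memory tools.
  • Formalizes a tool-augmented agentic reasoning process in which the interaction history can be pruned via deleteContext while information persists in an external notebook.
  • Defines a "spellbook" of tools (analyzeText, buildIndex, searchEngine, readChunk, note/updateNote, readNote, deleteContext, finish) that covers perception, retrieval, and memory management.
  • Trains StateLM in two stages: supervised fine-tuning (SFT) on expert trajectories filtered by both outcome-based and process-based criteria, followed by reinforcement learning with trajectory replay and task-aware rewards.
  • Evaluates on long-context benchmarks in three domains (long-document QA, chat memory, and deep research) using 4B, 8B, and 14B models.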
Figure 1: StateLM (right) maintains a “sawtooth” context-use profile, rather than monotonic accumulation (left).
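
The "sawtooth" profile can be illustrated with a minimal Python sketch. It is not StateLM's implementation: the `StatefulContext` class, its method names, the word-count token estimate, and the "keep the first few words as a note" rule are all assumptions made for illustration. The sketch only shows the general pattern of reading a chunk into the working context, saving a short note outside it, and then pruning the chunk.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulContext:
    """Toy working context plus a persistent notebook (hypothetical names)."""
    messages: dict = field(default_factory=dict)   # msg_id -> text
    notebook: dict = field(default_factory=dict)   # key -> note text
    next_id: int = 0

    def append(self, text: str) -> int:
        # Add a message to the working context and return its id.
        msg_id = self.next_id
        self.messages[msg_id] = text
        self.next_id += 1
        return msg_id

    def note(self, key: str, content: str) -> None:
        # Notes are stored outside the working context, so they survive pruning.
        self.notebook[key] = content

    def delete_context(self, msg_id: int) -> None:
        # Remove one message from the working context.
        self.messages.pop(msg_id, None)

    def active_tokens(self) -> int:
        # Crude estimate (an assumption): one token per whitespace-separated word.
        return sum(len(text.split()) for text in self.messages.values())


def run_demo(query, chunks, note_words=3):
    """Read each chunk, note its first few words, prune it, and log context size."""
    ctx = StatefulContext()
    ctx.append(query)  # the query stays in context for the whole run
    usage = []
    for i, chunk in enumerate(chunks):
        msg_id = ctx.append(chunk)
        usage.append(ctx.active_tokens())          # peak: chunk is in context
        ctx.note(f"chunk-{i}", " ".join(chunk.split()[:note_words]))
        ctx.delete_context(msg_id)
        usage.append(ctx.active_tokens())          # trough: chunk was pruned
    return usage, ctx.notebook


if __name__ == "__main__":
    usage, notebook = run_demo(
        "what is x", ["alpha beta gamma delta", "epsilon zeta"], note_words=2
    )
    print(usage)
    print(notebook)
```

In this toy run the logged context size rises each time a chunk is read and falls back to the size of the query after pruning, while the notebook keeps growing. A model that never pruned would instead see the size increase monotonically with every chunk.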

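EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

  • RQ1: Can a model autonomously engineer its own context through built-in memory tools to overcome the limits of a fixed context window?
  • RQ2: How does learned self-context engineering affect performance on long-document QA, multi-turn dialogue, and deep research tasks?
  • RQ3: Under a fixed budget, does a state-aware agent with Pensieve-inspired memory outperform external, scripted context-engineering baselines?
  • RQ4: How does StateLM scale with model size and with task difficulty in real-world long-context settings?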

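KEY FINDINGS

  • StateLM outperforms instruction-tuned baselines on long-document QA while using only about a quarter of the active context.
  • On the chat memory task, StateLM improves absolute accuracy by 10% to 20% over standard LLMs.
  • On the BrowseComp-Plus deep research task, StateLM reaches up to 52% accuracy, whereas standard LLMs score around 5%, an average improvement of more than 40%.
  • In benchmark tests, StateLM remains robust at extreme context lengths (for example, up to 2M tokens in a Needle-in-a-Haystack setting).
  • Applying reinforcement learning on top of a well-trained StateLM yields further gains (for example, StateLM-8B-RL improves by +3 points on some benchmarks).
  • Tool-usage patterns show more searches and fewer memory updates as task scale grows, suggesting that context management is efficient and task-adaptive.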
Figure 2: The self-context engineering workflow of StateLM. Given a query over a long context, StateLM engages in a multi-round, stateful reasoning loop that analyzes the input, builds an index, and iteratively searches, reads, takes notes, and prunes its working context. Messages highlighted in re
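
The index, search, and read steps of this workflow can be sketched in Python as follows. This is a hedged illustration only: the summary does not say how StateLM's buildIndex or searchEngine tools work internally, so the fixed-size word chunking, the inverted index, and the term-overlap ranking below are stand-in assumptions, and the function names `chunk_text`, `build_index`, `search`, and `read_chunk` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase alphanumeric terms; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, chunk_words=50):
    # Split a long document into fixed-size word chunks (an assumption).
    words = text.split()
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]


def build_index(chunks):
    # Inverted index: term -> set of chunk ids that contain the term.
    index = defaultdict(set)
    for chunk_id, chunk in enumerate(chunks):
        for term in tokenize(chunk):
            index[term].add(chunk_id)
    return index


def search(index, query, top_k=2):
    # Rank chunks by how many distinct query terms they contain.
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for chunk_id in index.get(term, ()):
            scores[chunk_id] += 1
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ranked[:top_k]


def read_chunk(chunks, chunk_id):
    # Only the requested chunk enters the working context, not the whole document.
    return chunks[chunk_id]


if __name__ == "__main__":
    chunks = [
        "The owl delivers mail at dawn",
        "Dragons guard the golden vault",
        "The vault key is hidden under the owl statue",
    ]
    index = build_index(chunks)
    hits = search(index, "Where is the vault key?")
    for chunk_id in hits:
        print(chunk_id, read_chunk(chunks, chunk_id))
```

The point of the design is that the model never needs the whole document in context at once: it queries the index, reads only the top-ranked chunks, and can note and prune them before the next search.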


This review was generated by AI and reviewed by human editors.