QUICK REVIEW

[Paper Review] UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory

Yongshi Ye, Hui Jiang|arXiv (Cornell University)|Feb 11, 2026
Topic Modeling | Cited by: 0
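One-Sentence Summary

UMEM jointly optimizes memory extraction and management for self-evolving LLM agents, using Semantic Neighborhood Modeling and GRPO with a marginal utility reward to generalize memories across queries and improve performance on multi-turn and embodied tasks.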

ABSTRACT

Self-evolving memory serves as the trainable parameters for Large Language Model (LLM)-based agents, where extraction (distilling insights from experience) and management (updating the memory bank) must be tightly coordinated. Existing methods predominantly optimize memory management while treating memory extraction as a static process, resulting in poor generalization, where agents accumulate instance-specific noise rather than robust memories. To address this, we propose Unified Memory Extraction and Management (UMEM), a self-evolving agent framework that jointly optimizes a Large Language Model to simultaneously extract and manage memories. To mitigate overfitting to specific instances, we introduce Semantic Neighborhood Modeling and optimize the model with a neighborhood-level marginal utility reward via GRPO. This approach ensures memory generalizability by evaluating memory utility across clusters of semantically related queries. Extensive experiments across five benchmarks demonstrate that UMEM significantly outperforms highly competitive baselines, achieving up to a 10.67% improvement in multi-turn interactive tasks. Furthermore, UMEM maintains a monotonic growth curve during continuous evolution. Code and models will be publicly released.

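Motivation and Objectives

  • Demonstrate that self-evolving agents need generalizable long-term memory, and address instance-specific noise in memory extraction.
  • Propose a unified framework that jointly optimizes memory extraction and management.
  • Introduce Semantic Neighborhood Modeling to promote cross-task generalization.
  • Develop a marginal utility reward and train with Group Relative Policy Optimization (GRPO).
  • Demonstrate robust self-evolution and cross-task gains on five benchmarks.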

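Proposed Method

  • Propose a three-component UMEM architecture: a frozen agent executor, an external memory bank, and a learnable memory optimizer.
  • Implement Semantic Neighborhood Modeling, which groups semantically related queries to account for cross-task variation.
  • Guide memory updates with a marginal utility reward evaluated over the semantic neighborhood.
  • Train the memory optimizer with GRPO to jointly optimize extraction and management.
  • Apply online memory evolution during training, continuously updating the memory bank.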
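The neighborhood-level reward described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the exact reward formula, so the code assumes the reward is the mean score gain that a candidate memory produces over the k queries most similar to the current one. The names `semantic_neighborhood`, `marginal_utility_reward`, `score_with`, and `score_without` are hypothetical, and the two-dimensional embeddings are toy values.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_neighborhood(
    query_vec: Vector, pool: List[Tuple[str, Vector]], k: int
) -> List[str]:
    """Return the k queries in `pool` most similar to `query_vec`."""
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def marginal_utility_reward(
    neighbors: List[str],
    score_with: Callable[[str], float],
    score_without: Callable[[str], float],
) -> float:
    """Assumed form: mean gain from adding the memory, averaged over neighbors."""
    if not neighbors:
        return 0.0
    gains = [score_with(q) - score_without(q) for q in neighbors]
    return sum(gains) / len(gains)


# Toy data: two related queries and one unrelated query.
pool = [
    ("sort a list", (1.0, 0.0)),
    ("sort a dict by value", (0.9, 0.1)),
    ("book a flight", (0.0, 1.0)),
]
neighbors = semantic_neighborhood((1.0, 0.0), pool, k=2)


def baseline(q: str) -> float:
    """Executor score without the candidate memory (toy: always fails)."""
    return 0.0


# A memory that helps every query in the neighborhood vs. one that only
# helps the single query it was extracted from.
broad_reward = marginal_utility_reward(
    neighbors, lambda q: 1.0 if "sort" in q else 0.0, baseline
)
narrow_reward = marginal_utility_reward(
    neighbors, lambda q: 1.0 if q == "sort a list" else 0.0, baseline
)
print(broad_reward, narrow_reward)  # 1.0 0.5
```

Under this assumed form, a memory that only helps the query it came from earns a lower reward than one that helps the whole neighborhood, which is the pressure against instance-specific memories that the method targets.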
Figure 1: Comparison between the conventional memory pipeline and our proposed UMEM framework. Left: Vanilla methods suffer from the "Rote Memorization" trap, overfitting to instance-specific noise. Right: UMEM utilizes a learnable Mem-Optimizer to jointly optimize extraction and management.

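Experimental Results

Research Questions

  • RQ1: Can jointly optimizing memory extraction and management improve memory generalization across semantically related tasks?
  • RQ2: Does Semantic Neighborhood Modeling reduce instance-specific noise and promote robust memory utility across tasks?
  • RQ3: How effective is GRPO with a marginal utility reward at aligning extracted memories with the management policy?
  • RQ4: Does evolving the memory online during training lead to more stable and scalable self-evolution?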

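Key Findings

  • UMEM outperforms baselines such as ReMem and Memp on both single-turn reasoning and multi-turn embodied tasks across five benchmarks.
  • Jointly optimizing memory extraction and management is more effective than optimizing either component alone, validating the coupled design.
  • Semantic Neighborhood Modeling and the marginal utility reward (optimized via GRPO) improve generalization to semantically related queries.
  • UMEM shows monotonic growth and robustness throughout continuous self-evolution.
  • Stronger executors (e.g., GPT-5.1, Gemini-2.5-Flash) amplify UMEM's gains, and scaling the policy model (up to 4B) yields further improvement.
  • Test-time self-evolution delivers sustained performance gains and efficient inference with fewer steps.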
Figure 2: Overview of UMEM. Left: Semantic Neighborhood Modeling retrieves related queries to simulate cross-task variations. Right: The Mem-Optimizer distills trajectories from the frozen Executor into memory updates, which are optimized via GRPO. The process is guided by a Marginal Utility Reward ...
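The GRPO step named in the Figure 2 caption can be sketched as follows. GRPO samples a group of candidate outputs for the same input and scores each one relative to the others in its group, with no learned value function. The sketch shows only that group-relative advantage computation, in its commonly used mean and standard deviation normalization. Whether UMEM uses exactly this normalization is not stated in this summary. The function name `group_relative_advantages` and the example rewards are hypothetical, and using the population standard deviation is a choice made for the sketch.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per candidate memory update sampled for the
    same input. `eps` avoids division by zero when all rewards are equal.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Three sampled memory updates with hypothetical marginal utility rewards.
advantages = group_relative_advantages([0.0, 0.5, 1.0])

# If every candidate earns the same reward, no candidate is preferred.
tied = group_relative_advantages([0.3, 0.3])
```

Candidates scoring above the group mean receive positive advantages and are reinforced, while those below the mean are penalized. A group with identical rewards yields zero advantages and therefore no preference between candidates.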


This review was generated by AI and reviewed by human editors.