QUICK REVIEW

[Paper Review] Recurrent Preference Memory for Efficient Long-Sequence Generative Recommendation

Yixiao Chen, Yuan Wang|arXiv (Cornell University)|Feb 12, 2026

Recommender Systems and Techniques|Cited by 0

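One-Sentence Summary

This paper proposes Rec2PM, a token-based preference memory framework that compresses lifelong user histories into compact memory tokens and trains the memory updates in parallel through self-referential teacher forcing, which speeds up inference while retaining good accuracy relative to full-sequence models.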

ABSTRACT

Generative recommendation (GenRec) models typically model user behavior via full attention, but scaling to lifelong sequences is hindered by prohibitive computational costs and noise accumulation from stochastic interactions. To address these challenges, we introduce Rec2PM, a framework that compresses long user interaction histories into compact Preference Memory tokens. Unlike traditional recurrent methods that suffer from serial training, Rec2PM employs a novel self-referential teacher-forcing strategy: it leverages a global view of the history to generate reference memories, which serve as supervision targets for parallelized recurrent updates. This allows for fully parallel training while maintaining the capability for iterative updates during inference. Additionally, by representing memory as token embeddings rather than extensive KV caches, Rec2PM achieves extreme storage efficiency. Experiments on large-scale benchmarks show that Rec2PM significantly reduces inference latency and memory footprint while achieving superior accuracy compared to full-sequence models. Analysis reveals that the Preference Memory functions as a denoising Information Bottleneck, effectively filtering interaction noise to capture robust long-term interests.

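Motivation and Objectives

Address the scalability and noise problems that full-attention GenRec models face on long-term user histories.
Propose a memory-augmented framework that compresses long histories into Preference Memory tokens.
Enable parallel training of memory updates while still allowing iterative updates at inference time.
Demonstrate streaming updates that are efficient in storage and latency while remaining competitive in accuracy.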

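Proposed Method

Represent the Preference Memory as a small set of learnable token embeddings (memory slots) per user.
Use a memory encoder with globally learned memory queries Q_mem to compress the historical context into an atomic memory state m.
Support two memory update modes: overwrite (fixed-size memory) and append (expandable memory).
Introduce a two-stage parallel training scheme: (i) generate global reference memories from the raw history; (ii) supervise the local updates in parallel with a consistency loss (L_con) that aligns them with the reference memories.
Use a unified architecture that shares the memory encoder and the generative decoder, processing M_{k-1} and the current segment S_k in a single forward pass.
Combine an autoregressive loss L_AR (for next-item prediction) with the memory consistency loss L_con in the training objective: L = L_AR + lambda * L_con.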
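The steps listed above can be sketched in a few lines of NumPy. This is only an illustrative reading of the summary, not the authors' implementation: the single-projection cross-attention encoder, the mean-squared-error form of L_con, the prefix-based reference memories, and all sizes and names (`encode_memory`, `update`, `consistency_loss`, `lam`) are assumptions introduced here. L_AR is represented by a placeholder constant because the decoder is out of scope for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_slots, seg_len, n_segs = 16, 4, 8, 3        # hypothetical sizes

# Hypothetical parameters: global learned memory queries and one key projection.
Q_mem = rng.normal(size=(n_slots, d))
W_k = rng.normal(size=(d, d)) / np.sqrt(d)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_memory(context):
    """Compress a (T, d) context into an (n_slots, d) memory state via cross-attention."""
    keys = context @ W_k
    attn = softmax(Q_mem @ keys.T / np.sqrt(d))   # (n_slots, T)
    return attn @ context

def update(prev_memory, segment, mode):
    """One recurrent step: read [M_{k-1}; S_k] and emit the new atomic state m_k."""
    m_k = encode_memory(np.concatenate([prev_memory, segment], axis=0))
    if mode == "overwrite":
        return m_k                                          # fixed-size memory
    if mode == "append":
        return np.concatenate([prev_memory, m_k], axis=0)   # expandable memory
    raise ValueError(f"unknown mode: {mode}")

def reference_memories(history):
    """Stage (i): one reference memory per segment, computed from the raw prefix."""
    return [encode_memory(history[: (k + 1) * seg_len]) for k in range(n_segs)]

def consistency_loss(history):
    """Stage (ii): every local update is fed the reference M_{k-1} instead of its
    own previous output, so the iterations do not depend on each other and
    could be batched. The MSE form of L_con is an assumption."""
    refs = reference_memories(history)
    empty = np.zeros((0, d))
    losses = []
    for k in range(n_segs):
        prev = refs[k - 1] if k > 0 else empty
        segment = history[k * seg_len:(k + 1) * seg_len]
        local = update(prev, segment, mode="overwrite")
        losses.append(np.mean((local - refs[k]) ** 2))
    return float(np.mean(losses))

history = rng.normal(size=(n_segs * seg_len, d))   # stand-in for item embeddings
lam = 0.5                                          # hypothetical weight lambda
l_ar = 1.0                                         # placeholder for L_AR
total = l_ar + lam * consistency_loss(history)     # L = L_AR + lambda * L_con

# Inference: memory is updated serially, one segment at a time.
memory = np.zeros((0, d))
appended = np.zeros((0, d))
for k in range(n_segs):
    segment = history[k * seg_len:(k + 1) * seg_len]
    memory = update(memory, segment, mode="overwrite")
    appended = update(appended, segment, mode="append")
```

In this sketch the overwrite memory stays at `n_slots` rows regardless of history length, while the append memory grows by `n_slots` rows per segment, which is the trade-off between the two update modes.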

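Experimental Results

Research Questions

RQ1: How can lifelong user histories be compressed into compact memory tokens without sacrificing prediction accuracy?
RQ2: Can memory updates be trained in parallel rather than by serial backpropagation through time steps?
RQ3: Does the self-referential teacher-forcing objective stabilize training while preserving the effectiveness of recurrent memory updates?
RQ4: How does Rec2PM compare with full-sequence attention and KV-cache-based memory in latency, storage, and accuracy?

Key Findings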

Method	SASRec H@1	SASRec H@10	SASRec H@50	SASRec N@10	SASRec N@50	HSTU H@1	HSTU H@10	HSTU H@50	HSTU N@10	HSTU N@50
Short/SASRec	14.10	40.96	57.59	26.68	30.39	13.94	41.67	59.08	26.86	28.88
Short/HSTU	13.94	41.67	59.08	26.86	28.88	14.24	42.77	60.37	27.47	31.41
Short/Tok-Serial-O	14.57	42.56	59.66	27.62	31.46	14.65	43.75	61.03	28.20	32.07
Short/Tok-Serial-A	14.49	42.56	59.70	27.58	31.43	14.45	43.60	61.02	28.01	31.91
Short/KV-Mask-O	14.73	42.32	59.31	27.60	31.41	14.56	43.64	61.07	28.08	32.00
Short/KV-Mask-A	14.72	42.35	59.37	27.56	31.37	14.64	43.59	60.88	28.10	31.97
Short/Rec2PM-O	14.79	43.12	59.92	28.05	31.82	15.04	44.20	61.23	28.66	32.48
Short/Rec2PM-A	14.73	42.76	59.74	27.81	31.62	14.87	44.13	61.16	28.50	32.31
Full/SASRec	14.43	42.40	59.31	27.44	31.23	14.24	42.77	60.37	27.47	31.41
Full/HSTU	14.24	42.77	60.37	27.47	31.41	14.24	42.77	60.37	27.47	31.41
Full/Tok-Serial-O	14.57	42.56	59.66	27.62	31.46	14.65	43.75	61.03	28.20	32.07
Full/Tok-Serial-A	14.45	43.60	61.02	28.01	31.91	14.45	43.60	61.02	28.01	31.91
Full/KV-Mask-O	14.56	43.64	61.07	28.08	32.00	14.56	43.64	61.07	28.08	32.00
Full/KV-Mask-A	14.64	43.59	60.88	28.10	31.97	14.64	43.59	60.88	28.10	31.97
Full/Rec2PM-O	15.04	44.20	61.23	28.66	32.48	14.24	42.77	60.37	27.47	31.41
Full/Rec2PM-A	14.87	44.13	61.16	28.50	32.31	14.87	44.13	61.16	28.50	32.31
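
The column labels H@K and N@K are not defined in this review. Assuming they denote Hit Rate@K and NDCG@K, as is conventional for next-item recommendation with a single ground-truth item, the metrics could be computed per test case as in the sketch below. The function names and the toy scores are hypothetical.

```python
import numpy as np

def rank_of_target(scores, target):
    """1-based rank of the target item when items are sorted by descending score."""
    return int(np.sum(scores > scores[target])) + 1

def hit_at_k(rank, k):
    """1.0 if the target appears in the top-k list, else 0.0."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    """With one relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1)."""
    return float(1.0 / np.log2(rank + 1)) if rank <= k else 0.0

# Toy example: item 3 has the second-highest score, so its rank is 2.
scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
rank = rank_of_target(scores, target=3)
h1, h10, n10 = hit_at_k(rank, 1), hit_at_k(rank, 10), ndcg_at_k(rank, 10)
```

Averaging these per-case values over all test users and multiplying by 100 would give percentages on the scale used in the table, under the same assumption.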

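Rec2PM matches or exceeds the accuracy of full-sequence baselines while substantially reducing latency and storage requirements.
The memory acts as a denoising information bottleneck, filtering out random noise and improving generalization over long-term histories.
Parallel training via self-referential teacher forcing stabilizes learning and outperforms serially trained token-memory and KV-cache baselines.
In most settings, overwrite-style memory updates outperform append-style updates, lending support to a stronger bottleneck effect.
Rec2PM maintains strong performance even with only 4 memory slots, and scales predictably as the number of slots increases.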
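
To give a sense of why token-embedding memory is cheaper to store than a KV cache, the following back-of-the-envelope sketch compares the two per user. Only the 4-slot figure comes from the findings above; the model width, layer count, history length, and 2-byte precision are hypothetical, and the KV estimate assumes standard attention that caches one key and one value vector of width `d_model` per position per layer over the full history.

```python
def token_memory_bytes(n_slots, d_model, bytes_per_value=2):
    """Storage for a memory kept as n_slots token embeddings."""
    return n_slots * d_model * bytes_per_value

def kv_cache_bytes(seq_len, n_layers, d_model, bytes_per_value=2):
    """Storage for keys and values of every cached position at every layer."""
    return 2 * n_layers * seq_len * d_model * bytes_per_value

# Hypothetical configuration, except n_slots=4 which is taken from the findings.
tok = token_memory_bytes(n_slots=4, d_model=256)                # 2,048 bytes
kv = kv_cache_bytes(seq_len=1000, n_layers=4, d_model=256)      # 4,096,000 bytes
ratio = kv / tok                                                # 2000.0
```

Under these assumed sizes the token memory is about 2000 times smaller; the actual ratio depends on the model configuration and on how much history a KV-cache baseline retains.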


This review was generated by AI and reviewed by a human editor.