QUICK REVIEW

[Paper Review] How Well Does Generative Recommendation Generalize?

Yijie Ding, Zitian Guo|arXiv (Cornell University)|Mar 20, 2026

Recommender Systems and Techniques|Cited by 0

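One-Sentence Summary

Generative recommendation (GR) models generally outperform item ID-based models on instances that require generalization, whereas item ID-based models excel at memorization; token-level memorization explains much of GR's apparent generalization, and an adaptive ensemble of the two improves overall performance.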

ABSTRACT

A widely held hypothesis for why generative recommendation (GR) models outperform conventional item ID-based models is that they generalize better. However, there are few systematic ways to verify this hypothesis beyond a superficial comparison of overall performance. To address this gap, we categorize each data instance based on the specific capability required for a correct prediction: either memorization (reusing item transition patterns observed during training) or generalization (composing known patterns to predict unseen item transitions). Extensive experiments show that GR models perform better on instances that require generalization, whereas item ID-based models perform better when memorization is more important. To explain this divergence, we shift the analysis from the item level to the token level and show that what appears to be item-level generalization often reduces to token-level memorization for GR models. Finally, we show that the two paradigms are complementary. We propose a simple memorization-aware indicator that adaptively combines them on a per-instance basis, leading to improved overall recommendation performance.

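Motivation and Objectives

Investigate whether GR models generalize better than conventional item ID-based models, looking beyond overall performance.
Categorize test instances as memorization or generalization based on item transition patterns.
Analyze how token-level transitions (prefix memorization) explain the item-level generalization of GR models.
Evaluate GR and item ID-based models on multiple real-world datasets to quantify per-category performance.
Propose a memorization-aware ensemble strategy that combines GR and item ID-based models on a per-instance basis.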

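Proposed Method

Define memorization as a 1-hop item transition [i_{t-1} -> i_t] observed in the training data.
Define generalization through 1-hop and multi-hop categories (transitivity, symmetry, second-order symmetry, substitutability).
Benchmark two models: TIGER (GR, semantic IDs) and SASRec (item ID-based).
Partition the test data into memorization, generalization, and uncategorized subsets, and compare performance on each subset.
Introduce a token-level prefix n-gram memorization framework that explains item-level generalization in terms of memorization.
Use prefix n-gram counts and semantic ID configurations to study the correlation between token-level memorization and generalization.
Propose an adaptive ensemble that weights TIGER and SASRec per instance using an MSP-based memorization indicator.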
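The partitioning step can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' implementation: only memorization, symmetry, and 2-hop transitivity are checked, the order of the checks is a choice made here, and the function names `build_transitions` and `categorize` are hypothetical. Second-order symmetry and substitutability are left out because this summary does not give their exact definitions.

```python
from collections import defaultdict

def build_transitions(train_sequences):
    """Collect every 1-hop transition (prev_item -> next_item) seen in training."""
    successors = defaultdict(set)
    for seq in train_sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            successors[prev_item].add(next_item)
    return successors

def categorize(prev_item, target, successors):
    """Label one test instance by the pattern needed to predict `target`.

    Assumed, simplified rules (hypothetical, for illustration only).
    """
    if target in successors.get(prev_item, ()):
        return "memorization"   # the exact transition was observed in training
    if prev_item in successors.get(target, ()):
        return "symmetry"       # only the reversed transition was observed
    for mid in successors.get(prev_item, ()):
        if target in successors.get(mid, ()):
            return "transitivity"  # prev -> mid -> target, two observed hops
    return "uncategorized"

# Toy training data: observed transitions are a->b, b->c, d->a.
train = [["a", "b", "c"], ["d", "a"]]
succ = build_transitions(train)
for prev_item, target in [("a", "b"), ("a", "d"), ("a", "c"), ("c", "d")]:
    print(prev_item, "->", target, ":", categorize(prev_item, target, succ))
```

Once each test instance carries such a label, per-subset metrics for TIGER and SASRec can be computed by grouping on the label.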

Figure 1: Illustrated definitions for memorization vs. generalization. We define memorization and different sub-categories of generalization based on (1) the transition patterns observed in training data, and (2) the patterns required to infer.

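Experimental Results

Research Questions

RQ1: Do GR models outperform item ID-based models on instances that require generalization, and underperform on memorization instances?
RQ2: Can item-level generalization in GR be explained by token-level memorization within semantic IDs?
RQ3: How do different generalization types (transitivity, symmetry, substitutability) and hop counts affect model performance?
RQ4: Does an adaptive ensemble driven by a memorization indicator improve overall recommendation accuracy?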
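To make RQ2 concrete, the sketch below shows one way an item transition that was never seen in training can still be "memorized" at the token level. It rests on assumptions: each item has a semantic ID given as a tuple of tokens, and a test transition counts as prefix-memorized at length k when some training transition has the same k-token source and target prefixes. This is an illustrative stand-in, not necessarily the paper's prefix n-gram definition; `prefix_transitions` and `is_prefix_memorized` are hypothetical names, and the semantic IDs are made up.

```python
def prefix_transitions(train_sequences, semantic_ids, k):
    """Set of (source_prefix, target_prefix) pairs seen in training, using k-token prefixes."""
    seen = set()
    for seq in train_sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            seen.add((semantic_ids[prev_item][:k], semantic_ids[next_item][:k]))
    return seen

def is_prefix_memorized(prev_item, target, semantic_ids, seen, k):
    """True if the k-token prefix pattern of this transition occurred in training."""
    return (semantic_ids[prev_item][:k], semantic_ids[target][:k]) in seen

# Made-up semantic IDs: items "b" and "c" share their first two tokens.
semantic_ids = {"a": (1, 4, 7), "b": (2, 5, 8), "c": (2, 5, 9)}
train = [["a", "b"]]  # the item-level transition a -> c is never observed

for k in (1, 2, 3):
    seen = prefix_transitions(train, semantic_ids, k)
    print(k, is_prefix_memorized("a", "c", semantic_ids, seen, k))
```

In this toy case the transition a -> c is unseen at the item level, yet its prefix pattern matches a training transition for k = 1 and k = 2 and only becomes novel at the full ID length k = 3.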

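Key Findings

On the generalization-related subsets of seven real-world datasets, GR models generally outperform SASRec.
SASRec outperforms TIGER on the memorization-related subsets, showing that the two paradigms have complementary strengths.
Most test instances rely on generalization rather than memorization; fewer than 10% are uncategorized.
Item-level generalization in GR largely reduces to token-level prefix memorization within semantic IDs.
Increasing the token memorization ratio can improve generalization but may weaken item-level memorization.
The MSP-based adaptive ensemble, which weights TIGER and SASRec per instance, improves overall performance.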
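A per-instance ensemble of this kind can be sketched as follows. Several points are assumptions rather than details confirmed by this summary: MSP is read as maximum softmax probability, the indicator is taken from the SASRec distribution, a higher value shifts weight toward SASRec, and both models are assumed to score the same candidate list. The function name `adaptive_ensemble` is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_ensemble(sasrec_scores, tiger_scores):
    """Mix the two models for one instance, weighted by an assumed MSP indicator."""
    p_id = softmax(sasrec_scores)
    p_gr = softmax(tiger_scores)
    w = max(p_id)  # assumed indicator: high confidence suggests a memorized pattern
    return [w * a + (1.0 - w) * b for a, b in zip(p_id, p_gr)]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# SASRec is confident about candidate 0, so its choice dominates the mix.
print(argmax(adaptive_ensemble([10.0, 0.0, 0.0], [0.0, 0.0, 3.0])))
# SASRec is uncertain (flat scores), so the mix follows TIGER's preference.
print(argmax(adaptive_ensemble([0.0, 0.0, 0.0], [0.0, 0.0, 3.0])))
```

Because the weight is recomputed for every instance, the mix leans on the item ID-based model where it appears to be recalling a seen transition and on the GR model elsewhere, which matches the complementary behavior reported above.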

Figure 2: Illustration of multi-hop generalization.


This review was generated by AI and reviewed by a human editor.