QUICK REVIEW

[Paper Review] Self-Attentive Sequential Recommendation

Wang-Cheng Kang, Julian McAuley|arXiv (Cornell University)|Aug 20, 2018

Recommender Systems and Techniques|References: 37|Cited by: 84

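One-Sentence Summary

SASRec uses self-attention to model user behavior sequences for next-item recommendation, achieving strong performance and efficiency on both sparse and dense datasets. It adaptively weights past actions to predict the next item.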

ABSTRACT

Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the `context' of users' activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user's next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow for longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best in extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser datasets where higher model complexity is affordable. The goal of our work is to balance these two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items are `relevant' from a user's action history, and use them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations on attention weights also show how our model adaptively handles datasets with various density, and uncovers meaningful patterns in activity sequences.

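Research Motivation and Goals

Balance long-term semantics against short-term context in sequential recommendation.
Propose a self-attention-based model that selectively attends to relevant past actions.
Achieve strong predictive performance while being more efficient than CNN/RNN-based methods.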

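Proposed Method

Embed the user's action sequence by summing item embeddings and positional embeddings.
Apply stacked self-attention blocks with causal masking to capture dependencies among past items.
Use feed-forward networks with residual connections and layer normalization to add nonlinearity and stabilize training.
Predict next-item scores through a matrix-factorization-style interaction between the final representation and item embeddings (optionally sharing the item embeddings with the input layer).
Train with binary cross-entropy loss and negative sampling, using the Adam optimizer.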
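
The steps above can be sketched in plain Python. This is a minimal, single-head, single-block illustration with random untrained weights, not the authors' implementation. The function names, the toy sizes, the omission of bias terms and dropout, and the placement of layer normalization after each residual connection are all assumptions made for brevity.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x
    return [dot(row, x) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def layer_norm(x, eps=1e-8):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(X, Wq, Wk):
    # Position t may attend only to positions 0..t; later positions get weight 0.
    d = len(X[0])
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    rows = []
    for t in range(len(X)):
        scores = [dot(Q[t], K[j]) / math.sqrt(d) for j in range(t + 1)]
        rows.append(softmax(scores) + [0.0] * (len(X) - t - 1))
    return rows

def attention_block(X, Wq, Wk, Wv, W1, W2):
    A = causal_attention_weights(X, Wq, Wk)
    V = [matvec(Wv, x) for x in X]
    out = []
    for t, x in enumerate(X):
        context = [sum(A[t][j] * V[j][k] for j in range(len(X))) for k in range(len(x))]
        h = layer_norm(add(x, context))                        # residual + layer norm
        ff = matvec(W2, [max(0.0, v) for v in matvec(W1, h)])  # two-layer ReLU network
        out.append(layer_norm(add(h, ff)))                     # residual + layer norm
    return out

def softplus(z):
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def bce_loss(state, positive, negative):
    # Equals -log(sigmoid(s_pos)) - log(1 - sigmoid(s_neg)) for one sampled negative
    return softplus(-dot(state, positive)) + softplus(dot(state, negative))

# Toy run with random, untrained parameters (all sizes are illustrative).
random.seed(0)
d, n_items, seq = 4, 6, [1, 3, 2]

def rand_mat(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

item_emb = rand_mat(n_items, d)
pos_emb = rand_mat(len(seq), d)
X = [add(item_emb[i], pos_emb[t]) for t, i in enumerate(seq)]  # item + position
Wq, Wk, Wv, W1, W2 = (rand_mat(d, d) for _ in range(5))

H = attention_block(X, Wq, Wk, Wv, W1, W2)
scores = [dot(H[-1], e) for e in item_emb]        # shared item embeddings score every item
loss = bce_loss(H[-1], item_emb[4], item_emb[5])  # item 4 as positive, item 5 as sampled negative
print(scores.index(max(scores)), round(loss, 4))
```

Because of the causal mask, the representation at position t depends only on items up to t, so every prefix of a sequence can serve as a training example in one forward pass.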

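Experimental Results

Research Questions

RQ1: Does SASRec outperform state-of-the-art sequential recommendation models on both sparse and dense datasets?
RQ2: How do components such as positional embeddings, attention blocks, and shared item embeddings affect performance?
RQ3: How do SASRec's training efficiency and scalability behave as sequence length grows?
RQ4: Do the attention weights reveal meaningful patterns related to position or item attributes?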

Key Findings

Dataset	Metric	PopRec	BPR	FMC	FPMC	TransRec	GRU4Rec	GRU4Rec+	Caser	SASRec
Beauty	Hit@10	0.4003	0.3775	0.3771	0.4310	0.4607	0.2125	0.3949	0.4264	0.4854
Beauty	NDCG@10	0.2277	0.2183	0.2477	0.2891	0.3020	0.1203	0.2556	0.2547	0.3219
Games	Hit@10	0.4724	0.4853	0.6358	0.6802	0.6838	0.2938	0.6599	0.5282	0.7410
Games	NDCG@10	0.2779	0.2875	0.4456	0.4680	0.4557	0.1837	0.4759	0.3214	0.5360
Steam	Hit@10	0.7172	0.7061	0.7731	0.7710	0.7624	0.4190	0.8018	0.7874	0.8729
Steam	NDCG@10	0.4535	0.4436	0.5193	0.5011	0.4852	0.2691	0.5595	0.5381	0.6306
ML-1M	Hit@10	0.4329	0.5781	0.6986	0.7599	0.6413	0.5581	0.7501	0.7886	0.8245
ML-1M	NDCG@10	0.2377	0.3287	0.4676	0.5176	0.3969	0.3381	0.5513	0.5538	0.5905
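
For readers unfamiliar with the two metrics in the table, the following is a hedged sketch of how Hit@10 and NDCG@10 are commonly computed when each user has exactly one held-out ground-truth item. That single-target setup, the tie handling, and the function names are assumptions for illustration; this summary does not describe the paper's evaluation protocol in detail.

```python
import math

def rank_of_target(target_score, other_scores):
    # 0-based rank: number of competing candidates scored strictly higher
    return sum(1 for s in other_scores if s > target_score)

def hit_at_k(rank, k=10):
    # 1 if the ground-truth item lands in the top k, else 0
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank, k=10):
    # With a single relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 2)
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0

def evaluate(cases, k=10):
    # cases: list of (score of ground-truth item, scores of other candidates)
    ranks = [rank_of_target(target, others) for target, others in cases]
    n = len(ranks)
    hit = sum(hit_at_k(r, k) for r in ranks) / n
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / n
    return hit, ndcg

# Three toy users: target ranked 1st, ranked 2nd, and ranked outside the top 10
cases = [
    (0.9, [0.1, 0.2]),
    (0.5, [0.6, 0.1]),
    (0.0, [1.0] * 12),
]
hit, ndcg = evaluate(cases)
print(f"Hit@10={hit:.4f} NDCG@10={ndcg:.4f}")
```

Hit@10 only records whether the target appears in the top 10, while NDCG@10 also rewards placing it nearer the top, which is why the NDCG values in the table are always lower than the corresponding Hit values.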

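SASRec outperforms all baselines (including MC-, CNN-, and RNN-based variants) on both sparse and dense datasets.
Because self-attention can be computed in parallel, the model is substantially more efficient than CNN/RNN-based methods; the abstract reports an order of magnitude.
Attention visualizations reveal an adaptive focus on relevant past actions: longer-range dependencies on dense data, and a focus on recent actions on sparse data.
Two self-attention blocks combined with learned positional embeddings achieve strong performance with moderate training time.
SASRec can be interpreted as a flexible, adaptive hierarchical item-similarity model for next-item recommendation.
Across datasets, SASRec improves on both non-neural and neural baselines; for example, Hit@10 rises from 0.4607 (TransRec, the best baseline) to 0.4854 on Beauty, and from 0.7886 (Caser) to 0.8245 on ML-1M.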
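
The size of these gains can be read directly off the results table. The short script below, offered only as a convenience, copies the table values and computes SASRec's relative improvement over the strongest baseline in each row; the variable and function names are illustrative.

```python
BASELINES = ["PopRec", "BPR", "FMC", "FPMC", "TransRec", "GRU4Rec", "GRU4Rec+", "Caser"]

# (dataset, metric) -> (baseline scores in BASELINES order, SASRec score), copied from the table
RESULTS = {
    ("Beauty", "Hit@10"): ([0.4003, 0.3775, 0.3771, 0.4310, 0.4607, 0.2125, 0.3949, 0.4264], 0.4854),
    ("Beauty", "NDCG@10"): ([0.2277, 0.2183, 0.2477, 0.2891, 0.3020, 0.1203, 0.2556, 0.2547], 0.3219),
    ("Games", "Hit@10"): ([0.4724, 0.4853, 0.6358, 0.6802, 0.6838, 0.2938, 0.6599, 0.5282], 0.7410),
    ("Games", "NDCG@10"): ([0.2779, 0.2875, 0.4456, 0.4680, 0.4557, 0.1837, 0.4759, 0.3214], 0.5360),
    ("Steam", "Hit@10"): ([0.7172, 0.7061, 0.7731, 0.7710, 0.7624, 0.4190, 0.8018, 0.7874], 0.8729),
    ("Steam", "NDCG@10"): ([0.4535, 0.4436, 0.5193, 0.5011, 0.4852, 0.2691, 0.5595, 0.5381], 0.6306),
    ("ML-1M", "Hit@10"): ([0.4329, 0.5781, 0.6986, 0.7599, 0.6413, 0.5581, 0.7501, 0.7886], 0.8245),
    ("ML-1M", "NDCG@10"): ([0.2377, 0.3287, 0.4676, 0.5176, 0.3969, 0.3381, 0.5513, 0.5538], 0.5905),
}

def gain_over_best_baseline(baseline_scores, sasrec_score):
    # Returns the name of the strongest baseline and SASRec's relative gain over it
    best = max(range(len(baseline_scores)), key=baseline_scores.__getitem__)
    best_score = baseline_scores[best]
    return BASELINES[best], (sasrec_score - best_score) / best_score

for (dataset, metric), (baseline_scores, sasrec) in RESULTS.items():
    name, gain = gain_over_best_baseline(baseline_scores, sasrec)
    print(f"{dataset:7s} {metric:8s} best baseline: {name:9s} relative gain: {gain:.1%}")
```

The strongest baseline differs by dataset (TransRec on Beauty, GRU4Rec+ on Steam, Caser on ML-1M), which is consistent with the abstract's point that no single prior family wins on both sparse and dense data.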


This review was generated by AI and checked by a human editor.