QUICK REVIEW

[Paper Review] Hierarchical Temporal Convolutional Networks for Dynamic Recommender Systems

Jiaxuan You, Yichen Wang|arXiv (Cornell University)|Apr 8, 2019

Recommender Systems and Techniques|References: 40|Cited by: 23

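One-Sentence Summary

This paper proposes Hierarchical Temporal Convolutional Networks (HierTCN), a two-level deep learning architecture for real-time, scalable, and accurate dynamic recommendation. An RNN models users' long-term interests across sessions, and a Temporal Convolutional Network (TCN) captures short-term dynamics within each session. HierTCN trains 2.5x faster than RNN-based models and uses 90% less data memory than TCN-based models, while outperforming state-of-the-art dynamic recommendation methods by up to 18% in recall and 10% in mean reciprocal rank (MRR).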

ABSTRACT

Recommender systems that can learn from cross-session data to dynamically predict the next item a user will choose are crucial for online platforms. However, existing approaches often use out-of-the-box sequence models which are limited by speed and memory consumption, are often infeasible for production environments, and usually do not incorporate cross-session information, which is crucial for effective recommendations. Here we propose Hierarchical Temporal Convolutional Networks (HierTCN), a hierarchical deep learning architecture that makes dynamic recommendations based on users' sequential multi-session interactions with items. HierTCN is designed for web-scale systems with billions of items and hundreds of millions of users. It consists of two levels of models: The high-level model uses Recurrent Neural Networks (RNN) to aggregate users' evolving long-term interests across different sessions, while the low-level model is implemented with Temporal Convolutional Networks (TCN), utilizing both the long-term interests and the short-term interactions within sessions to predict the next interaction. We conduct extensive experiments on a public XING dataset and a large-scale Pinterest dataset that contains 6 million users with 1.6 billion interactions. We show that HierTCN is 2.5x faster than RNN-based models and uses 90% less data memory compared to TCN-based models. We further develop an effective data caching scheme and a queue-based mini-batch generator, enabling our model to be trained within 24 hours on a single GPU. Our model consistently outperforms state-of-the-art dynamic recommendation methods, with up to 18% improvement in recall and 10% in mean reciprocal rank.
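To make the two-level data flow concrete, here is a minimal pure-Python sketch under simplifying assumptions. A vanilla tanh RNN stands in for the high-level model, each finished session is summarized by mean-pooling its item embeddings, and the low-level model is a single causal convolution layer that also receives the long-term state. All names (`HierarchicalSketch`, `high_level_step`, `low_level`) and the random, untrained weights are hypothetical. The paper's actual cells, layer sizes, dilated stacks, batch normalization, and dropout are not reproduced here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class HierarchicalSketch:
    """Hypothetical toy two-level recommender: session-level RNN (high level)
    plus in-session causal convolution (low level). Illustration only."""

    def __init__(self, dim: int, kernel: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.kernel = kernel
        self.w_hh = random_matrix(dim, dim, rng)  # high level: previous state -> state
        self.w_sh = random_matrix(dim, dim, rng)  # high level: session summary -> state
        self.w_lt = random_matrix(dim, dim, rng)  # low level: long-term state -> output
        self.taps = [random_matrix(dim, dim, rng) for _ in range(kernel)]  # causal filter taps

    def high_level_step(self, h: Vector, session: List[Vector]) -> Vector:
        # Summarize a finished session by mean pooling, then update the long-term state.
        summary = [sum(col) / len(session) for col in zip(*session)]
        return [math.tanh(x) for x in vadd(matvec(self.w_hh, h), matvec(self.w_sh, summary))]

    def low_level(self, h: Vector, session: List[Vector]) -> List[Vector]:
        # Output at step t sees only items t, t-1, ..., t-kernel+1 (causal) plus
        # the long-term state h, which was built from earlier sessions only.
        outputs = []
        for t in range(len(session)):
            acc = matvec(self.w_lt, h)
            for k in range(self.kernel):
                if t - k >= 0:
                    acc = vadd(acc, matvec(self.taps[k], session[t - k]))
            outputs.append([math.tanh(x) for x in acc])
        return outputs

    def run(self, sessions: List[List[Vector]]) -> List[List[Vector]]:
        h = [0.0] * self.dim  # no long-term history before the first session
        per_session_outputs = []
        for session in sessions:
            per_session_outputs.append(self.low_level(h, session))
            h = self.high_level_step(h, session)  # update only after the session ends
        return per_session_outputs
```

Each output vector would then be scored against candidate item embeddings (for example by dot product) to predict the next item. In this sketch the recurrent state is updated once per session rather than once per interaction, so the recurrent part runs far fewer sequential steps than an RNN over the whole interaction history.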

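Motivation and Objectives

Address the limitations of existing sequence models in large-scale dynamic recommender systems: high memory consumption, slow training, and weak modeling of cross-session information.
Design a scalable, production-ready architecture that efficiently captures both users' long-term interests across sessions and their short-term behavior within a session.
Support real-time, web-scale recommendation for systems with billions of items and hundreds of millions of users.
Outperform existing RNN- and CNN-based models in both accuracy and efficiency on real-world datasets.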

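Proposed Method

HierTCN uses a hierarchical two-level architecture. The high-level model, an RNN, encodes users' long-term interests as they evolve across sessions.
The low-level model, a Temporal Convolutional Network (TCN), processes short-term interactions within a session and combines them with the long-term representation to predict the next interaction.
A queue-based mini-batch generator and an efficient data caching scheme allow the model to be trained within 24 hours on a single GPU.
Training uses a hinge loss with negative sampling to improve ranking, together with batch normalization and dropout to stabilize training and reduce overfitting.
The framework jointly models millions of users and items, supporting scalable offline training and online inference.
The TCN uses causal convolutions so that each prediction depends only on past interactions (autoregressive prediction), and dilated convolutions to cover long histories efficiently: the receptive field grows exponentially with depth while each filter stays local.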
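The causal and dilated convolutions described above can be illustrated on a scalar sequence. This is a minimal sketch, not the paper's TCN: it assumes a kernel size of 2, dilations 1, 2, 4, 8, all-ones weights, and left zero-padding, and it omits residual connections, nonlinearities, and normalization. The helper names (`causal_dilated_conv`, `receptive_field`, `tcn_stack`) are hypothetical.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i weights[i] * x[t - i * dilation]; positions before 0 count as 0,
    so y[t] never depends on x[t + 1], x[t + 2], ... (causality)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            src = t - i * dilation
            if src >= 0:
                acc += w * x[src]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of positions (current one included) that a stacked output can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


def tcn_stack(x: Sequence[float], kernel_size: int = 2, dilations: Sequence[int] = (1, 2, 4, 8)) -> List[float]:
    """Apply one all-ones causal dilated convolution per dilation, in order."""
    h = list(x)
    for d in dilations:
        h = causal_dilated_conv(h, [1.0] * kernel_size, d)
    return h


impulse = [1.0] + [0.0] * 19
print(receptive_field(2, [1, 2, 4, 8]))  # 16
print(tcn_stack(impulse))  # 1.0 at positions 0..15, then 0.0 at 16..19
```

Four layers with two taps each cover 16 time steps, whereas undilated kernel-2 layers would need 15 layers for the same span. Every layer also processes all positions of a session in parallel, instead of stepping through them one at a time as an RNN does.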
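A margin-based hinge loss with negative sampling, as named in the list above, can be sketched as follows. Assumptions not taken from the paper: scores are dot products between the predicted query vector and item embeddings, negatives are drawn uniformly at random from the item table, the margin is 0.1, and losses are averaged over negatives. The function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sample_negatives(items: List[Vector], positive_id: int, num: int, rng: random.Random) -> List[Vector]:
    """Uniformly sample `num` distinct items other than the positive (assumed strategy)."""
    candidates = [i for i in range(len(items)) if i != positive_id]
    return [items[i] for i in rng.sample(candidates, num)]


def hinge_loss(query: Vector, positive: Vector, negatives: List[Vector], margin: float = 0.1) -> float:
    """Mean over negatives of max(0, margin - s(q, pos) + s(q, neg)).
    The loss is zero once the positive outscores every negative by at least `margin`."""
    pos_score = dot(query, positive)
    losses = [max(0.0, margin - pos_score + dot(query, neg)) for neg in negatives]
    return sum(losses) / len(losses)


items = [[1.0, 0.0], [0.0, 1.0], [0.95, 0.0], [-1.0, 0.0]]
rng = random.Random(0)
negs = sample_negatives(items, positive_id=0, num=2, rng=rng)
print(hinge_loss([1.0, 0.0], items[0], negs))
```

Unlike an L2 loss, which pulls the predicted vector toward the target embedding without looking at other items, this loss depends only on how the positive ranks against the sampled negatives, which matches the ranking nature of next-item recommendation.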
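The queue-based mini-batch generator can be pictured as a producer-consumer pipeline. A background thread assembles batches and places them in a bounded queue while the training loop consumes them, so data preparation overlaps with computation. The sketch below uses Python's standard `queue` and `threading` modules. The function name `queued_batches`, the prefetch depth, and batching by simple sequential grouping are assumptions, and the paper's generator and caching scheme may differ.

```python
import queue
import threading
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


def queued_batches(examples: Iterable[T], batch_size: int, max_prefetch: int = 4) -> Iterator[List[T]]:
    """Yield lists of up to `batch_size` examples, prepared by a background thread.
    The bounded queue keeps at most `max_prefetch` finished batches waiting."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=max_prefetch)

    def producer() -> None:
        try:
            batch: List[T] = []
            for ex in examples:
                batch.append(ex)
                if len(batch) == batch_size:
                    q.put(batch)  # blocks while the queue is full
                    batch = []
            if batch:
                q.put(batch)  # final, possibly smaller batch
        finally:
            q.put(_DONE)  # always signal completion so the consumer is not left waiting

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item  # type: ignore[misc]
    worker.join()


for batch in queued_batches(range(10), batch_size=4):
    print(batch)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]
```

The bound on the queue is the key design choice: it lets the producer run ahead of training without letting prepared batches accumulate without limit in memory.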

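Experimental Results

Research Questions

RQ1: Can a hierarchical deep learning model effectively capture both long-term cross-session user interests and short-term within-session dynamics for dynamic recommendation?
RQ2: How does HierTCN compare with RNN- and CNN-based models in accuracy and efficiency in large-scale, real-world settings?
RQ3: How do different loss functions and regularization techniques affect the model's generalization and convergence?
RQ4: How does performance vary with the number of historical interactions and the time gap between sessions?
RQ5: Can the proposed architecture scale to production environments with billions of interactions and millions of users?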

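Key Findings

Compared with state-of-the-art dynamic recommendation methods, HierTCN improves recall by up to 18% and MRR by up to 10%. The evaluation covers the public XING dataset and a large-scale Pinterest dataset with 6 million users and 1.6 billion interactions.
HierTCN trains 2.5x faster than RNN-based models and uses 90% less data memory than TCN-based models. With its data caching scheme and queue-based mini-batch generator, it can be trained within 24 hours on a single GPU.
A hinge loss with negative sampling improves Recall@1 by 20% and MRR by 10% over an L2 loss, and also outperforms NCE-based objectives.
Batch normalization alone improves performance and speeds up convergence; adding dropout improves performance further and effectively mitigates overfitting.
Performance improves steadily as the number of historical interactions grows and as the time gap between sessions shrinks, indicating that the model makes effective use of longer and more recent user histories.
Visualizations show that HierTCN balances a user's diverse interests (e.g., food and furniture), whereas rule-based and single-level models concentrate on the dominant category.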
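Recall@K and MRR, the metrics behind the figures above, can both be computed from the rank of the true next item among the scored candidates. This is a minimal sketch, assuming one ground-truth item per prediction, ties broken in favor of the target, and no rank cutoff for MRR (some evaluations report a truncated MRR@K instead). The function names and the example ranks are illustrative, not results from the paper.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target; only items scoring strictly higher push it down."""
    return 1 + sum(1 for i, s in enumerate(scores) if i != target and s > scores[target])


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of predictions whose true next item lands in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all predictions."""
    return sum(1.0 / r for r in ranks) / len(ranks)


ranks = [1, 3, 2, 10]
print(recall_at_k(ranks, 1))        # 0.25
print(recall_at_k(ranks, 3))        # 0.75
print(mean_reciprocal_rank(ranks))  # ≈ 0.4833
```

Recall@K only asks whether the target made the top K, while MRR also rewards placing it higher within the list. This is why the two metrics can improve by different amounts.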

This review was generated by AI and reviewed by human editors.