QUICK REVIEW

[Paper Review] Multi-Level Deep Cascade Trees for Conversion Rate Prediction

Hong Wen, Jing Zhang | arXiv (Cornell University) | May 24, 2018
Recommender Systems and Techniques | References: 30 | Cited by: 3
One-Sentence Summary

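This paper proposes multi-Level Deep Cascade Trees (ldcTree), a new ensemble method built on gradient boosting decision trees (GBDT). It stacks GBDTs so that the cross-entropy output of one level serves as the input features of the next, which lets the model learn a hierarchical feature representation. The deep cascade structure, combined with ensemble learning, improves conversion rate prediction, and the method is reported to reach state-of-the-art performance on offline data and in online deployment.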

ABSTRACT

Developing effective and efficient recommendation methods is very challenging for modern e-commerce platforms. Generally speaking, two essential modules named Click-Through Rate Prediction (CTR) and Conversion Rate Prediction (CVR) are included, where CVR module is a crucial factor that affects the final purchasing volume directly. However, it is indeed very challenging due to its sparseness nature. In this paper, we tackle this problem by proposing multi-Level Deep Cascade Trees (ldcTree), which is a novel decision tree ensemble approach. It leverages deep cascade structures by stacking Gradient Boosting Decision Trees (GBDT) to effectively learn feature representation. In addition, we propose to utilize the cross-entropy in each tree of the preceding GBDT as the input feature representation for next level GBDT, which has a clear explanation, i.e., a traversal from root to leaf nodes in the next level GBDT corresponds to the combination of certain traversals in the preceding GBDT. The deep cascade structure and the combination rule enable the proposed ldcTree to have a stronger distributed feature representation ability. Moreover, inspired by ensemble learning, we propose an Ensemble ldcTree (E-ldcTree) to encourage the model's diversity and enhance the representation ability further. Finally, we propose an improved Feature learning method based on EldcTree (F-EldcTree) for taking adequate use of weak and strong correlation features identified by pre-trained GBDT models. Experimental results on off-line data set and online deployment demonstrate the effectiveness of the proposed methods.

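Motivation and Objectives

  • Address the challenge of predicting conversion rate (CVR) from sparse data in e-commerce recommender systems.
  • Strengthen distributed feature representation by stacking multiple levels of gradient boosting decision trees (GBDT).
  • Improve generalization and representational capacity through ensemble learning and cross-level feature recombination.
  • Develop a feature learning method that makes effective use of the strongly and weakly correlated features identified by pre-trained GBDT models.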

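Proposed Method

  • ldcTree uses a deep cascade structure in which the cross-entropy computed in each tree of one GBDT level becomes the input feature representation of the next GBDT level.
  • Each root-to-leaf path in a later GBDT level corresponds to a combination of paths in the preceding level, which yields hierarchical feature combinations.
  • An ensemble variant, E-ldcTree, combines multiple ldcTree instances to increase model diversity and improve generalization.
  • A feature learning method built on E-ldcTree, F-EldcTree, exploits the strongly and weakly correlated features identified by pre-trained GBDT models.
  • The model refines its predictions iteratively through gradient boosting, while the tree structure keeps it interpretable.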
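The hand-off between two cascade levels can be illustrated with a small sketch. This summary does not give the exact cross-entropy formula, so the code below is one plausible reading rather than the authors' implementation: each leaf of each tree is scored by the mean binary cross-entropy of the training labels that reach it, measured against that leaf's smoothed positive rate, and each sample is then encoded by the scores of the leaves it lands in. The function name `leaf_cross_entropy_features`, the `smoothing` parameter, and the toy data are all hypothetical. The `leaf_ids` matrix is assumed to come from whatever leaf-index output a GBDT library provides.

```python
import math
from collections import defaultdict


def leaf_cross_entropy_features(leaf_ids, labels, smoothing=1.0):
    """Score every (tree, leaf) pair by cross-entropy, then encode samples.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    labels[i] is the binary conversion label (0 or 1) of sample i.
    Returns (features, table): features[i][t] is the score of the leaf that
    sample i reaches in tree t, and table[t] maps leaf index -> score.
    """
    n_trees = len(leaf_ids[0])
    # counts[t][leaf] = [number of samples, number of positive samples]
    counts = [defaultdict(lambda: [0, 0]) for _ in range(n_trees)]
    for row, y in zip(leaf_ids, labels):
        for t, leaf in enumerate(row):
            counts[t][leaf][0] += 1
            counts[t][leaf][1] += y

    table = []
    for t in range(n_trees):
        scores = {}
        for leaf, (n, pos) in counts[t].items():
            # Assumption: Laplace smoothing keeps log() finite for pure leaves.
            p = (pos + smoothing) / (n + 2.0 * smoothing)
            # Mean binary cross-entropy of the leaf's labels against p.
            scores[leaf] = -(pos * math.log(p) + (n - pos) * math.log(1.0 - p)) / n
        table.append(scores)

    features = [[table[t][leaf] for t, leaf in enumerate(row)] for row in leaf_ids]
    return features, table


# Toy data: 4 samples, 2 trees in the preceding level (hypothetical values).
leaf_ids = [[0, 1], [0, 2], [1, 1], [1, 2]]
labels = [1, 0, 0, 0]
features, table = leaf_cross_entropy_features(leaf_ids, labels)
```

Under these assumptions, `features` has one column per tree of the preceding level and would be the training input of the next-level GBDT. A split in the next level on column t then selects a subset of leaves, and hence of root-to-leaf paths, in tree t of the preceding level. In practice the score table would be fitted on training data only and reused to encode unseen samples.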

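Experimental Results

Research Questions

  • RQ1: Does the deep cascade structure of GBDTs improve feature representation for conversion rate prediction in sparse e-commerce settings?
  • RQ2: How does feeding the cross-entropy output of the preceding GBDT into the next level improve model performance?
  • RQ3: To what extent does ensemble learning over multiple ldcTree instances improve the robustness and accuracy of predictions?
  • RQ4: Can strongly and weakly correlated features be used effectively within a hierarchical tree framework to improve CVR prediction?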

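Key Findings

  • ldcTree outperforms the baseline methods at conversion rate prediction on the offline dataset.
  • The deep cascade structure achieves a stronger distributed feature representation by combining paths across multiple GBDT levels.
  • The E-ldcTree ensemble variant further increases model diversity and prediction accuracy through ensemble learning.
  • The F-EldcTree feature learning method makes use of both strongly and weakly correlated features, which improves generalization.
  • The model proved effective in online deployment, indicating that it is practical for real e-commerce recommender systems.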
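The ensemble idea behind E-ldcTree can be sketched in a few lines. This summary does not say how the ldcTree members are diversified or how their outputs are combined, so both choices below are assumptions made for illustration: bootstrap resampling of training rows as the source of diversity, and plain averaging of predicted probabilities as the combination rule. The names `bootstrap_indices` and `ensemble_predict` are hypothetical, and the members are constant stand-ins rather than trained models.

```python
import random


def bootstrap_indices(n_samples, n_models, seed=0):
    """Draw one bootstrap sample of row indices per ensemble member."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_models)]


def ensemble_predict(models, x):
    """Average the conversion probabilities predicted by each member."""
    probabilities = [model(x) for model in models]
    return sum(probabilities) / len(probabilities)


# Stand-ins for trained ldcTree members: each maps a sample to a probability.
members = [lambda x: 0.25, lambda x: 0.5, lambda x: 0.75]
subsets = bootstrap_indices(n_samples=5, n_models=len(members))
score = ensemble_predict(members, x=None)
```

Each entry of `subsets` would select the rows used to train one member, and `ensemble_predict` returns the mean of the members' scores for a single sample.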

This review was generated by AI and reviewed by human editors.