QUICK REVIEW

[Paper Review] Posthoc Interpretability of Learning to Rank Models using Secondary Training Data

Jaspreet Singh, Avishek Anand|arXiv (Cornell University)|Jun 29, 2018

Explainable Artificial Intelligence (XAI) | References: 6 | Cited by: 34

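One-Sentence Summary

This paper proposes a post-hoc, model-agnostic method for explaining trained learning-to-rank (LTR) models: an interpretable tree model is trained on secondary training data generated from the black-box ranker's predictions. Using only interpretable content features, the method shows high correlation with the original model, especially in the listwise learning setting, suggesting that faithful, global explanations remain achievable with enough secondary data even when the features are only a subset of the original ones.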

ABSTRACT

Predictive models are omnipresent in automated and assisted decision making scenarios. But for the most part they are used as black boxes which output a prediction without understanding partially or even completely how different features influence the model prediction avoiding algorithmic transparency. Rankings are ordering over items encoding implicit comparisons typically learned using a family of features using learning-to-rank models. In this paper we focus on how best we can understand the decisions made by a ranker in a post-hoc model agnostic manner. We operate on the notion of interpretability based on explainability of rankings over an interpretable feature space. Furthermore we train a tree based model (inherently interpretable) using labels from the ranker, called secondary training data to provide explanations. Consequently, we attempt to study how well does a subset of features, potentially interpretable, explain the full model under different training sizes and algorithms. We do experiments on the learning to rank datasets with 30k queries and report results that serve to show that in certain settings we can learn a faithful interpretable ranker.

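Motivation and Goals

Enable post-hoc interpretability of black-box learning-to-rank models without access to their training data.
Investigate whether a simpler, interpretable model can faithfully reproduce the ranking behavior of a complex pre-trained LTR model.
Assess how training data size, type of learning algorithm (pairwise vs. listwise), and feature subset selection affect the fidelity of the explanation.
Provide actionable, human-understandable explanations of ranking decisions using content-based features.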

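Proposed Method

Generate secondary training data by collecting the predictions (i.e., rankings) of the pre-trained black-box LTR model on a large number of test query-document pairs.
Train a new, inherently interpretable tree model (e.g., gradient-boosted trees) on this secondary data, using only a subset of features considered interpretable (e.g., term presence, metadata).
Measure how closely the interpretable model's rankings match those of the original model using standard LTR evaluation metrics: NDCG, Precision@10, Kendall’s tau (τ), and τ@10.
Train the interpretable model with pairwise and with listwise objectives to compare performance across learning paradigms.
Systematically vary the size of the secondary training set to study data efficiency and generalization.
Evaluate explanation fidelity by measuring the correlation between the rankings of the original and interpretable models across different data splits and feature sets.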
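
The steps above can be illustrated with a small toy sketch. It is not the authors' implementation: the black-box ranker is a hypothetical linear scorer, the data is synthetic, the choice of `INTERPRETABLE` feature indices is arbitrary, and the surrogate is a tiny pointwise regression tree fitted to the black-box scores rather than a pairwise or listwise LTR model. All function names here are invented for illustration.

```python
import random
from itertools import combinations

INTERPRETABLE = (0, 1)  # hypothetical: indices of the features deemed interpretable

def black_box_score(x):
    # Stand-in for the trained ranker (an assumption); it uses all four features.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[2] + 0.1 * x[3]

def make_queries(n_queries, rng, docs_per_query=10):
    # Synthetic queries: each is a list of 4-dimensional feature vectors in [0, 1).
    return [[[rng.random() for _ in range(4)] for _ in range(docs_per_query)]
            for _ in range(n_queries)]

def kendall_tau(a, b):
    # Kendall's tau-a: (concordant - discordant) / total pairs; ties count as neither.
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(a) * (len(a) - 1) // 2)

def sse(ys):
    # Sum of squared errors around the mean, used as the split cost.
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def fit_tree(rows, ys, depth):
    # Small regression tree: a leaf is a float, an inner node is
    # (feature, threshold, left_subtree, right_subtree).
    if depth == 0 or len(rows) < 4:
        return sum(ys) / len(ys)
    best = None
    for f in range(len(rows[0])):
        for t in (0.25, 0.5, 0.75):  # coarse candidate thresholds for features in [0, 1)
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if left and right:
                cost = sse(left) + sse(right)
                if best is None or cost < best[0]:
                    best = (cost, f, t)
    if best is None:
        return sum(ys) / len(ys)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return (f, t,
            fit_tree([r for r, _ in lo], [y for _, y in lo], depth - 1),
            fit_tree([r for r, _ in hi], [y for _, y in hi], depth - 1))

def predict(tree, row):
    # Walk from the root to a leaf and return the leaf value.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def project(doc):
    # Keep only the interpretable feature subset.
    return [doc[i] for i in INTERPRETABLE]

def surrogate_fidelity(n_train_queries, seed=0):
    rng = random.Random(seed)
    train = make_queries(n_train_queries, rng)
    test = make_queries(50, rng)
    # Secondary training data: labels come from the black box, not from human judgments.
    rows = [project(d) for q in train for d in q]
    labels = [black_box_score(d) for q in train for d in q]
    tree = fit_tree(rows, labels, depth=4)
    # Fidelity: mean per-query Kendall's tau between black-box and surrogate scores.
    taus = [kendall_tau([black_box_score(d) for d in q],
                        [predict(tree, project(d)) for d in q]) for q in test]
    return sum(taus) / len(taus)
```

Calling `surrogate_fidelity` with increasing values of `n_train_queries` is one simple way to probe, on toy data, how fidelity changes with the amount of secondary data. Because the tree outputs piecewise-constant scores, tied documents lower tau-a here; a real study would more likely use an LTR library with pairwise or listwise objectives.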

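Experimental Results

Research Questions

RQ1: Does increasing the amount of secondary training data improve the interpretable model's fidelity to the base ranker?
RQ2: How do different training algorithms for the base ranker (pairwise vs. listwise) affect the interpretable model's performance?
RQ3: To what extent can a global interpretable model that uses only a subset of interpretable features mimic the behavior of the original base ranker?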

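Key Findings

With 15,000 queries, the interpretable model trained on secondary data generated by a listwise base ranker reaches a Kendall’s tau (τ) of 0.49 and a τ@10 of 0.74, a moderate correlation that keeps improving as the data grows.
For a pairwise-trained base model, the interpretable model achieves high fidelity even with little secondary data (e.g., 400 queries), with a τ@10 of 0.33 and a Precision@10 of 0.5535.
In the small-sample regime, interpretable models trained on the output of a pairwise-trained base ranker correlate most strongly with the original model.
As the training data grows, the interpretable model's τ and τ@10 improve more consistently when the base ranker is trained listwise, suggesting better generalization.
Despite using only interpretable content features, the interpretable model nearly matches the base model on Precision when the base model is pairwise-trained.
The results show that content features alone reproduce complex LTR models poorly, even with large secondary datasets, which highlights the difficulty of relying solely on interpretable features.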


This review was generated by AI and reviewed by human editors.