[Paper Review] Posthoc Interpretability of Learning to Rank Models using Secondary Training Data
This paper proposes a post-hoc, model-agnostic method to interpret trained Learning-to-Rank (LTR) models by training an interpretable tree-based model on secondary training data generated from the black-box ranker's predictions. Using only interpretable, content-based features, the method achieves high correlation with the original model, especially under listwise learning. This suggests that faithful, global explanations are possible with sufficient secondary data, even when the features are a subset of the original ones.
Predictive models are omnipresent in automated and assisted decision-making scenarios. For the most part, however, they are used as black boxes that output a prediction without revealing, partially or completely, how different features influence it, which undermines algorithmic transparency. Rankings are orderings over items that encode implicit comparisons, typically learned from a family of features using learning-to-rank models. In this paper we focus on how best to understand the decisions made by a ranker in a post-hoc, model-agnostic manner. We operate on a notion of interpretability based on the explainability of rankings over an interpretable feature space. Furthermore, we train a tree-based model (inherently interpretable) on labels produced by the ranker, called secondary training data, to provide explanations. Consequently, we study how well a subset of features, potentially interpretable, explains the full model under different training sizes and algorithms. We run experiments on learning-to-rank datasets with 30k queries and report results showing that, in certain settings, we can learn a faithful interpretable ranker.
Motivation & Objective
- To enable post-hoc interpretability of black-box Learning-to-Rank models without access to their training data.
- To investigate whether a simpler, interpretable model can faithfully reproduce the ranking behavior of a complex, pretrained LTR model.
- To evaluate the impact of training data size, learning algorithm type (pairwise vs. listwise), and feature subset selection on interpretability fidelity.
- To provide actionable, human-understandable explanations for ranking decisions using content-based features.
Proposed method
- Generate secondary training data by collecting predictions (rankings) from a pretrained black-box LTR model on a large set of test query-document pairs.
- Train a new, inherently interpretable tree-based model (e.g., gradient-boosted trees) on this secondary data, using only a subset of features deemed interpretable (e.g., term presence, metadata).
- Use standard LTR evaluation metrics, namely NDCG, Precision@10, Kendall’s tau (τ), and τ@10, to measure how closely the interpretable model replicates the original model’s rankings.
- Train the interpretable model using both pairwise and listwise learning objectives to compare performance across learning paradigms.
- Systematically vary the size of the secondary training set to study data efficiency and generalization.
- Assess interpretability fidelity by measuring correlation between the original model’s and the interpretable model’s rankings across different splits and feature sets.
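The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `black_box_score` is a hypothetical stand-in for the pretrained ranker, the three synthetic features and the choice of `INTERPRETABLE = [0, 1]` are invented for the example, and the surrogate is a small regression tree fitted pointwise on the ranker's scores (a simplification of the pairwise and listwise objectives mentioned above). Fidelity is reported as the share of document pairs ordered the same way by both models, a simple relative of Kendall's tau.

```python
import random

def black_box_score(feats):
    # Hypothetical stand-in for the pretrained ranker; it uses all three features.
    return 2.0 * feats[0] + feats[1] * feats[2]

def sse(rows):
    # Sum of squared errors of the labels around their mean.
    if not rows:
        return 0.0
    mean = sum(y for _, y in rows) / len(rows)
    return sum((y - mean) ** 2 for _, y in rows)

def fit_tree(rows, depth):
    # Greedy regression tree: a leaf is a float, an inner node is a tuple.
    mean = sum(y for _, y in rows) / len(rows)
    if depth == 0 or len(rows) < 4:
        return mean
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for r in rows if r[0][j] <= thr]
            right = [r for r in rows if r[0][j] > thr]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, thr, left, right)
    if best is None:  # no usable split was found
        return mean
    _, j, thr, left, right = best
    return (j, thr, fit_tree(left, depth - 1), fit_tree(right, depth - 1))

def predict(tree, x):
    # Walk from the root to a leaf and return the leaf value.
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree

def make_queries(n_queries, n_docs, rng):
    # Synthetic data: each query is a list of documents with three features.
    return [[[rng.random() for _ in range(3)] for _ in range(n_docs)]
            for _ in range(n_queries)]

def secondary_data(queries, keep):
    # Label every document with the black-box score; keep only interpretable features.
    return [([d[i] for i in keep], black_box_score(d)) for q in queries for d in q]

def pair_agreement(queries, tree, keep):
    # Share of document pairs (per query) ordered identically by both models.
    same = total = 0
    for docs in queries:
        base = [black_box_score(d) for d in docs]
        surr = [predict(tree, [d[i] for i in keep]) for d in docs]
        for a in range(len(docs)):
            for b in range(a + 1, len(docs)):
                total += 1
                same += (base[a] - base[b]) * (surr[a] - surr[b]) > 0
    return same / total

rng = random.Random(0)
INTERPRETABLE = [0, 1]  # assumed interpretable subset; feature 2 stays hidden
train_q = make_queries(30, 10, rng)
test_q = make_queries(10, 10, rng)
tree = fit_tree(secondary_data(train_q, INTERPRETABLE), depth=4)
agreement = pair_agreement(test_q, tree, INTERPRETABLE)
print(f"pairwise order agreement: {agreement:.2f}")
```

Changing the number of training queries passed to `make_queries` gives a rough feel for how fidelity depends on the size of the secondary training set, though the numbers from this toy setup say nothing about the paper's datasets.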
Experimental results
Research questions
- RQ1: Does increasing the amount of secondary training data improve the fidelity of the interpretable model to the base ranker?
- RQ2: How do different training algorithms used for the base ranker (pairwise vs. listwise) affect the performance of the interpretable model?
- RQ3: How closely can a global, interpretable model mimic the behavior of the original base ranker using only a subset of interpretable features?
Key findings
- With 15k queries, the interpretable model achieved a Kendall’s tau (τ) of 0.49 and τ@10 of 0.74 when trained on secondary data from a listwise base ranker, indicating moderate but improving correlation with increasing data.
- For pairwise-trained base models, the interpretable model achieved high fidelity even with small secondary data (e.g., 400 queries), reaching τ@10 = 0.33 and Precision@10 = 0.5535.
- The interpretable model trained on pairwise base ranker outputs showed the highest correlation with the original model, especially when using a small number of secondary examples.
- Listwise learning of the base ranker led to more consistent improvements in τ and τ@10 as training data size increased, suggesting better generalization for the interpretable model.
- Despite using only interpretable, content-based features, the interpretable model achieved nearly the same Precision as the base ranker when the base model was pairwise-trained.
- The results indicate that content-based features alone perform poorly in reproducing complex LTR models, even with large secondary datasets, highlighting the challenge of relying solely on interpretable features.
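To help read the τ and τ@10 figures in these findings, the following sketch computes Kendall's tau between two score lists for one query. It uses the tau-a form (concordant minus discordant pairs, divided by the number of pairs). The `kendall_tau_at_k` helper is an assumption about how a top-k variant might be defined, namely restricting the comparison to the base ranker's top-k documents; the paper's exact definition may differ. The scores in `base` and `surrogate` are made up for illustration.

```python
def kendall_tau(base_scores, surrogate_scores):
    # Kendall's tau-a: (concordant - discordant) / total number of pairs.
    n = len(base_scores)
    concordant = discordant = 0
    for a in range(n):
        for b in range(a + 1, n):
            sign = ((base_scores[a] - base_scores[b])
                    * (surrogate_scores[a] - surrogate_scores[b]))
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def kendall_tau_at_k(base_scores, surrogate_scores, k):
    # Assumed definition: compare only the base ranker's top-k documents.
    top = sorted(range(len(base_scores)), key=lambda i: -base_scores[i])[:k]
    return kendall_tau([base_scores[i] for i in top],
                       [surrogate_scores[i] for i in top])

# Illustrative scores for five documents of one query (not from the paper).
base = [0.9, 0.8, 0.7, 0.6, 0.5]
surrogate = [0.8, 0.9, 0.7, 0.4, 0.5]

tau = kendall_tau(base, surrogate)                 # 8 concordant, 2 discordant -> 0.6
tau_at_3 = kendall_tau_at_k(base, surrogate, k=3)  # 2 concordant, 1 discordant -> 1/3
print(round(tau, 3), round(tau_at_3, 3))
```

A value of 1 means both models order every pair identically, 0 means agreement and disagreement cancel out, and -1 means the orderings are reversed. In this toy case a single swap at the top costs more in the three-document comparison than in the full list, because fewer pairs are available to offset it.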
This review was created by AI and reviewed by human editors.