QUICK REVIEW

[Paper Review] Reproducing and Comparing Distillation Techniques for Cross-Encoders

Victor Morand, Mathias Vast | arXiv (Cornell University) | Mar 3, 2026

Advanced Neural Network Applications | Cited by 0

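ONE-SENTENCE SUMMARY

This paper reproduces two distillation strategies for cross-encoder re-rankers and benchmarks them on nine encoder backbones in a controlled setting, finding that objectives based on relative comparisons generally outperform pointwise losses and that a strong objective can compensate for a smaller backbone.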

ABSTRACT

Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right strategy, traditional cross-encoders could reach the level of effectiveness of LLM re-rankers. Yet, comparisons with previous training strategies, including distillation from strong cross-encoder teachers, remain unclear. In addition, few studies cover a similar range of backbone encoders, while substantial improvements have been made in this area since BERT. This lack of comprehensive studies in controlled environments makes it difficult to identify robust design choices. In this work, we reproduce the LLM-based distillation strategy of Schlatt et al. (2025) and compare it to the approach of Hofstätter et al. (2020), based on an ensemble of cross-encoder teachers, as well as other supervised objectives, to fine-tune a large range of cross-encoders, from the original BERT and its follow-ups RoBERTa, ELECTRA and DeBERTa-v3, to the more recent ModernBERT. We evaluate all models on both in-domain (TREC-DL and MS MARCO dev) and out-of-domain datasets (BEIR, LoTTE, and Robust04). Our results show that objectives emphasizing relative comparisons (pairwise MarginMSE and listwise InfoNCE) consistently outperform pointwise baselines across all backbones and evaluation settings, and that objective choice can yield gains comparable to scaling the backbone architecture.

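MOTIVATION AND OBJECTIVES

Promote robust, controlled comparisons of cross-encoder training strategies in information retrieval.
Disentangle the effects of the training objective and the encoder backbone under a unified evaluation protocol.
Reproduce key distillation strategies for cross-encoders (MarginMSE and Rank-DistiLLM) and compare them with supervised losses.
Evaluate on in-domain and out-of-domain datasets to assess generalization.
Provide reproducible configurations and benchmarks to support future research on cross-encoder distillation.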

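PROPOSED METHOD

Reproduce the MarginMSE distillation of Hofstätter et al. (2020), which uses an ensemble of cross-encoder teachers to guide the student cross-encoder.
Reproduce the Rank-DistiLLM distillation of Schlatt et al. (2025) (DistillRankNET and ADR-MSE), which supervises with ranked lists.
Extend the evaluation to nine encoder backbones spanning the BERT, RoBERTa, ELECTRA, DeBERTaV3, and ModernBERT families.
Compare the distillation objectives against supervised losses: BCE (pointwise), hinge (pairwise), and InfoNCE (listwise).
Standardize candidate generation (the top 1000 candidates retrieved by SPLADE-v3-DistilBERT) and evaluation on ID/OOD benchmarks.
Use a unified training protocol (same data, preprocessing, optimizer, and evaluation) to isolate the effects of objective and backbone.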
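
To make the difference between the objectives listed above concrete, here is a minimal pure-Python sketch of the textbook form of each loss for a single query. It is an illustration, not the paper's training code: the function names, the default `margin=1.0`, and the use of scalar scores instead of batched tensors are all assumptions made for readability.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_loss(score, label):
    # Pointwise: binary cross-entropy on one relevance logit (label is 0 or 1).
    return softplus(score) - label * score


def hinge_loss(pos_score, neg_score, margin=1.0):
    # Pairwise: the positive should beat the negative by at least `margin`
    # (the margin value is an assumption of this sketch).
    return max(0.0, margin - (pos_score - neg_score))


def info_nce_loss(pos_score, neg_scores):
    # Listwise: softmax cross-entropy with the positive as the target class.
    scores = [pos_score] + list(neg_scores)
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - pos_score


def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    # Distillation: squared error between the student's and the teacher's
    # positive-minus-negative score margins, for one training triple.
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return (student_margin - teacher_margin) ** 2
```

BCE looks at each passage in isolation, whereas hinge, InfoNCE, and MarginMSE depend only on score differences within a query, which is what "relative comparison" means here. With a teacher ensemble, `teacher_pos` and `teacher_neg` would stand for whatever aggregate score the ensemble produces.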

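EXPERIMENTAL RESULTS

Research Questions

RQ1 How do distillation-based supervision signals (MarginMSE, DistillRankNET, ADR-MSE) compare with traditional supervised losses for cross-encoder re-rankers?
RQ2 Under domain shift, to what extent do the choice of encoder backbone and the training objective interact in determining ranking performance?
RQ3 Can a strong distillation objective compensate for a smaller backbone and approach the performance of larger models?
RQ4 Is LLM-based distillation beneficial across all backbones and evaluation settings, or does its benefit depend on the dataset and backbone?
RQ5 Can controlled, unified evaluation reveal consistent design choices for robust cross-encoder training?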

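Key Findings

The choice of training objective has a consistent and substantial effect across backbones and evaluation settings.
InfoNCE and MarginMSE tend to rank at the top, while BCE performs worst among the objectives tested.
Scaling up the backbone yields gains, but a strong objective can yield comparable gains, particularly in OOD evaluation.
Listwise distillation from LLM teachers (DistillRankNET, ADR-MSE) is competitive but does not outperform the other methods on every backbone.
In a controlled setting, supervised objectives can make cross-encoders competitive with distillation methods, challenging the claim that distillation is always superior.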
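
The LLM-teacher methods mentioned in the findings supervise the student with a ranked list rather than with teacher scores. As a rough illustration of that idea, the sketch below applies a generic RankNet-style pairwise loss to a teacher ranking. It is not claimed to be the exact DistillRankNET formulation used in the paper; the function name, the dict-based inputs, and averaging over pairs (rather than summing) are assumptions of this example.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_distill_loss(student_scores, teacher_ranking):
    # student_scores: dict mapping passage id -> student score.
    # teacher_ranking: list of passage ids, best first, as ordered by the teacher.
    total = 0.0
    num_pairs = 0
    for i, better in enumerate(teacher_ranking):
        for worse in teacher_ranking[i + 1:]:
            # Penalize the student when it does not score `better` above `worse`.
            diff = student_scores[better] - student_scores[worse]
            total += softplus(-diff)
            num_pairs += 1
    # Averaging over pairs is a choice of this sketch; a list with fewer
    # than two passages yields no pairs and therefore zero loss.
    return total / num_pairs if num_pairs else 0.0
```

A student whose scores follow the teacher's order receives a lower loss than one whose scores reverse it, and only the order of the teacher's list matters, not any teacher score values.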

[Figure panel (b): Evaluation on BEIR-13 (semi-OOD in the authors' setup)]

This review was generated by AI and reviewed by a human editor.