QUICK REVIEW

[Paper Review] Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion

Shengyao Zhuang, Guido Zuccon|arXiv (Cornell University)|Aug 19, 2021

Topic Modeling|References: 45|Cited by: 48

ONE-SENTENCE SUMMARY

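TILDEv2 uses contextualized exact term matching and passage expansion to achieve state-of-the-art CPU-only passage re-ranking, reducing index size by up to 99% and improving effectiveness over TILDE while keeping query latency below 100 ms.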

ABSTRACT

BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource constraints. The reliance on a query encoder that only performs tokenization and on the pre-processing of passage representations at indexing, has allowed the recently proposed TILDE method to overcome the high query latency issue typical of BERT-based models. This however is at the expense of a lower effectiveness compared to other BERT-based re-rankers and dense retrievers. In addition, the original TILDE method is characterised by indexes with a very high memory footprint, as it expands each passage into the size of the BERT vocabulary. In this paper, we propose TILDEv2, a new model that stems from the original TILDE but that addresses its limitations. TILDEv2 relies on contextualized exact term matching with expanded passages. This requires to only store in the index the score of tokens that appear in the expanded passages (rather than all the vocabulary), thus producing indexes that are 99% smaller than those of TILDE. This matching mechanism also improves ranking effectiveness by 24%, without adding to the query latency. This makes TILDEv2 the state-of-the-art passage re-ranking method for CPU-only environments, capable of maintaining query latency below 100ms on commodity hardware.

RESEARCH MOTIVATION AND GOALS

Address the high query latency of BERT-based re-rankers by enabling CPU-friendly second-stage ranking.
Reduce index memory footprint compared to TILDE without sacrificing effectiveness.
Introduce contextualized exact term matching to replace query-likelihood matching.
Propose a fast passage expansion method to mitigate vocabulary mismatch.
Demonstrate state-of-the-art performance on MS MARCO and DL2019/2020 datasets under CPU constraints.

PROPOSED METHOD

Tokenizer-based query encoder that encodes queries into sparse, query-length feature vectors using the BERT tokenizer (no model inference at query time).
Contextualized exact term matching where passage tokens are assigned scalar weights via a BERT-based projection, enabling exact term matching with the passage’s tokens.
Use of Noise-contrastive Estimation (NCE) loss for training with negative samples (S(q,p+), S(q,p−)).
Passage expansion at indexing time to mitigate vocabulary mismatch by appending semantically related tokens derived from a TILDE-based expansion (replacing docT5query).
Expansion uses the original TILDE model to generate token likelihoods, selecting top-m tokens not in the passage or stopword list for expansion (Algorithm 1).
Index stored as a lightweight structure containing only tokens present in passages with their max contextualized term weights (drastically reducing index size).
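
The indexing and scoring steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy tokens, the likelihoods, and the term weights are all hypothetical and hand-written, whereas in TILDEv2 the likelihoods come from the original TILDE model and the weights from a BERT-based projection. The expansion step follows one reading of Algorithm 1 (take the top-m tokens by likelihood, then drop those already in the passage or in the stopword list).

```python
from typing import Dict, List, Set


def expand_passage(passage_tokens: List[str],
                   token_likelihoods: Dict[str, float],
                   stopwords: Set[str],
                   m: int) -> List[str]:
    """Append the top-m most likely tokens that are neither already in the
    passage nor in the stopword list (assumed reading of Algorithm 1)."""
    present = set(passage_tokens)
    ranked = sorted(token_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    top_m = [tok for tok, _ in ranked[:m]]
    new_tokens = [t for t in top_m if t not in present and t not in stopwords]
    return passage_tokens + new_tokens


def build_index_entry(tokens: List[str], weights: List[float]) -> Dict[str, float]:
    """Store one weight per distinct token: the maximum over its occurrences."""
    entry: Dict[str, float] = {}
    for tok, w in zip(tokens, weights):
        entry[tok] = max(w, entry.get(tok, float("-inf")))
    return entry


def score(query_tokens: List[str], entry: Dict[str, float]) -> float:
    """Sum the stored weights of query tokens that exactly match a passage
    token; query tokens missing from the entry contribute nothing."""
    return sum(entry.get(tok, 0.0) for tok in query_tokens)


# Hypothetical toy passage and TILDE-style token likelihoods.
passage = ["cheap", "flights", "to", "rome", "cheap"]
likelihoods = {"airfare": 0.9, "the": 0.8, "flights": 0.7, "italy": 0.6, "hotel": 0.1}
expanded = expand_passage(passage, likelihoods, stopwords={"the", "to"}, m=4)

# Hypothetical contextualized weights, one per token of the expanded passage.
weights = [1.25, 2.0, 0.0, 2.5, 1.5, 0.75, 0.5]
entry = build_index_entry(expanded, weights)

# At query time only tokenization is needed; "venice" has no match.
query = ["flights", "italy", "rome", "venice"]
passage_score = score(query, entry)
print(expanded)
print(passage_score)
```

Because each index entry holds only the tokens of the (expanded) passage rather than a score for every vocabulary item, its size grows with passage length and not with the size of the BERT vocabulary, which is the source of the index-size reduction described above.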

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Is contextualized exact term matching more effective and efficient than the original TILDE’s query-likelihood matching?
RQ2: How does TILDEv2 compare to baselines (BM25, docT5query, DeepImpact, uniCOIL, RepBERT, ANCE, EPIC, BERT-based re-rankers) in effectiveness and latency?
RQ3: What is the effectiveness-efficiency trade-off of TILDEv2 relative to a strong BERT re-ranker under varying cut-offs?
RQ4: How effective and efficient is the proposed passage expansion based on TILDE compared with docT5query?

MAIN FINDINGS

Method	MRR@10	nDCG@10	MAP	nDCG@10 (DL2019)	MAP (DL2020)	GPU	CPU	Latency (ms)
TILDE+BM25-top1000	0.269	0.579	0.406	0.620	0.406	n.a.	76.6	76.6
TILDE+d2q-top10	0.285	0.650	0.467	0.624	0.417	n.a.	n.a.	75.3
TILDEv2+BM25-top1000	0.333	0.676	0.448	0.659	0.433	n.a.	80.8	80.8
TILDEv2+d2q-top100	0.341	0.703	0.498	0.669	0.449	n.a.	n.a.	76.4

Contextualized exact term matching in TILDEv2 yields higher effectiveness than TILDE’s query-likelihood matching, with up to 24% improvement on MS MARCO when re-ranking BM25 (and 20% when re-ranking docT5query).
TILDEv2 maintains CPU-friendly latency (<100 ms) and adds only a few milliseconds to BM25 or docT5query pipelines, while achieving competitive effectiveness.
TILDEv2 significantly reduces index size (up to 99% smaller than TILDE) by storing only passage tokens with max contextualized weights, instead of the full vocabulary.
Passage expansion using the original TILDE (instead of docT5query) enables faster expansion (7.3 hours for MS MARCO with docT5query vs a fraction of that with TILDE-based expansion) and incurs less than 1% effectiveness loss.
On MS MARCO and DL2019/DL2020, TILDEv2 is on par with or better than baselines in effectiveness while offering substantially lower latency, especially on CPU (≤80 ms).
A three-stage pipeline (BM25 → TILDEv2 → BERT-large re-ranker) can achieve similar or better effectiveness with much lower latency than using BERT-large alone on top passages.
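
The relative improvements quoted in the first finding can be checked against the MRR@10 column of the table. The short sketch below takes the four MRR@10 values exactly as printed in the table rows; the helper name `relative_improvement` is only illustrative.

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# MRR@10 values as printed in the table.
tilde_bm25, tildev2_bm25 = 0.269, 0.333
tilde_d2q, tildev2_d2q = 0.285, 0.341

bm25_gain = relative_improvement(tildev2_bm25, tilde_bm25)
d2q_gain = relative_improvement(tildev2_d2q, tilde_d2q)

print(f"BM25 re-ranking:       {bm25_gain:.1f}%")
print(f"docT5query re-ranking: {d2q_gain:.1f}%")
```

Rounded to whole percentages, these match the "up to 24%" and "20%" figures stated in the findings.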


This review was generated by AI and reviewed by human editors.