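[Paper Review] Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
TILDEv2 replaces TILDE with a model built on contextualized exact term matching and passage expansion. It achieves state-of-the-art effectiveness for CPU-only passage re-ranking, shrinks the index by up to 99%, and keeps query latency below 100 ms while improving effectiveness over TILDE.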
BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource constraints. The reliance on a query encoder that only performs tokenization and on the pre-processing of passage representations at indexing, has allowed the recently proposed TILDE method to overcome the high query latency issue typical of BERT-based models. This however is at the expense of a lower effectiveness compared to other BERT-based re-rankers and dense retrievers. In addition, the original TILDE method is characterised by indexes with a very high memory footprint, as it expands each passage into the size of the BERT vocabulary. In this paper, we propose TILDEv2, a new model that stems from the original TILDE but that addresses its limitations. TILDEv2 relies on contextualized exact term matching with expanded passages. This requires to only store in the index the score of tokens that appear in the expanded passages (rather than all the vocabulary), thus producing indexes that are 99% smaller than those of TILDE. This matching mechanism also improves ranking effectiveness by 24%, without adding to the query latency. This makes TILDEv2 the state-of-the-art passage re-ranking method for CPU-only environments, capable of maintaining query latency below 100ms on commodity hardware.
Research Motivation and Objectives
- Address the high query latency of BERT-based re-rankers by enabling CPU-friendly second-stage ranking.
- Reduce index memory footprint compared to TILDE without sacrificing effectiveness.
- Introduce contextualized exact term matching to replace query-likelihood matching.
- Propose a fast passage expansion method to mitigate vocabulary mismatch.
- Demonstrate state-of-the-art performance on MS MARCO and DL2019/2020 datasets under CPU constraints.
Proposed Method
- Tokenizer-based query encoder that encodes queries into sparse, query-length feature vectors using the BERT tokenizer (no model inference at query time).
- Contextualized exact term matching where passage tokens are assigned scalar weights via a BERT-based projection, enabling exact term matching with the passage’s tokens.
- Use of Noise-contrastive Estimation (NCE) loss for training with negative samples (S(q,p+), S(q,p−)).
- Passage expansion at indexing time to mitigate vocabulary mismatch by appending semantically related tokens derived from a TILDE-based expansion (replacing docT5query).
- Expansion uses the original TILDE model to generate token likelihoods, selecting top-m tokens not in the passage or stopword list for expansion (Algorithm 1).
- Index stored as a lightweight structure containing only tokens present in passages with their max contextualized term weights (drastically reducing index size).
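The expansion, indexing, and scoring steps listed above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the whitespace `tokenize` function stands in for the BERT tokenizer, the token weights and likelihoods are made-up numbers instead of model outputs, and `STOPWORDS`, `expand_passage`, `build_index`, and `score` are hypothetical names. The sketch also assumes that the top-m tokens are selected first and then filtered, and that a query token missing from the passage contributes zero.

```python
# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"the", "a", "of", "is", "in"}


def tokenize(text):
    # Stand-in for the BERT tokenizer (assumption): lowercase whitespace split.
    return text.lower().split()


def expand_passage(passage_tokens, token_likelihoods, m):
    """Append top-m likely tokens that are not in the passage and not stopwords."""
    present = set(passage_tokens)
    ranked = sorted(token_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    top_m = [tok for tok, _ in ranked[:m]]
    new_tokens = [t for t in top_m if t not in present and t not in STOPWORDS]
    return passage_tokens + new_tokens


def build_index(passage_weights):
    """Map {pid: [(token, weight), ...]} to {pid: {token: max weight}}."""
    index = {}
    for pid, pairs in passage_weights.items():
        entry = {}
        for tok, weight in pairs:
            # Keep only the largest weight seen for a repeated token.
            entry[tok] = max(weight, entry.get(tok, float("-inf")))
        index[pid] = entry
    return index


def score(query, entry):
    # Exact term matching: sum stored weights of query tokens found in the passage.
    return sum(entry.get(tok, 0.0) for tok in tokenize(query))


# Toy weights; in the real model these would come from the BERT-based projection.
weights = {
    "p1": [("tilde", 2.0), ("ranks", 1.0), ("passages", 1.5), ("tilde", 2.5)],
    "p2": [("bm25", 1.8), ("ranks", 0.5), ("documents", 1.2)],
}
index = build_index(weights)
query = "tilde ranks passages"
ranking = sorted(index, key=lambda pid: score(query, index[pid]), reverse=True)
expanded = expand_passage(
    ["tilde", "ranks"],
    {"the": 0.9, "retrieval": 0.8, "tilde": 0.7, "neural": 0.1},
    m=3,
)
```

Because query processing here is only tokenization plus dictionary lookups, no model inference is needed at query time, which is the property the method relies on for low CPU latency. Each index entry holds only the tokens of its (expanded) passage rather than one value per vocabulary item.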
Experimental Results
Research Questions
- RQ1: Is contextualized exact term matching more effective and efficient than the original TILDE’s query-likelihood matching?
- RQ2: How does TILDEv2 compare to baselines (BM25, docT5query, DeepImpact, uniCOIL, RepBERT, ANCE, EPIC, BERT-based re-rankers) in effectiveness and latency?
- RQ3: What is the effectiveness-efficiency trade-off of TILDEv2 relative to a strong BERT re-ranker under varying cut-offs?
- RQ4: How effective and efficient is the proposed passage expansion based on TILDE compared with docT5query?
Key Results
| Method | MRR@10 | nDCG@10 | MAP | nDCG@10 (DL2019) | MAP (DL2020) | GPU | CPU | Latency (ms) |
|---|---|---|---|---|---|---|---|---|
| TILDE+BM25-top1000 | 0.269 | 0.579 | 0.406 | 0.620 | 0.406 | n.a. | 76.6 | 76.6 |
| TILDE+d2q-top10 | 0.285 | 0.650 | 0.467 | 0.624 | 0.417 | n.a. | n.a. | 75.3 |
| TILDEv2+BM25-top1000 | 0.333 | 0.676 | 0.448 | 0.659 | 0.433 | n.a. | 80.8 | 80.8 |
| TILDEv2+d2q-top100 | 0.341 | 0.703 | 0.498 | 0.669 | 0.449 | n.a. | n.a. | 76.4 |
- Contextualized exact term matching in TILDEv2 yields higher effectiveness than TILDE’s query-likelihood matching, with up to 24% improvement on MS MARCO when re-ranking BM25 (and 20% when re-ranking docT5query).
- TILDEv2 maintains CPU-friendly latency (<100 ms) and adds only a few milliseconds to BM25 or docT5query pipelines, while achieving competitive effectiveness.
- TILDEv2 significantly reduces index size (up to 99% smaller than TILDE) by storing only passage tokens with max contextualized weights, instead of the full vocabulary.
- Passage expansion using the original TILDE (instead of docT5query) enables faster expansion (7.3 hours for MS MARCO with docT5query vs a fraction of that with TILDE-based expansion) and incurs less than 1% effectiveness loss.
- On MS MARCO and DL2019/DL2020, TILDEv2 is on par with or better than baselines in effectiveness while offering substantially lower latency, especially on CPU (≤80 ms).
- A three-stage pipeline (BM25 → TILDEv2 → BERT-large re-ranker) can achieve similar or better effectiveness with much lower latency than using BERT-large alone on top passages.
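The relative improvements quoted in the first bullet can be checked against the table. The snippet below is a small arithmetic sketch, assuming the MRR@10 column is the MS MARCO metric the bullet refers to and that each TILDEv2 row is compared with the TILDE row using the same first stage; `relative_gain` is a helper name introduced here for illustration.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# MRR@10 values copied from the results table.
tilde_bm25, tildev2_bm25 = 0.269, 0.333
tilde_d2q, tildev2_d2q = 0.285, 0.341

bm25_gain = relative_gain(tildev2_bm25, tilde_bm25)
d2q_gain = relative_gain(tildev2_d2q, tilde_d2q)

print(f"BM25 first stage: {bm25_gain:.1f}%")
print(f"docT5query first stage: {d2q_gain:.1f}%")
```

Rounded to whole percentages, the two gains come out at 24% and 20%, which matches the figures stated in the bullet.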
This review was generated by AI and reviewed by a human editor.