QUICK REVIEW

[Paper Review] LooComp: Leverage Leave-One-Out Strategy to Encoder-only Transformer for Efficient Query-aware Context Compression

Thao Do, DINH PHU TRAN | arXiv (Cornell University) | Mar 10, 2026

Topic Modeling | Cited by 0

One-Sentence Summary

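TL;DR: LooComp uses a leave-one-out (delta) scoring mechanism with an encoder-only model to perform query-aware, sentence-level context pruning for retrieval-augmented generation (RAG) question answering, achieving strong QA performance with fast, memory-efficient compression.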

ABSTRACT

Efficient context compression is crucial for improving the accuracy and scalability of question answering. For the efficiency of Retrieval Augmented Generation, context should be delivered fast, compact, and precise to ensure clue sufficiency and budget-friendly LLM reader cost. We propose a margin-based framework for query-driven context pruning, which identifies sentences that are critical for answering a query by measuring changes in clue richness when they are omitted. The model is trained with a composite ranking loss that enforces large margins for critical sentences while keeping non-critical ones near neutral. Built on a lightweight encoder-only Transformer, our approach generally achieves strong exact-match and F1 scores with high-throughput inference and lower memory requirements than those of major baselines. In addition to efficiency, our method yields effective compression ratios without degrading answering performance, demonstrating its potential as a lightweight and practical alternative for retrieval-augmented tasks.

Research Motivation and Objectives

Motivate efficient retrieval-augmented generation (RAG) by reducing context size without sacrificing answer quality.
Propose a lightweight, encoder-only sentence-pruning method guided by query relevance.
Introduce a delta-based scoring mechanism (leave-one-out) to quantify sentence contribution to answerability.
Develop an adaptive, gap-based selection rule to automatically determine compression rate per query.
Evaluate across multiple QA benchmarks to demonstrate efficiency and effectiveness of the proposed approach.

Proposed Method

Represent the compression task as extractive sentence selection to preserve source text fidelity.
Segment retrieved documents into sentences and, using a lightweight encoder (ModernBERT), compute each sentence's clue-richness delta (Delta_k): the change in clue richness when sentence k is omitted.
Train with a composite margin-based loss that includes ranking terms (L_ord, L_crit, L_non) and BCE terms to handle clue-free passages, enforcing larger margins for critical sentences.
During inference, compute p_0 (the score of the full context) and every p_k (the score with sentence k omitted) in parallel, derive the Deltas, and apply an adaptive gap-based threshold tau to classify sentences as critical or non-critical.
Use an adaptive thresholding procedure: sort positive Deltas, find the maximum gap between consecutive Delta values, and set tau accordingly to select sentences.
Experiment with different backbone sizes (ModernBERT-large/base) and readers (Llama variants, Gemini, etc.), and evaluate on HotpotQA, 2WikiMultihopQA, MuSiQue, Natural Questions, and TriviaQA.
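The leave-one-out scoring in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Delta_k = p_0 - p_k, where p_0 scores the full context and p_k scores the context with sentence k removed, and it replaces the ModernBERT scorer with a hypothetical `toy_score` (query-word overlap) so the example runs without model weights. The helper names `split_sentences`, `toy_score`, and `loo_deltas` are illustrative only.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_score(query: str, context: str) -> float:
    # HYPOTHETICAL stand-in for the encoder's clue-richness score:
    # the fraction of query words that also occur in the context.
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", context.lower()))
    return len(q & c) / len(q) if q else 0.0


def loo_deltas(query: str, sentences: List[str],
               score: Callable[[str, str], float]) -> List[float]:
    # p_0: score of the full context; p_k: score with sentence k left out.
    # Assumed sign convention: Delta_k = p_0 - p_k, so a large positive
    # Delta_k means removing sentence k loses many clues.
    p0 = score(query, " ".join(sentences))
    deltas = []
    for k in range(len(sentences)):
        reduced = " ".join(s for i, s in enumerate(sentences) if i != k)
        deltas.append(p0 - score(query, reduced))
    return deltas


doc = ("Paris is the capital of France. The Seine flows through Paris. "
       "Bananas are yellow.")
query = "What is the capital of France?"
sents = split_sentences(doc)
print([round(d, 3) for d in loo_deltas(query, sents, toy_score)])
# -> [0.667, 0.0, 0.0]
```

With a real encoder, the K + 1 inputs (the full context plus each leave-one-out variant) would be scored together as one batch, which is what allows the parallel inference described in the method; the loop here is sequential only for readability.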
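The composite training objective can be illustrated with a toy version. Only the term names (L_ord, L_crit, L_non, BCE) and their roles come from the summary; the concrete hinge and squared forms, the margins `m_crit` and `m_ord`, the equal weights, and applying BCE to p_0 with a has-clue label are all assumptions chosen to match the described behavior (large margins for critical sentences, near-zero Deltas for non-critical ones, and a signal for clue-free passages). `composite_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def _bce(p: float, y: int, eps: float = 1e-7) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def composite_loss(p0: float, deltas: Sequence[float],
                   critical: Sequence[bool], has_clue: int,
                   m_crit: float = 0.5, m_ord: float = 0.3,
                   weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    crit = [d for d, c in zip(deltas, critical) if c]
    non = [d for d, c in zip(deltas, critical) if not c]
    # L_crit (assumed form): push each critical Delta above margin m_crit.
    l_crit = _mean([max(0.0, m_crit - d) for d in crit])
    # L_non (assumed form): keep non-critical Deltas near zero.
    l_non = _mean([d * d for d in non])
    # L_ord (assumed form): every critical Delta should exceed every
    # non-critical Delta by at least m_ord.
    l_ord = _mean([max(0.0, m_ord - (dc - dn)) for dc in crit for dn in non])
    # BCE (assumed form): p_0 should be low for clue-free passages.
    l_bce = _bce(p0, has_clue)
    w_ord, w_crit, w_non, w_bce = weights
    return w_ord * l_ord + w_crit * l_crit + w_non * l_non + w_bce * l_bce


good = composite_loss(0.9, [0.8, 0.0, 0.05], [True, False, False], 1)
bad = composite_loss(0.9, [0.0, 0.8, 0.05], [True, False, False], 1)
print(good < bad)  # -> True
```

In the well-separated case (critical Delta 0.8, non-critical Deltas near 0) the ranking terms vanish and only the small BCE term remains; swapping the roles makes L_ord, L_crit, and L_non all grow.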
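The adaptive gap-based threshold described above can be sketched like this. The source only says tau is set from the largest gap between consecutive positive Deltas; placing tau at the midpoint of that gap, keeping nothing when no Delta is positive, and keeping a lone positive Delta are assumptions made here for illustration. `gap_threshold` and `select_critical` are hypothetical names.

```python
from typing import List, Sequence


def gap_threshold(deltas: Sequence[float]) -> float:
    # Sort the positive Deltas from largest to smallest.
    pos = sorted((d for d in deltas if d > 0), reverse=True)
    if not pos:
        return float("inf")      # assumption: no positive Delta -> keep nothing
    if len(pos) == 1:
        return pos[0] / 2        # assumption: a lone positive Delta is kept
    # Gaps between consecutive sorted Deltas; pick the largest one.
    gaps = [pos[i] - pos[i + 1] for i in range(len(pos) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Assumption: tau sits at the midpoint of the largest gap.
    return (pos[i] + pos[i + 1]) / 2


def select_critical(deltas: Sequence[float]) -> List[int]:
    # Indices of sentences whose Delta lies above the adaptive threshold.
    tau = gap_threshold(deltas)
    return [k for k, d in enumerate(deltas) if d > tau]


print(select_critical([0.9, 0.05, 0.8, -0.1, 0.1]))  # -> [0, 2]
```

Here the sorted positive Deltas are [0.9, 0.8, 0.1, 0.05]; the largest gap lies between 0.8 and 0.1, so tau = 0.45 and sentences 0 and 2 are kept. Because tau is derived from each query's own Delta distribution, the number of kept sentences, and hence the compression rate, varies per query without a fixed top-k.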

Figure 1: Answering performance (EM, F1) and compression efficiency (QpS, Saved %) across compressors. Questions Per Second (QpS) is computed from compression latency; Context Saved is $100\% - \text{Compression ratio}$.
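The two efficiency metrics in Figure 1 reduce to simple arithmetic. The sketch below assumes the compression ratio is the kept context as a percentage of the original context tokens, and that QpS is the number of questions divided by total compression time in seconds; the function names are hypothetical.

```python
def compression_ratio(orig_tokens: int, kept_tokens: int) -> float:
    # Assumed definition: kept context as a percentage of the original.
    return 100.0 * kept_tokens / orig_tokens


def context_saved(orig_tokens: int, kept_tokens: int) -> float:
    # Figure 1: Context Saved = 100% - Compression ratio.
    return 100.0 - compression_ratio(orig_tokens, kept_tokens)


def questions_per_second(n_questions: int, compress_seconds: float) -> float:
    # Assumed definition: questions compressed per second of compression
    # time only (reader/LLM generation time excluded, per the caption).
    return n_questions / compress_seconds


print(compression_ratio(2000, 500), context_saved(2000, 500))  # -> 25.0 75.0
print(questions_per_second(1000, 40.0))                         # -> 25.0
```

For instance, compressing a 2,000-token context to 500 tokens gives a 25% compression ratio and 75% context saved, and compressing 1,000 questions in 40 s corresponds to 25 QpS.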

Experimental Results

Research Questions

RQ1: Can an encoder-only model accurately identify sentence-level relevance for QA in a RAG setup via a leave-one-out mechanism?
RQ2: Does a margin-based, adaptive gap threshold improve sentence selection efficiency and answer fidelity across QA benchmarks?
RQ3: How does LOO-Delta scoring compare to decoder-based or token-level pruning methods in terms of speed, memory, and compression ratio?
RQ4: Is sentence-level pruning with adaptive compression competitive across open-source readers and various top-k retrieval depths?
RQ5: What is the effect of backbone size and training objectives on QA performance and compression efficiency?

Key Findings

The method achieves strong exact-match and F1 scores while offering high-throughput inference and lower memory usage than major baselines.
Adaptive gap-based selection yields efficient compression with substantial context reduction without degrading QA performance.
Encoder-only backbones (e.g., ModernBERT) suffice for sentence-level relevance classification in this framework, offering efficiency advantages over decoder-based approaches.
Across five QA benchmarks and multiple readers, the approach generalizes well to different datasets and retrieval depths, maintaining competitive or superior QA metrics and faster compression times.
Ablation studies show the full margin-based loss (with BCE components) is essential for best performance, and adaptive inference strategies outperform fixed-margin rules in generalization.

Figure 2: Overview of our framework. Our proposed lightweight context pruner includes three steps. (1) Each retrieved document is segmented into sentences. (2) We measure the importance of sentences by calculating the change in clue richness, denoted as $\Delta$, when a sentence is omitted. A large


This review was generated by AI and reviewed by human editors.