QUICK REVIEW

[Paper Review] LooComp: Leverage Leave-One-Out Strategy to Encoder-only Transformer for Efficient Query-aware Context Compression

Thao Do, DINH PHU TRAN|arXiv (Cornell University)|Mar 10, 2026

Topic Modeling|Cited by 0

One-sentence summary

LooComp uses a leave-one-out delta scoring mechanism built on an encoder-only model to perform query-aware context compression for RAG. It achieves strong QA performance with fast, memory-efficient compression.

ABSTRACT

Efficient context compression is crucial for improving the accuracy and scalability of question answering. For the efficiency of Retrieval Augmented Generation, context should be delivered fast, compact, and precise to ensure clue sufficiency and budget-friendly LLM reader cost. We propose a margin-based framework for query-driven context pruning, which identifies sentences that are critical for answering a query by measuring changes in clue richness when they are omitted. The model is trained with a composite ranking loss that enforces large margins for critical sentences while keeping non-critical ones near neutral. Built on a lightweight encoder-only Transformer, our approach generally achieves strong exact-match and F1 scores with high-throughput inference and lower memory requirements than those of major baselines. In addition to efficiency, our method yields effective compression ratios without degrading answering performance, demonstrating its potential as a lightweight and practical alternative for retrieval-augmented tasks.

Motivation and objectives

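Motivate efficient retrieval-augmented generation (RAG) by reducing context size without sacrificing answer quality.
Propose a lightweight, encoder-only sentence pruning method guided by query relevance.
Introduce a delta-based (leave-one-out) scoring mechanism that quantifies each sentence's contribution to answerability.
Develop an adaptive gap-based selection rule that determines the compression ratio automatically for each query.
Evaluate on multiple QA benchmarks to demonstrate the efficiency and effectiveness of the proposed method.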

Proposed method

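Frame compression as extractive sentence selection, which preserves fidelity to the original text.
Split the retrieved documents into sentences and use a lightweight encoder (ModernBERT) to compute the clue-richness delta (Delta_k) obtained when each sentence is omitted.
Train with a composite margin loss comprising ranking terms (L_ord, L_crit, L_non) and a BCE term, imposing larger margins on critical sentences.
At inference, compute all p0 and p_k scores in parallel, derive the Delta values, and apply an adaptive gap-based threshold tau to classify sentences as critical or non-critical.
Adaptive thresholding procedure: sort the positive Delta values, find the largest gap between consecutive values, set tau there, and select sentences accordingly.
Test different backbone sizes (ModernBERT-large/base) and readers (the Llama family, Gemini, and others), evaluating on HotpotQA, 2WikiMultihopQA, Musique, Natural Questions, and TriviaQA.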
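
The leave-one-out scoring and gap-based selection steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical word-overlap stand-in for the trained ModernBERT scorer, the sign convention Delta_k = p0 - p_k is assumed, tau is placed at the midpoint of the largest gap (the review only says tau is set from that gap), and selecting nothing when no Delta is positive is also an assumption. All function names are illustrative.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple


def toy_score(query: str, context: str) -> float:
    """Hypothetical stand-in for the encoder: fraction of query words found in the context."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", context.lower()))
    return len(q & c) / len(q) if q else 0.0


def loo_deltas(
    query: str,
    sentences: Sequence[str],
    score: Callable[[str, str], float],
) -> List[float]:
    """Assumed convention: Delta_k = p0 - p_k, where p_k scores the context without sentence k."""
    p0 = score(query, " ".join(sentences))
    deltas = []
    for k in range(len(sentences)):
        reduced = " ".join(s for i, s in enumerate(sentences) if i != k)
        deltas.append(p0 - score(query, reduced))
    return deltas


def gap_select(deltas: Sequence[float]) -> Tuple[List[int], float]:
    """Sort positive deltas, find the largest consecutive gap, and keep sentences above it."""
    positives = sorted((d for d in deltas if d > 0), reverse=True)
    if not positives:
        # Assumption: with no positive delta, nothing is marked critical.
        return [], math.inf
    if len(positives) == 1:
        tau = positives[0]
    else:
        gaps = [positives[i] - positives[i + 1] for i in range(len(positives) - 1)]
        i = max(range(len(gaps)), key=gaps.__getitem__)  # index of the largest gap
        # Assumption: tau sits at the midpoint of that gap.
        tau = (positives[i] + positives[i + 1]) / 2
    return [k for k, d in enumerate(deltas) if d >= tau], tau


sentences = ["Paris is the capital.", "France is in Europe.", "Cats sleep a lot."]
deltas = loo_deltas("capital france", sentences, toy_score)
critical, tau = gap_select(deltas)
print(deltas, critical, tau)  # [0.5, 0.5, 0.0] [0, 1] 0.5
```

The loop over sentences is written sequentially for readability. Since each p_k depends only on the query and one reduced context, the full-context and leave-one-out inputs could be scored as a single batch, which is consistent with the parallel computation described for inference.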

Figure 1: Answering performance (EM, F1) and compression efficiency (QpS, Saved %) across compressors. Questions Per Second (QpS) is from compression latency; Context Saved is $100\%$ – Compression ratio.
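
The two efficiency metrics in the caption reduce to simple arithmetic, sketched below. The caption does not define the compression ratio, so this sketch assumes it is the percentage of the original context that is kept; the function names and the sample numbers are illustrative, not results from the paper.

```python
def context_saved_pct(compression_ratio_pct: float) -> float:
    """Context Saved = 100% - compression ratio (ratio assumed to be the % of context kept)."""
    return 100.0 - compression_ratio_pct


def questions_per_second(num_questions: int, total_compression_seconds: float) -> float:
    """QpS derived from compression latency: questions processed per second of compression time."""
    return num_questions / total_compression_seconds


# Illustrative values only.
print(context_saved_pct(25.0))           # 75.0
print(questions_per_second(200, 8.0))    # 25.0
```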

Experimental results

Research questions

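RQ1 Can an encoder-only model accurately identify sentence-level relevance in a RAG setting via the leave-one-out mechanism?
RQ2 Does the margin-based adaptive gap threshold improve sentence-selection efficiency and answer fidelity across QA benchmarks?
RQ3 How does LOO-Delta scoring compare with decoder-based and token-level pruning methods in speed, memory, and compression ratio?
RQ4 Is sentence-level pruning with adaptive compression competitive with open-source readers and across different top-k retrieval depths?
RQ5 How do backbone size and training objective affect QA performance and compression efficiency?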

Key findings

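The method achieves strong exact-match and F1 scores while delivering fast inference and lower memory usage than major baselines.
Adaptive gap-based selection yields efficient compression that substantially reduces the context without degrading QA performance.
An encoder-only backbone (e.g., ModernBERT) is sufficient for sentence-level relevance classification in this framework and is more efficient than decoder-based methods.
Across five QA benchmarks and multiple readers, the method generalizes to different datasets and retrieval depths, maintaining competitive or superior QA metrics and fast compression times.
Ablation studies show that the full margin-based loss (including the BCE component) is essential for the best performance, and that the adaptive inference strategy generalizes better than a fixed-margin rule.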

Figure 2: Overview of our framework. Our proposed lightweight context pruner includes three steps. (1) each retrieved document is segmented into sentences. (2) We measure the importance of sentences by calculating the change in clue richness, denoted as $\Delta$ , when a sentence is omitted. A large

This review was generated by AI and checked by a human editor.