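[Paper Review] Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering
Introduces Selective Context, a self-information-based content filtering method that compresses an LLM's context and improves efficiency with minimal loss in task performance.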
Large language models (LLMs) have received significant attention by achieving remarkable performance across various tasks. However, their fixed context length poses challenges when processing long documents or maintaining extended conversations. This paper proposes a method called Selective Context that employs self-information to filter out less informative content, thereby enhancing the efficiency of the fixed context length. We demonstrate the effectiveness of our approach on tasks of summarisation and question answering across different data sources, including academic papers, news articles, and conversation transcripts.
Research Motivation and Objectives
- Motivate and address the fixed context length limitation of LLMs for long documents and extended conversations.
- Propose a self-information based content filtering method to selectively retain informative lexical units.
- Demonstrate that selective context can significantly reduce context size with minimal loss in generation quality across tasks and data sources.
- Provide extensive evaluation across summarisation, QA, original context reconstruction, and conversation tasks.
Proposed Method
- Compute token-level self-information using a base language model (a causal LM such as GPT-2, OPT, or LLaMA).
- Merge token self-information into lexical units (sentences, phrases) via additivity of self-information.
- Rank lexical units by self-information and apply percentile-based filtering to retain informative units.
- Construct a filtered context from units with self-information above the p-th percentile.
- Evaluate performance on multiple datasets and tasks with varying reduction ratios (0.2–0.8).
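The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the per-token probabilities have already been obtained from a causal LM, it uses self-information in bits, I(x) = -log2 P(x), and it keeps units at or above the percentile cutoff. The function names, the linear-interpolation percentile, and the probabilities in `example_units` are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def self_information(prob: float) -> float:
    """Self-information of one token in bits: I(x) = -log2 P(x)."""
    return -math.log2(prob)


def unit_self_information(token_probs: Sequence[float]) -> float:
    """Score a lexical unit by summing the self-information of its tokens."""
    return sum(self_information(p) for p in token_probs)


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 <= p <= 1) using linear interpolation."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def selective_context(
    units: List[Tuple[str, List[float]]], reduction_ratio: float
) -> List[str]:
    """Keep units scoring at or above the cutoff, preserving original order."""
    if not units:
        return []
    scores = [unit_self_information(probs) for _, probs in units]
    cutoff = percentile(scores, reduction_ratio)
    return [text for (text, _), score in zip(units, scores) if score >= cutoff]


# Made-up token probabilities, chosen only to illustrate the filtering step.
example_units = [
    ("This paper", [0.25, 0.5]),
    ("also", [0.5]),
    ("proposes Selective Context", [0.125, 0.25, 0.5]),
    ("which is", [0.5, 0.5]),
    ("for compression", [0.125, 0.25]),
]

if __name__ == "__main__":
    print(selective_context(example_units, reduction_ratio=0.4))
```

In this sketch the reduction ratio is applied to the number of lexical units rather than to the number of tokens, so the token-level saving depends on how long the dropped units are.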
Experimental Results
Research Questions
- RQ1: Does self-information-based selective filtering preserve task performance while reducing context size?
- RQ2: How does selective context vary in effectiveness across data sources (arXiv, BBC News, ShareGPT) and tasks (summarisation, QA, reconstruction, conversation)?
- RQ3: What is the trade-off between context reduction ratio and generation quality across different lexical-unit granularities (token/phrase/sentence)?
- RQ4: Can percentile-based retention adaptively balance efficiency and accuracy better than fixed thresholds or top-k selections?
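Key Findings
Values in parentheses give the absolute drop relative to the Original row for the same task.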
| Method | Task | BLEU | METEOR | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore Precision | BERTScore Recall | BERTScore F1 |
|---|---|---|---|---|---|---|---|---|---|
| Original | Summarisation | .274 | .481 | .570 | .321 | .416 | .912 | .911 | .911 |
| Original | QA | .529 | .664 | .690 | .581 | .664 | .941 | .939 | .940 |
| Original | Conversation | .238 | .343 | .451 | .249 | .332 | .878 | .878 | .877 |
| SC-0.2 | Summarisation | .251 (.02) | .475 (.01) | .563 (.01) | .305 (.02) | .402 (.01) | .910 (.002) | .909 (.002) | .909 (.002) |
| SC-0.2 | QA | .426 (.10) | .601 (.06) | .638 (.05) | .502 (.08) | .605 (.06) | .933 (.008) | .929 (.010) | .931 (.009) |
| SC-0.2 | Conversation | .208 (.03) | .305 (.04) | .419 (.03) | .230 (.02) | .307 (.02) | .873 (.005) | .862 (.015) | .867 (.010) |
| SC-0.35 | Summarisation | .212 (.06) | .442 (.04) | .533 (.04) | .265 (.06) | .363 (.05) | .905 (.007) | .902 (.009) | .903 (.008) |
| SC-0.35 | QA | .337 (.19) | .531 (.13) | .578 (.11) | .420 (.16) | .539 (.13) | .925 (.017) | .918 (.021) | .921 (.019) |
| SC-0.35 | Conversation | .179 (.06) | .290 (.05) | .400 (.05) | .198 (.05) | .285 (.05) | .871 (.007) | .861 (.016) | .866 (.012) |
| SC-0.5 | Summarisation | .170 (.10) | .397 (.08) | .500 (.07) | .226 (.10) | .331 (.09) | .900 (.012) | .893 (.018) | .896 (.015) |
| SC-0.5 | QA | .237 (.29) | .434 (.23) | .487 (.20) | .321 (.26) | .447 (.22) | .912 (.029) | .903 (.036) | .907 (.033) |
| SC-0.5 | Conversation | .132 (.11) | .254 (.09) | .360 (.09) | .163 (.09) | .254 (.08) | .867 (.012) | .850 (.028) | .858 (.020) |
| SC-0.65 | Summarisation | .114 (.16) | .335 (.15) | .447 (.12) | .168 (.15) | .281 (.13) | .893 (.019) | .880 (.031) | .886 (.025) |
| SC-0.65 | QA | .157 (.37) | .336 (.33) | .394 (.30) | .227 (.35) | .353 (.31) | .899 (.042) | .888 (.051) | .893 (.047) |
| SC-0.65 | Conversation | .109 (.13) | .227 (.12) | .331 (.12) | .139 (.11) | .225 (.11) | .864 (.014) | .843 (.034) | .853 (.024) |
| SC-0.8 | Summarisation | .063 (.21) | .259 (.22) | .380 (.19) | .114 (.21) | .231 (.19) | .884 (.028) | .863 (.048) | .873 (.038) |
| SC-0.8 | QA | .117 (.41) | .272 (.39) | .326 (.36) | .172 (.41) | .289 (.37) | .890 (.051) | .876 (.063) | .883 (.057) |
| SC-0.8 | Conversation | .030 (.21) | .142 (.20) | .227 (.22) | .081 (.17) | .154 (.18) | .849 (.029) | .816 (.061) | .832 (.046) |
- Selective Context achieves substantial context reduction across tasks; removing about 35% of the context often costs only a minor loss in quality.
- Lower reduction ratios (0.2–0.35) yield a minimal performance drop on summarisation and QA, with BLEU, ROUGE, and BERTScore remaining high.
- Performance degrades more for QA and reconstruction tasks as reduction ratios exceed 0.5, while summarisation and conversation are more robust.
- Compared to random filtering, selective context more effectively preserves information and maintains higher ROUGE-1 and BERTScore at moderate reductions.
- The optimal threshold depends on the data source (arXiv: 0.35–0.5; BBC News: 0.5–0.65; ShareGPT: varies), and conversation tasks show robustness up to 80% reduction.
- Overall, Selective Context significantly improves context efficiency with only modest performance sacrifices in many settings.
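How large a drop looks depends on the metric. As a rough worked example, the snippet below takes the Original and SC-0.5 scores from the table above and computes the relative drop for three metrics per task. The choice of SC-0.5, the three metrics, and the helper name `relative_drop` are assumptions made for illustration. For summarisation at SC-0.5, BLEU falls by roughly 38% in relative terms while BERTScore F1 falls by under 2%.

```python
# (Original, SC-0.5) score pairs copied from the results table.
scores = {
    "Summarisation": {
        "BLEU": (0.274, 0.170),
        "ROUGE-1": (0.570, 0.500),
        "BERTScore F1": (0.911, 0.896),
    },
    "QA": {
        "BLEU": (0.529, 0.237),
        "ROUGE-1": (0.690, 0.487),
        "BERTScore F1": (0.940, 0.907),
    },
    "Conversation": {
        "BLEU": (0.238, 0.132),
        "ROUGE-1": (0.451, 0.360),
        "BERTScore F1": (0.877, 0.858),
    },
}


def relative_drop(original: float, filtered: float) -> float:
    """Fraction of the original score lost after filtering."""
    return (original - filtered) / original


if __name__ == "__main__":
    for task, metrics in scores.items():
        for metric, (original, filtered) in metrics.items():
            drop = relative_drop(original, filtered)
            print(f"{task:14s} {metric:13s} {drop:6.1%}")
```

N-gram overlap metrics such as BLEU react strongly to removed wording, whereas embedding-based BERTScore changes far less, so the two should be read together when judging whether a given reduction ratio is acceptable.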
This review was generated by AI and checked by a human editor.