QUICK REVIEW

[Paper Review] Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering

Yucheng Li|arXiv (Cornell University)|2023. 04. 24.

Topic Modeling | Citations: 10

One-Line Summary

Introduces Selective Context, a self-information-based content filtering method to compress context for LLMs, improving efficiency with minimal task performance loss.

ABSTRACT

Large language models (LLMs) have received significant attention by achieving remarkable performance across various tasks. However, their fixed context length poses challenges when processing long documents or maintaining extended conversations. This paper proposes a method called Selective Context that employs self-information to filter out less informative content, thereby enhancing the efficiency of the fixed context length. We demonstrate the effectiveness of our approach on tasks of summarisation and question answering across different data sources, including academic papers, news articles, and conversation transcripts.

Research Motivation and Objectives

Address the fixed context length of LLMs, which limits the processing of long documents and extended conversations.
Propose a self-information-based content filtering method that selectively retains informative lexical units.
Demonstrate that Selective Context can significantly reduce context size with minimal loss in generation quality across tasks and data sources.
Provide extensive evaluation across summarisation, QA, original context reconstruction, and conversation tasks.

Proposed Method

Compute token-level self-information using a base language model (a causal LM such as GPT-2, OPT, or LLaMA).
Sum token self-information within each lexical unit (phrase or sentence), relying on the additivity of self-information.
Rank lexical units by self-information and apply percentile-based filtering to retain informative units.
Construct a filtered context from units with self-information above the p-th percentile.
Evaluate performance on multiple datasets and tasks with varying reduction ratios (0.2–0.8).
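
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the token probabilities are made-up values standing in for the output of a causal LM, the function names are hypothetical, self-information is taken as I(x) = -log2 P(x | preceding tokens), and a unit is kept when its score is at or above the p-th percentile (whether the boundary is inclusive is an assumption).

```python
import math

def token_self_information(token_probs):
    """Self-information in bits for each token: I(x) = -log2 P(x | context)."""
    return [-math.log2(p) for p in token_probs]

def unit_self_information(token_info, unit_spans):
    """Sum token self-information over each lexical unit (additivity).

    unit_spans holds (start, end) token indices, end exclusive.
    """
    return [sum(token_info[start:end]) for start, end in unit_spans]

def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def selective_context(units, unit_info, p):
    """Keep units at or above the p-th percentile (inclusive: an assumption)."""
    threshold = percentile(unit_info, p)
    return [u for u, info in zip(units, unit_info) if info >= threshold]

# Hypothetical probabilities for the tokens: The cat | sat on | the quantum | mat
probs = [0.5, 0.25, 0.5, 0.5, 0.5, 0.0625, 0.125]
units = ["The cat", "sat on", "the quantum", "mat"]
spans = [(0, 2), (2, 4), (4, 6), (6, 7)]

info = unit_self_information(token_self_information(probs), spans)
print(selective_context(units, info, p=50))
```

With these toy values the unit scores are 3, 2, 5 and 3 bits, so at p = 50 only the most predictable unit ("sat on") is dropped. In practice the probabilities would come from the base language model, and p controls the reduction ratio.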

Experimental Results

Research Questions

RQ1: Does self-information-based selective filtering preserve task performance while reducing context size?
RQ2: How does Selective Context vary in effectiveness across data sources (arXiv, BBC News, ShareGPT) and tasks (summarisation, QA, reconstruction, conversation)?
RQ3: What is the trade-off between context reduction ratio and generation quality across different lexical-unit granularities (token/phrase/sentence)?
RQ4: Can percentile-based retention adaptively balance efficiency and accuracy better than fixed thresholds or top-k selections?

Key Results

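SC-r denotes Selective Context at reduction ratio r. Values in parentheses are the absolute drop relative to the Original (unfiltered) context. P, R and F1 are BERTScore precision, recall and F1.

Method	Task	BLEU	METEOR	ROUGE-1	ROUGE-2	ROUGE-L	BERTScore P	BERTScore R	BERTScore F1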
Original	Summarisation	.274	.481	.570	.321	.416	.912	.911	.911
Original	QA	.529	.664	.690	.581	.664	.941	.939	.940
Original	Conversation	.238	.343	.451	.249	.332	.878	.878	.877
SC-0.2	Summarisation	.251 (.02)	.475 (.01)	.563 (.01)	.305 (.02)	.402 (.01)	.910 (.002)	.909 (.002)	.909 (.002)
SC-0.2	QA	.426 (.10)	.601 (.06)	.638 (.05)	.502 (.08)	.605 (.06)	.933 (.008)	.929 (.010)	.931 (.009)
SC-0.2	Conversation	.208 (.03)	.305 (.04)	.419 (.03)	.230 (.02)	.307 (.02)	.873 (.005)	.862 (.015)	.867 (.010)
SC-0.35	Summarisation	.212 (.06)	.442 (.04)	.533 (.04)	.265 (.06)	.363 (.05)	.905 (.007)	.902 (.009)	.903 (.008)
SC-0.35	QA	.337 (.19)	.531 (.13)	.578 (.11)	.420 (.16)	.539 (.13)	.925 (.017)	.918 (.021)	.921 (.019)
SC-0.35	Conversation	.179 (.06)	.290 (.05)	.400 (.05)	.198 (.05)	.285 (.05)	.871 (.007)	.861 (.016)	.866 (.012)
SC-0.5	Summarisation	.170 (.10)	.397 (.08)	.500 (.07)	.226 (.10)	.331 (.09)	.900 (.012)	.893 (.018)	.896 (.015)
SC-0.5	QA	.237 (.29)	.434 (.23)	.487 (.20)	.321 (.26)	.447 (.22)	.912 (.029)	.903 (.036)	.907 (.033)
SC-0.5	Conversation	.132 (.11)	.254 (.09)	.360 (.09)	.163 (.09)	.254 (.08)	.867 (.012)	.850 (.028)	.858 (.020)
SC-0.65	Summarisation	.114 (.16)	.335 (.15)	.447 (.12)	.168 (.15)	.281 (.13)	.893 (.019)	.880 (.031)	.886 (.025)
SC-0.65	QA	.157 (.37)	.336 (.33)	.394 (.30)	.227 (.35)	.353 (.31)	.899 (.042)	.888 (.051)	.893 (.047)
SC-0.65	Conversation	.109 (.13)	.227 (.12)	.331 (.12)	.139 (.11)	.225 (.11)	.864 (.014)	.843 (.034)	.853 (.024)
SC-0.8	Summarisation	.063 (.21)	.259 (.22)	.380 (.19)	.114 (.21)	.231 (.19)	.884 (.028)	.863 (.048)	.873 (.038)
SC-0.8	QA	.117 (.41)	.272 (.39)	.326 (.36)	.172 (.41)	.289 (.37)	.890 (.051)	.876 (.063)	.883 (.057)
SC-0.8	Conversation	.030 (.21)	.142 (.20)	.227 (.22)	.081 (.17)	.154 (.18)	.849 (.029)	.816 (.061)	.832 (.046)

Selective Context substantially reduces context size; in several settings about 35% of the content can be removed with only a minor loss in quality.
At low reduction ratios (0.2–0.35), summarisation changes little (at 0.2, BLEU goes from .274 to .251 and BERTScore F1 from .911 to .909). QA loses more n-gram overlap (BLEU .529 to .426 at 0.2), although its BERTScore F1 stays above .92.
Once the reduction ratio exceeds 0.5, QA and reconstruction degrade faster than summarisation and conversation (at 0.5, BLEU drops by .29 for QA versus .10 for summarisation and .11 for conversation).
Compared with random filtering, Selective Context preserves more information and maintains higher ROUGE-1 and BERTScore at moderate reduction ratios.
The best-performing thresholds depend on the data source (arXiv: 0.35–0.5; BBC News: 0.5–0.65; ShareGPT: varies). Conversation tasks are reported as robust up to 80% reduction, although in the table above conversation BLEU falls from .238 to .030 at SC-0.8 while BERTScore F1 falls only from .877 to .832.
Overall, Selective Context improves context efficiency at the cost of modest performance losses in many settings.
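
The absolute drops in the table are easier to compare across metrics when expressed relative to the original score. The snippet below is a small sketch of that arithmetic using the summarisation rows of the table; the helper name `relative_drop` is only illustrative.

```python
def relative_drop(original, filtered):
    """Fraction of the original score lost after filtering."""
    return (original - filtered) / original

# Summarisation scores copied from the table above.
bleu = {"Original": 0.274, "SC-0.2": 0.251, "SC-0.5": 0.170, "SC-0.8": 0.063}
bertscore_f1 = {"Original": 0.911, "SC-0.2": 0.909, "SC-0.5": 0.896, "SC-0.8": 0.873}

for name, scores in (("BLEU", bleu), ("BERTScore F1", bertscore_f1)):
    base = scores["Original"]
    for setting, value in scores.items():
        if setting != "Original":
            print(f"{name} {setting}: {relative_drop(base, value):.1%} lower")
```

On these rows BLEU loses a far larger share of its original value than BERTScore F1 at the same reduction ratio, so whether a loss counts as modest depends on which metric is read.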

This review was generated by AI and reviewed by a human editor.