QUICK REVIEW

[Paper Review] Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering

Yucheng Li|arXiv (Cornell University)|Apr 24, 2023

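Topic Modeling|Cited by 10

One-Sentence Summary

Introduces Selective Context, a self-information-based content filtering method that compresses the context given to an LLM, improving efficiency with minimal loss in task performance.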

ABSTRACT

Large language models (LLMs) have received significant attention by achieving remarkable performance across various tasks. However, their fixed context length poses challenges when processing long documents or maintaining extended conversations. This paper proposes a method called Selective Context that employs self-information to filter out less informative content, thereby enhancing the efficiency of the fixed context length. We demonstrate the effectiveness of our approach on tasks of summarisation and question answering across different data sources, including academic papers, news articles, and conversation transcripts.

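Motivation and Objectives

Motivate and address the fixed context length limitation that LLMs face when processing long documents and extended conversations.
Propose a self-information-based content filtering method that selectively retains informative lexical units.
Show that Selective Context substantially reduces context size across tasks and data sources with almost no loss in generation quality.
Provide a broad evaluation covering summarisation, question answering, original context reconstruction, and conversation tasks.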

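Proposed Method

Compute token-level self-information with a base language model (a causal LM such as GPT-2, OPT, or LLaMA).
Merge token self-information into lexical units (sentences, phrases) using the additivity of self-information.
Rank the lexical units by self-information and apply percentile-based filtering to retain the informative ones.
Build the filtered context from the units whose self-information is above the p-th percentile.
Evaluate performance at different reduction ratios (0.2–0.8) across multiple datasets and tasks.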
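
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the per-token probabilities P(x_t | x_<t) have already been obtained from a causal LM, and the function names, the example probabilities, the linear-interpolation percentile, and the `>=` cutoff rule are all assumptions made for the sketch.

```python
import math
from typing import List, Sequence


def token_self_information(probs: Sequence[float]) -> List[float]:
    """Self-information in bits, I(x_t) = -log2 P(x_t | x_<t), one value per token.

    `probs` are assumed to come from a causal LM; each must be > 0.
    """
    return [-math.log2(p) for p in probs]


def unit_self_information(token_info: Sequence[float],
                          unit_sizes: Sequence[int]) -> List[float]:
    """Sum token values inside each lexical unit (additivity of self-information)."""
    totals, start = [], 0
    for size in unit_sizes:
        totals.append(sum(token_info[start:start + size]))
        start += size
    return totals


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def selective_context(units: Sequence[str],
                      unit_info: Sequence[float],
                      p: float) -> List[str]:
    """Keep units at or above the p-th percentile, preserving original order."""
    cutoff = percentile(unit_info, p)
    return [u for u, info in zip(units, unit_info) if info >= cutoff]


# Hypothetical example: four phrases and made-up token probabilities.
units = ["We propose", "a method", "called", "Selective Context"]
unit_sizes = [2, 2, 1, 2]  # tokens per unit
probs = [0.25, 0.125, 0.5, 0.5, 0.5, 0.0625, 0.125]

info = unit_self_information(token_self_information(probs), unit_sizes)
print(info)                                # per-unit self-information in bits
print(selective_context(units, info, 50))  # ['We propose', 'Selective Context']
```

In this sketch a larger p removes more content, so a reduction ratio of 0.5 would roughly correspond to p = 50 if the ratio is read as the fraction of units dropped; that reading is an assumption, since the summary does not define the ratio precisely.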

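Experimental Results

Research Questions

RQ1: Can self-information-based selective filtering reduce context size while preserving task performance?
RQ2: How much does the effectiveness of Selective Context vary across data sources (arXiv, BBC News, ShareGPT) and tasks (summarisation, QA, reconstruction, conversation)?
RQ3: What is the trade-off between context reduction ratio and generation quality at different lexical unit granularities (token, phrase, sentence)?
RQ4: Does percentile-based retention balance efficiency and accuracy more adaptively than a fixed threshold or top-k selection?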
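
To make the contrast in RQ4 concrete, the sketch below applies the three selection rules to two hypothetical lists of unit scores. The scores, function names, and the nearest-rank percentile are assumptions chosen for illustration and do not reproduce any experiment from the paper.

```python
from typing import List, Sequence


def keep_by_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of units whose score is at least a fixed threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def keep_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring units, returned in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def keep_by_percentile(scores: Sequence[float], p: float) -> List[int]:
    """Indices of units at or above the p-th percentile (nearest-rank cutoff)."""
    ordered = sorted(scores)
    idx = min(int(len(ordered) * p / 100), len(ordered) - 1)
    cutoff = ordered[idx]
    return [i for i, s in enumerate(scores) if s >= cutoff]


# Two hypothetical inputs with different lengths and score scales.
short_doc = [5, 2, 1, 7]
long_doc = [10, 4, 2, 14, 6, 3, 1, 9]

for name, scores in [("short", short_doc), ("long", long_doc)]:
    n = len(scores)
    print(name,
          "threshold:", len(keep_by_threshold(scores, 6)) / n,
          "top-k:", len(keep_top_k(scores, 2)) / n,
          "percentile:", len(keep_by_percentile(scores, 50)) / n)
```

On these toy inputs the fixed threshold and top-k rules keep different fractions of the short and long inputs, while the percentile rule keeps half of each. That is one sense in which a percentile cutoff adapts to the input; whether it yields a better efficiency and accuracy balance in practice is the empirical question RQ4 poses.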

Key Findings

Method	Task	BLEU	METEOR	ROUGE-1	ROUGE-2	ROUGE-L	BERTScore Precision	BERTScore Recall	BERTScore F1
Original	Summarisation	.274	.481	.570	.321	.416	.912	.911	.911
Original	QA	.529	.664	.690	.581	.664	.941	.939	.940
Original	Conversation	.238	.343	.451	.249	.332	.878	.878	.877
SC-0.2	Summarisation	.251 (.02)	.475 (.01)	.563 (.01)	.305 (.02)	.402 (.01)	.910 (.002)	.909 (.002)	.909 (.002)
SC-0.2	QA	.426 (.10)	.601 (.06)	.638 (.05)	.502 (.08)	.605 (.06)	.933 (.008)	.929 (.010)	.931 (.009)
SC-0.2	Conversation	.208 (.03)	.305 (.04)	.419 (.03)	.230 (.02)	.307 (.02)	.873 (.005)	.862 (.015)	.867 (.010)
SC-0.35	Summarisation	.212 (.06)	.442 (.04)	.533 (.04)	.265 (.06)	.363 (.05)	.905 (.007)	.902 (.009)	.903 (.008)
SC-0.35	QA	.337 (.19)	.531 (.13)	.578 (.11)	.420 (.16)	.539 (.13)	.925 (.017)	.918 (.021)	.921 (.019)
SC-0.35	Conversation	.179 (.06)	.290 (.05)	.400 (.05)	.198 (.05)	.285 (.05)	.871 (.007)	.861 (.016)	.866 (.012)
SC-0.5	Summarisation	.170 (.10)	.397 (.08)	.500 (.07)	.226 (.10)	.331 (.09)	.900 (.012)	.893 (.018)	.896 (.015)
SC-0.5	QA	.237 (.29)	.434 (.23)	.487 (.20)	.321 (.26)	.447 (.22)	.912 (.029)	.903 (.036)	.907 (.033)
SC-0.5	Conversation	.132 (.11)	.254 (.09)	.360 (.09)	.163 (.09)	.254 (.08)	.867 (.012)	.850 (.028)	.858 (.020)
SC-0.65	Summarisation	.114 (.16)	.335 (.15)	.447 (.12)	.168 (.15)	.281 (.13)	.893 (.019)	.880 (.031)	.886 (.025)
SC-0.65	QA	.157 (.37)	.336 (.33)	.394 (.30)	.227 (.35)	.353 (.31)	.899 (.042)	.888 (.051)	.893 (.047)
SC-0.65	Conversation	.109 (.13)	.227 (.12)	.331 (.12)	.139 (.11)	.225 (.11)	.864 (.014)	.843 (.034)	.853 (.024)
SC-0.8	Summarisation	.063 (.21)	.259 (.22)	.380 (.19)	.114 (.21)	.231 (.19)	.884 (.028)	.863 (.048)	.873 (.038)
SC-0.8	QA	.117 (.41)	.272 (.39)	.326 (.36)	.172 (.41)	.289 (.37)	.890 (.051)	.876 (.063)	.883 (.057)
SC-0.8	Conversation	.030 (.21)	.142 (.20)	.227 (.22)	.081 (.17)	.154 (.18)	.849 (.029)	.816 (.061)	.832 (.046)
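
The parenthesised values in the table appear to be the absolute drop relative to the corresponding Original row (for example, BLEU on summarisation at SC-0.2: .274 − .251 ≈ .02). The short check below recomputes that drop, plus a relative drop that the table does not report, for the BLEU column at SC-0.2. The helper name `drops` is hypothetical, and reading the parentheses as absolute differences is an inference from the numbers rather than something the summary states.

```python
from typing import Dict, Tuple

# BLEU scores copied from the Original and SC-0.2 rows of the table.
original_bleu = {"Summarisation": 0.274, "QA": 0.529, "Conversation": 0.238}
sc_02_bleu = {"Summarisation": 0.251, "QA": 0.426, "Conversation": 0.208}


def drops(baseline: Dict[str, float],
          compressed: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Return (absolute drop, relative drop) per task, rounded to 3 decimals."""
    result = {}
    for task, base in baseline.items():
        absolute = base - compressed[task]
        result[task] = (round(absolute, 3), round(absolute / base, 3))
    return result


for task, (absolute, relative) in drops(original_bleu, sc_02_bleu).items():
    print(f"{task}: absolute drop {absolute:.3f}, relative drop {relative:.1%}")
```

Looking at the relative drop alongside the absolute one shows that the same reduction ratio costs QA proportionally more BLEU than summarisation at SC-0.2.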

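Selective Context achieves substantial context reduction across tasks (for example, often around 35%) with only a small loss in quality.
At lower reduction ratios (0.2–0.35), performance on summarisation and QA drops only slightly, and BLEU, ROUGE, and BERTScore remain high.
Once the reduction ratio exceeds 0.5, performance on QA and reconstruction degrades more noticeably, while summarisation and conversation are more robust.
Compared with random filtering, Selective Context preserves information more effectively at moderate reduction ratios and maintains higher ROUGE-1 and BERTScore.
Data-source-dependent optimal thresholds are observed (arXiv: 0.35–0.5; BBC/news: 0.5–0.65; ShareGPT: varies), and conversation tasks remain robust at reductions of up to 80%.
Overall, Selective Context markedly improves context efficiency in many scenarios at the cost of a moderate drop in performance.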

This review was generated by AI and reviewed by a human editor.