QUICK REVIEW

[Paper Review] Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length

Jingxuan Chen, Mohammad Taher Pilehvar|arXiv (Cornell University)|Mar 23, 2026

Topic Modeling|Cited by 0

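ONE-SENTENCE SUMMARY

The paper evaluates how large language models degrade in multi-instance processing: performance declines slightly at small instance counts, collapses as the instance count grows, and is affected more strongly by instance count than by context length.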

ABSTRACT

Users often rely on Large Language Models (LLMs) for processing multiple documents or performing analysis over a number of instances. For example, analysing the overall sentiment of a number of movie reviews requires an LLM to process the sentiment of each review individually in order to provide a final aggregated answer. While LLM performance on such individual tasks is generally high, there has been little research on how LLMs perform when dealing with multi-instance inputs. In this paper, we perform a comprehensive evaluation of the multi-instance processing (MIP) ability of LLMs for tasks in which they excel individually. The results show that all LLMs follow a pattern of slight performance degradation for small numbers of instances (approximately 20-100), followed by a performance collapse on larger instance counts. Crucially, our analysis shows that while context length is associated with this degradation, the number of instances has a stronger effect on the final results. This finding suggests that when optimising LLM performance for MIP, attention should be paid to both context length and, in particular, instance count.

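MOTIVATION AND OBJECTIVES

Motivate and understand how large language models handle multi-instance processing (MIP) tasks that require analysing multiple documents.
Characterise the pattern of LLM performance degradation as the number of instances increases.
Quantify the effect of context length on MIP performance relative to the effect of instance count.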

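PROPOSED METHOD

Comprehensively evaluate multi-instance processing tasks in which each instance is analysed individually and the results are then aggregated.
Analyse performance trends as the instance count grows from small to large.
Examine the association between context length and degradation, and compare its effect with that of instance count.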
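The evaluation steps above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not the authors' code: the prompt wording, the `(text, label)` dataset format, the function names, and the use of prompt character count as a proxy for context length are all hypothetical. `keyword_stub_model` is a stand-in for a real LLM call so that the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical instance format: (review text, gold sentiment label).
Instance = Tuple[str, str]


def build_mip_prompt(texts: Sequence[str]) -> str:
    """Pack several instances into one prompt that asks for an aggregated answer."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(texts))
    return (
        "How many of the following reviews are positive? "
        "Answer with a single integer.\n" + numbered
    )


def evaluate_mip(
    model: Callable[[str], str],
    dataset: Sequence[Instance],
    instance_counts: Sequence[int],
) -> List[Dict[str, object]]:
    """Run the model once per instance count and record whether the aggregate is right."""
    rows = []
    for n in instance_counts:
        subset = dataset[:n]
        prompt = build_mip_prompt([text for text, _ in subset])
        gold = sum(1 for _, label in subset if label == "positive")
        try:
            predicted = int(model(prompt).strip())
        except ValueError:
            predicted = None  # an unparseable answer counts as wrong
        rows.append({
            "instances": len(subset),
            "prompt_chars": len(prompt),  # crude proxy for context length
            "gold": gold,
            "predicted": predicted,
            "correct": predicted == gold,
        })
    return rows


def keyword_stub_model(prompt: str) -> str:
    """Stand-in for an LLM: counts the numbered lines that contain 'great'."""
    lines = prompt.splitlines()[1:]  # skip the instruction line
    return str(sum("great" in line.lower() for line in lines))


# Toy data for illustration only; not taken from the paper.
toy_dataset = [
    ("A great film.", "positive"),
    ("Dull and slow.", "negative"),
    ("Great cast, great score.", "positive"),
    ("I enjoyed it.", "positive"),  # the stub misses this one
]
results = evaluate_mip(keyword_stub_model, toy_dataset, [1, 2, 4])
```

Recording both `instances` and `prompt_chars` for every run keeps the two factors available for later comparison. With a real model, `model` would wrap an API call and the instance counts would extend well past the toy values used here.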

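EXPERIMENTAL RESULTS

Research Questions

RQ1: How does LLM performance change as the number of instances increases in multi-instance processing tasks?
RQ2: What role does context length play in driving the degradation, relative to instance count?
RQ3: Do LLMs show a two-stage degradation pattern (a slight initial decline followed by a collapse) across models and tasks?
RQ4: Which factor is the stronger predictor of final MIP performance: instance count or context length?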

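Key Findings

For small numbers of instances (approximately 20-100), LLMs show a pattern of slight performance degradation.
At larger instance counts, performance collapses across all models.
Context length is associated with the degradation, but instance count has a stronger effect on the final results.
When optimising LLM performance for MIP, attention should be paid to both context length and, in particular, instance count.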
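One simple way to compare how strongly each factor is associated with performance is to correlate each one with the measured scores. The sketch below is an assumption about how such a comparison could be run, not the analysis reported in the paper; `pearson` and `compare_factors` are hypothetical helper names, and the inputs are expected to be per-run measurements supplied by the reader.

```python
from math import sqrt
from statistics import fmean
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def compare_factors(
    instance_counts: Sequence[float],
    context_lengths: Sequence[float],
    scores: Sequence[float],
) -> Dict[str, float]:
    """Correlate each candidate factor with the per-run performance scores."""
    return {
        "instance_count": pearson(instance_counts, scores),
        "context_length": pearson(context_lengths, scores),
    }
```

Because more instances usually also mean a longer prompt, the two factors tend to move together. A comparison like this is only informative when the runs vary them at least partly independently, for example by holding the instance count fixed while changing the length of each instance.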

This review was generated by AI and reviewed by human editors.