QUICK REVIEW

[Paper Review] An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention

Madhusudan Ghosh, Rishabh Gupta | arXiv (Cornell University) | Feb 25, 2026

Software Engineering Research | Cited by 0

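One-Sentence Summary

The paper compares training-free, inference-only methods for context length extrapolation on long code. It evaluates positional encodings (RoPE, ReRoPE) and efficient attention (Paged Attention, Flash Attention, StreamingLLM) on long code completion in Python, Csharp, and Java, and analyzes Exact Match (EM) and Edit Similarity (Edit Sim) results to identify the strengths and trade-offs of each approach.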

ABSTRACT

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code generation, completion, and translation. Despite these advancements, their effectiveness is constrained by fixed context lengths, limiting their ability to generalize across long, domain-specific code sequences. To address this challenge, we investigate zero-shot, inference-only methods aimed at improving position encodings and optimizing attention mechanisms. Our goal is to provide a thorough analysis of current approaches that facilitate context length extrapolation in code, particularly in the context of long code completion tasks.

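Motivation and Objectives

Evaluate how inference-only methods handle long code sequences in zero-shot code completion.
Empirically compare positional-encoding-based extrapolation with efficient-attention-based mechanisms for long code.
Identify which methods preserve syntax and structure in long code contexts across multiple programming languages.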

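Proposed Method

Group context length extrapolation methods into two categories: positional-encoding-based and efficient-attention-based.
Evaluate RoPE, ReRoPE, and several efficient attention techniques (StreamingLLM, Paged Attention, Flash Attention) on long code completion.
Generate 100 tokens per long code sequence using zero-shot inference with greedy decoding.
Measure performance with Exact Match (EM) and Edit Similarity (Edit Sim).
Analyze results for Python, Csharp, and Java on the dataset of Guo et al. (2023).
Discuss limitations and propose directions for hybrid methods and new evaluation metrics.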
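
The two metrics named above can be sketched as follows. This is a minimal illustration, not the paper's evaluation script: it assumes Edit Sim is a character-level Levenshtein distance normalized by the longer string, and that both metrics ignore leading and trailing whitespace. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the completion equals the reference (assumed: after stripping)."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def edit_similarity(prediction: str, reference: str) -> float:
    """Assumed definition: 1 - normalized Levenshtein distance, in [0, 1]."""
    pred, ref = prediction.strip(), reference.strip()
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


if __name__ == "__main__":
    pred, ref = "return x", "return y"
    print(exact_match(pred, ref), edit_similarity(pred, ref))
```

A completion that differs from the reference by a single character scores 0 on EM while staying close to 1 on Edit Sim, which is why the two metrics can diverge sharply on the same output.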

Figure 1. Comparison of length extrapolation techniques for code completion, categorized into Positional Encoding Based (e.g., RoPE, ReRoPE) and Efficient Attention Based methods (e.g., StreamingLLM, Paged Attention, Flash Attention). These approaches address the challenges of handling long code seq
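
To make the positional-encoding category concrete, here is a small pure-Python sketch of the idea behind RoPE and ReRoPE. It is an illustration under stated assumptions, not the paper's implementation: `rope_rotate`, `dot`, and `rerope_relative_position` are hypothetical names, the base of 10000 is the commonly used default, and ReRoPE is modeled only as clamping the relative distance at a window size.

```python
import cmath


def rope_rotate(vec, position, base=10000.0):
    """Rotate each (even, odd) pair of `vec` by an angle proportional to `position`."""
    dim = len(vec)
    out = []
    for k in range(dim // 2):
        theta = position * base ** (-2.0 * k / dim)  # per-pair rotation angle
        z = complex(vec[2 * k], vec[2 * k + 1]) * cmath.exp(1j * theta)
        out.extend([z.real, z.imag])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))


def rerope_relative_position(query_pos, key_pos, window):
    """Assumed ReRoPE behavior: exact distance inside the window, clamped beyond it."""
    return min(query_pos - key_pos, window)


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]
    k = [1.1, 0.4, -0.6, 0.9]
    # Same relative offset (2) at very different absolute positions.
    near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
    far = dot(rope_rotate(q, 102), rope_rotate(k, 100))
    print(near, far)
    print(rerope_relative_position(5000, 0, window=512))
```

The two printed scores agree up to floating-point error because the rotated dot product depends only on the offset between query and key. Clamping that offset keeps every distance within the range seen during training, which is the intuition for why this family of methods can extrapolate without retraining.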

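Experimental Results

Research Questions

RQ1: How do efficient attention mechanisms affect length extrapolation performance on long code across languages?
RQ2: How does positional extrapolation (RoPE, ReRoPE) differ from efficient attention methods in long code extrapolation?
RQ3: How do language syntax and structure (Python, Csharp, Java) affect zero-shot long code completion performance?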

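Key Findings

Efficient attention (Paged Attention) generally achieves higher Exact Match (EM) scores than positional extrapolation methods in most languages (e.g., Python: Paged Attn EM 0.377 vs. RoPE 0.013).
Positional extrapolation methods (especially ReRoPE) achieve higher Edit Similarity (Edit Sim) scores and better structural coherence on longer code sequences.
StreamingLLM and Flash Attention deliver strong speedups but perform inconsistently on long-context extrapolation in zero-shot code tasks, with low EM and Edit Sim in some cases.
ReRoPE maintains consistently high Edit Sim across languages, indicating better preservation of syntax and hierarchical structure in long code completion.
A substantial gap between EM and Edit Sim highlights the need for evaluation metrics that capture both functional correctness and code quality.
Language-specific trends show that Python often scores higher on Edit Sim than Java or Csharp, possibly because of the syntactic rigidity of Java and Csharp relative to Python's flexibility.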
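
The StreamingLLM result above is easier to interpret with the cache policy in view. The sketch below shows which token positions a StreamingLLM-style cache would retain: a few initial "attention sink" tokens plus a sliding window of the most recent tokens. The function name and the default sizes are illustrative assumptions, not values taken from the paper.

```python
def streaming_kv_positions(seq_len, num_sink=4, window=1020):
    """Return the token positions kept in a StreamingLLM-style KV cache.

    Assumed policy: keep the first `num_sink` tokens and the last `window`
    tokens, and evict everything in between.
    """
    sinks = list(range(min(num_sink, seq_len)))
    recent_start = max(num_sink, seq_len - window)  # avoid overlapping the sinks
    recent = list(range(recent_start, seq_len))
    return sinks + recent


if __name__ == "__main__":
    # A 10-token sequence with 2 sink tokens and a window of 3 recent tokens.
    print(streaming_kv_positions(10, num_sink=2, window=3))
```

Under this policy, anything defined in the middle of a long file (an import, a helper function, a class field) is no longer visible to the model once it leaves the window. That may be one reason such a method could be fast yet inconsistent on code completion, although the paper's own explanation should be consulted for the measured cause.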

This review was generated by AI and reviewed by a human editor.