QUICK REVIEW

[Paper Review] Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

Thibaud Gloaguen, Niels Mündler|arXiv (Cornell University)|Feb 12, 2026

Software Engineering Research|Cited by 0

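ONE-SENTENCE SUMMARY

The paper systematically evaluates repository-level context files (AGENTS.md) and finds that developer-written context files yield only marginal performance gains, while automatically generated ones often reduce performance and increase cost; context files also drive more exploration and testing.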

ABSTRACT

A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatically generating them. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files following agent-developer recommendations, and a novel collection of issues from repositories containing developer-committed context files. Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Behaviorally, both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions. Ultimately, we conclude that unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.

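MOTIVATION AND GOALS

Assess whether repository-level context files improve the completion rate of automated coding tasks.
Build AGENTbench to benchmark the effect of context files on real-world tasks.
Compare developer-provided and automatically generated context files across multiple coding agents and LLMs.
Investigate the behavioral changes and cost impact of including context files.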

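PROPOSED METHOD

Build AGENTbench from real GitHub pull requests in repositories that contain developer-written context files.
Evaluate four coding agents on SWE-bench Lite and AGENTbench under three settings: no context file, an LLM-generated context file, and a developer-provided context file (the last applies only where the repository already ships one).
Measure success rate, number of steps to resolution, and LLM inference cost.
Analyze agent trajectories to understand changes in exploration, testing, and reasoning.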
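The metrics listed above can be aggregated per setting with a short script. The sketch below is a minimal illustration, not the authors' evaluation harness: the `RunResult` record, its field names, and the setting labels `none`, `llm` and `developer` are hypothetical, and cost is assumed to be recorded per run in US dollars.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RunResult:
    """One agent run on one task under one setting (hypothetical record format)."""
    setting: str      # hypothetical labels: "none", "llm" or "developer"
    resolved: bool    # whether the agent's patch passed the task's tests
    steps: int        # number of agent steps taken in the trajectory
    cost_usd: float   # assumed: LLM inference cost of the run in US dollars


def summarize(runs: List[RunResult]) -> Dict[str, Dict[str, float]]:
    """Aggregate success rate, mean steps and mean cost for each setting."""
    by_setting: Dict[str, List[RunResult]] = {}
    for run in runs:
        by_setting.setdefault(run.setting, []).append(run)
    return {
        setting: {
            "success_rate": mean(1.0 if r.resolved else 0.0 for r in rs),
            "mean_steps": mean(r.steps for r in rs),
            "mean_cost": mean(r.cost_usd for r in rs),
        }
        for setting, rs in by_setting.items()
    }


def relative_change(new: float, baseline: float) -> float:
    """Relative change against a baseline such as the no-context setting (0.2 means +20%)."""
    return (new - baseline) / baseline
```

Comparing each setting's `mean_cost` against the `none` setting with `relative_change` is one way to express a statement such as "inference cost rose by more than 20%"; any such figure must of course come from real runs.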

Figure 1: Overview of our evaluation pipeline. We begin with real-world repositories and tasks derived from past pull requests. For each repository state, we generate three settings: ① If a developer-provided context file exists, we include it in the repository. In ②, we omit the co…
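Building the repository variants for one task could look roughly like the sketch below. It assumes the repository snapshot is a path-to-content mapping, that the context file is named `AGENTS.md`, and that the LLM-generated file is produced from the repository with any developer file removed; `build_settings` and the `generate_context` callback are hypothetical stand-ins, not the paper's code.

```python
from typing import Callable, Dict

# Assumed file name; real repositories may use other context-file conventions.
CONTEXT_FILE = "AGENTS.md"


def build_settings(
    files: Dict[str, str],
    generate_context: Callable[[Dict[str, str]], str],
) -> Dict[str, Dict[str, str]]:
    """Return the repository variants for one task instance (hypothetical helper).

    files: mapping of path -> file content for the repository snapshot.
    generate_context: hypothetical stand-in for an LLM call that writes a context file.
    """
    # "none" setting: drop any existing context file from the snapshot.
    without_ctx = {path: text for path, text in files.items() if path != CONTEXT_FILE}
    settings = {"none": without_ctx}

    # "llm" setting (assumption): generate the file from the stripped snapshot,
    # so the generator cannot simply copy a developer-written file.
    llm_files = dict(without_ctx)
    llm_files[CONTEXT_FILE] = generate_context(without_ctx)
    settings["llm"] = llm_files

    # "developer" setting: only available when the repository ships its own file.
    if CONTEXT_FILE in files:
        settings["developer"] = dict(files)
    return settings
```

For a repository without a committed context file, as in SWE-bench Lite, this yields only the `none` and `llm` variants, matching the caption's condition that the developer file is included only if it exists.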

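EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Do repository-level context files improve coding agents' success rates on real-world tasks?
RQ2: How do developer-provided and automatically generated context files affect agent behavior and cost?
RQ3: Do context files provide a meaningful repository overview that helps agents resolve tasks?
RQ4: How do context files affect agents' testing and exploration behavior?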

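KEY FINDINGS

Compared with providing no repository context, context files tend to reduce task success rates.
On average, LLM-generated context files slightly lower performance and raise inference cost by more than 20%.
Developer-provided context files give a marginal improvement over having no context file (about 4% on average).
Context files increase exploration, testing, and reasoning, which raises cost, without offering a clear benefit as a repository overview.
When existing documentation is removed from the repository, LLM-generated context files can outperform developer-written ones, suggesting that many context-file sections duplicate documentation already present in popular repositories.
Agents generally follow the instructions in context files, but the files do not serve as an effective repository overview.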

Figure 2: Distribution of AGENTbench instances across 12 open-source GitHub repositories, each containing context files.


This review was generated by AI and checked by human editors.