QUICK REVIEW
[Paper Review] CodeR: Issue Resolving with Multi-Agent and Task Graphs
Dong Chen, Shaoxin Lin|arXiv (Cornell University)|Jun 3, 2024
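Advanced Graph Neural Networks | Cited by 5
One-Sentence Summary
CodeR introduces a multi-agent framework with task-graph planning to resolve GitHub issues automatically, reaching a record 28.33% resolve rate on SWE-bench lite with a single submission per issue.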
ABSTRACT
GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues, when submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.
Motivation and Objectives
- Motivate and address the challenge of automated GitHub issue resolving at repository scale.
- Propose a decoupled, multi-agent system guided by pre-defined task graphs to execute issue-resolution plans.
- Leverage fault localization, code reproduction, and repository editing to improve patch quality and success rates.
- Demonstrate that pre-planning with task graphs improves performance over on-the-fly decision-making.
Proposed Method
- Define five agent roles (Manager, Reproducer, Fault Localizer, Editor, Verifier) with specialized actions.
- Introduce a JSON-formatted task-graph planning framework to predefine and strictly execute issue-resolving plans.
- Use SBFL and BM25-based retrieval to perform multi-source fault localization and guide localization decisions.
- Reuse and extend actions from SWE-agent and AutoCodeRover, adding new actions and role-specific permissions to enhance planning and execution.
- Use LLM-generated reproduction tests and code edits, and evaluate on SWE-bench lite with one submission per issue.
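
The plan-driven execution described above can be illustrated with a minimal sketch. The JSON schema below (`entry`, `nodes`, `role`, `next`) and the stub agents are hypothetical, since this summary does not show CodeR's actual plan format; only the role names come from the list above. The point is that the graph is fixed before execution starts, and each step is chosen by following an edge rather than by asking a model what to do next.

```python
import json

# Hypothetical plan schema: CodeR's real JSON format is not shown in this summary.
PLAN_JSON = """
{
  "entry": "reproduce",
  "nodes": {
    "reproduce": {"role": "Reproducer", "next": {"success": "localize", "fail": "localize"}},
    "localize": {"role": "Fault Localizer", "next": {"success": "edit"}},
    "edit": {"role": "Editor", "next": {"success": "verify"}},
    "verify": {"role": "Verifier", "next": {"fail": "edit"}}
  }
}
"""


def run_plan(plan, agents, max_steps=10):
    """Walk the graph from the entry node, following each agent's outcome."""
    node = plan["entry"]
    trace = []
    for _ in range(max_steps):
        spec = plan["nodes"][node]
        outcome = agents[spec["role"]](trace)  # each agent returns "success" or "fail"
        trace.append((node, outcome))
        next_node = spec["next"].get(outcome)
        if next_node is None:  # no outgoing edge for this outcome: the plan is done
            break
        node = next_node
    return trace


def make_stub_agents():
    """Stand-ins for LLM-backed agents; the Verifier rejects the first patch."""
    state = {"verify_calls": 0}

    def verifier(trace):
        state["verify_calls"] += 1
        return "fail" if state["verify_calls"] == 1 else "success"

    return {
        "Reproducer": lambda trace: "success",
        "Fault Localizer": lambda trace: "success",
        "Editor": lambda trace: "success",
        "Verifier": verifier,
    }


plan = json.loads(PLAN_JSON)
trace = run_plan(plan, make_stub_agents())
for step, outcome in trace:
    print(step, outcome)
```

In this sketch the caller plays the Manager's part by choosing which plan to run; the `max_steps` cap is an added safeguard against an edit/verify loop that never converges.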

Experimental Results
Research Questions
- RQ1: Can a multi-agent framework with pre-defined task graphs improve reliability and performance in automated GitHub issue resolving?
- RQ2: Does integrating fault localization with plan-driven execution outperform reactive, single-agent approaches?
- RQ3: What is the impact of plan pre-design on patch correctness, cost, and success rate in SWE-bench lite?
- RQ4: How do different agent roles contribute to resolution effectiveness and resource usage?
Key Findings
| Method | Resolved, % (count) | Avg. requests | Avg. tokens / cost |
|---|---|---|---|
| CodeR (reported) | 28.33 (85) | 30.39 | 299K / $3.09 |
| CodeR (ours) | 27.33 (82) | 30.39 | 299K / $3.09 |
| SWE-agent + GPT-4 (reported) | 18.00 (54) | | |
| Aider (reported) | 26.33 (79) | | |
| AutoCodeRover | 19.00 (57) | | |
| Explicit Patch Generation (RAG + GPT-4) | 2.67 (8) | | |
| RAG + Claude 3 Opus | 4.33 (13) | | |
| RAG + SWE-Llama 7B | 1.33 (4) | | |
| RAG + GPT-3.5 | 0.33 (1) | | |
| RAG + Claude 2 | 3.00 (9) | | |
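
As a quick consistency check on the table above, the percentages can be recomputed from the resolved counts in parentheses. The sketch assumes a benchmark size of 300 issues, which matches the 85/300 figure quoted in the findings; the dictionary and function names are illustrative only.

```python
TOTAL_ISSUES = 300  # assumed SWE-bench lite size, consistent with the 85/300 figure

# Resolved counts copied from the table (the numbers in parentheses).
resolved_counts = {
    "CodeR (reported)": 85,
    "CodeR (ours)": 82,
    "SWE-agent + GPT-4 (reported)": 54,
    "Aider (reported)": 79,
    "AutoCodeRover": 57,
    "Explicit Patch Generation (RAG + GPT-4)": 8,
    "RAG + Claude 3 Opus": 13,
    "RAG + SWE-Llama 7B": 4,
    "RAG + GPT-3.5": 1,
    "RAG + Claude 2": 9,
}


def resolve_rate(resolved, total=TOTAL_ISSUES):
    """Percentage of issues resolved, rounded to two decimals as in the table."""
    return round(100 * resolved / total, 2)


rates = {name: resolve_rate(count) for name, count in resolved_counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

Every recomputed value agrees with the percentage printed in the table, so the counts and rates are internally consistent under the 300-issue assumption.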
- CodeR achieves 28.33% of issues resolved on SWE-bench lite with one submission per issue (85/300).
- An ablation shows that removing the multi-agent design and task graphs reduces the resolve rate from 22% to 10%.
- Combining BM25 with SBFL significantly improves fault localization precision over SBFL alone.
- Plan-based, pre-defined task graphs yield better performance than on-the-fly planning across comparisons with SWE-agent, AutoCodeRover, and Aider.
- Explicit patch generation underperforms compared with implicit patch generation via code repository edits.
- Removing fault localization from CodeR reduces performance and increases cost, underscoring the value of integrated fault localization.
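
To make the SBFL-plus-BM25 finding concrete, here is a minimal sketch of one way the two signals could be combined. It is not CodeR's implementation: the summary does not state which SBFL formula or fusion rule the paper uses, so the Ochiai formula, the weighted sum, the whitespace tokenizer, and all function names and sample data are assumptions made for illustration.

```python
import math
from collections import Counter


def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Plain Okapi BM25 over whitespace-tokenized documents."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


def normalize(scores):
    """Scale scores into [0, 1] by dividing by the maximum."""
    top = max(scores.values())
    return {k: (v / top if top else 0.0) for k, v in scores.items()}


def fuse(sbfl, bm25, weight=0.5):
    """Rank candidates by a weighted sum of normalized SBFL and BM25 scores."""
    sbfl_n, bm25_n = normalize(sbfl), normalize(bm25)
    combined = {k: weight * sbfl_n[k] + (1 - weight) * bm25_n.get(k, 0.0) for k in sbfl}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical data: (covered by failing tests, covered by passing tests) per function.
coverage = {"format_output": (2, 3), "parse_date": (2, 3), "load_config": (0, 5)}
sbfl = {name: ochiai(ef, ep, total_failed=2) for name, (ef, ep) in coverage.items()}

docs = {
    "format_output": "format output table for display",
    "parse_date": "parse date string into datetime handle timezone offset",
    "load_config": "load config file from disk",
}
issue_text = "timezone offset ignored when parsing date string"

ranking = fuse(sbfl, bm25_scores(issue_text, docs))
print(ranking)
```

In this toy data, SBFL alone cannot separate `format_output` from `parse_date` because both have identical coverage; the text match against the issue description breaks the tie, which is the kind of gain the finding on combined localization describes.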

This review was generated by AI and checked by a human editor.