QUICK REVIEW

[Paper Review] CodeR: Issue Resolving with Multi-Agent and Task Graphs

Dong Chen, Shaoxin Lin | arXiv (Cornell University) | Jun 3, 2024

Advanced Graph Neural Networks | Cited by 5

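One-Sentence Summary

CodeR introduces a multi-agent framework with task-graph planning that resolves GitHub issues automatically, reaching a new record resolve rate of 28.33% on SWE-bench lite with a single submission per issue.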

ABSTRACT

GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues, when submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

Research Motivation and Objectives

Motivate and address the challenge of automated GitHub issue resolving at repository scale.
Propose a decoupled, multi-agent system guided by pre-defined task graphs to execute issue-resolution plans.
Leverage fault localization, code reproduction, and repository editing to improve patch quality and success rates.
Demonstrate that pre-planning with task graphs improves performance over on-the-fly decision-making.

Proposed Method

Define five agent roles (Manager, Reproducer, Fault Localizer, Editor, Verifier) with specialized actions.
Introduce a JSON-formatted task-graph planning framework to predefine and strictly execute issue-resolving plans.
Combine spectrum-based fault localization (SBFL) with BM25-based retrieval for multi-source fault localization.
Reuse and extend actions from SWE-agent and AutoCodeRover, adding new actions and role-specific permissions to enhance planning and execution.
Use LLM-generated reproduction tests and code edits, and evaluate on SWE-bench lite with one submission per issue.
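
The idea of a pre-defined, strictly executed task graph can be sketched as follows. This is a minimal illustration only: the JSON schema (`entry`, `tasks`, `on_success`, `on_fail`), the task names, and the stub agents are assumptions made for this example, not the plan format or agent logic used by CodeR. The Manager role is left out of the sketch.

```python
import json

# Hypothetical plan schema, invented for illustration (not CodeR's actual format).
PLAN_JSON = """
{
  "entry": "reproduce",
  "tasks": {
    "reproduce": {"role": "Reproducer", "on_success": "localize", "on_fail": "localize"},
    "localize": {"role": "Fault Localizer", "on_success": "edit", "on_fail": "edit"},
    "edit": {"role": "Editor", "on_success": "verify", "on_fail": "done"},
    "verify": {"role": "Verifier", "on_success": "done", "on_fail": "edit"}
  }
}
"""


def run_plan(plan, agents, max_steps=10):
    """Walk the graph from the entry task, following only the predefined edges."""
    trace = []
    current = plan["entry"]
    for _ in range(max_steps):
        if current == "done":
            break
        task = plan["tasks"][current]
        ok = agents[task["role"]](current)  # each agent reports success or failure
        trace.append((current, task["role"], ok))
        current = task["on_success"] if ok else task["on_fail"]
    return trace


def make_stub_agents():
    """Stand-in agents: the Verifier rejects the first patch and accepts the second."""
    verify_calls = {"n": 0}

    def verifier(task_name):
        verify_calls["n"] += 1
        return verify_calls["n"] > 1

    def always_ok(task_name):
        return True

    return {
        "Reproducer": always_ok,
        "Fault Localizer": always_ok,
        "Editor": always_ok,
        "Verifier": verifier,
    }


plan = json.loads(PLAN_JSON)
trace = run_plan(plan, make_stub_agents())
for step in trace:
    print(step)
```

Because every transition is fixed in the plan, the agents never choose the next step themselves; a failed verification can only lead back to the Editor, as the trace shows.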

Figure 1: Multi-Agent framework of CodeR with task graphs.

Experimental Results

Research Questions

RQ1: Can a multi-agent framework with pre-defined task graphs improve reliability and performance in automated GitHub issue resolving?
RQ2: Does integrating fault localization with plan-driven execution outperform reactive, single-agent approaches?
RQ3: What is the impact of plan pre-design on patch correctness, cost, and success rate in SWE-bench lite?
RQ4: How do different agent roles contribute to resolution effectiveness and resource usage?

Key Findings

Method	Resolved, % (count)	Avg. requests	Avg. tokens / cost
CodeR (reported)	28.33 (85)	30.39	299K/$3.09
CodeR (ours)	27.33 (82)	30.39	299K/$3.09
SWE-agent + GPT-4 (reported)	18.00 (54)
Aider (reported)	26.33 (79)
AutoCodeRover	19.00 (57)
Explicit Patch Generation (RAG + GPT-4)	2.67 (8)
RAG + Claude 3 Opus	4.33 (13)
RAG + SWE-Llama 7B	1.33 (4)
RAG + GPT-3.5	0.33 (1)
RAG + Claude 2	3.00 (9)
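
The percentages in the table can be cross-checked against the counts in parentheses. The short sketch below assumes that every row uses the same denominator of 300 issues (the figure this review gives for CodeR, 85/300); that assumption is not stated for the other rows.

```python
TOTAL_ISSUES = 300  # assumption: the same denominator applies to every row

# method -> (issues resolved, percentage shown in the table)
rows = {
    "CodeR (reported)": (85, 28.33),
    "CodeR (ours)": (82, 27.33),
    "SWE-agent + GPT-4 (reported)": (54, 18.00),
    "Aider (reported)": (79, 26.33),
    "AutoCodeRover": (57, 19.00),
    "Explicit Patch Generation (RAG + GPT-4)": (8, 2.67),
    "RAG + Claude 3 Opus": (13, 4.33),
    "RAG + SWE-Llama 7B": (4, 1.33),
    "RAG + GPT-3.5": (1, 0.33),
    "RAG + Claude 2": (9, 3.00),
}


def resolve_rate(count, total=TOTAL_ISSUES):
    """Percentage of issues resolved, rounded to two decimals like the table."""
    return round(100 * count / total, 2)


# Collect any row whose recomputed rate differs from the tabulated one.
mismatches = {
    name: (resolve_rate(count), shown)
    for name, (count, shown) in rows.items()
    if abs(resolve_rate(count) - shown) > 0.005
}
print(mismatches)  # an empty dict means every row is consistent with count / 300
```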

CodeR resolves 28.33% of SWE-bench lite issues (85/300) with one submission per issue.
An ablation shows that removing the multi-agent design and task graphs reduces the resolve rate from 22% to 10%.
Combining BM25 with SBFL significantly improves fault localization precision over SBFL alone.
Plan-based, pre-defined task graphs yield better performance than on-the-fly planning across comparisons with SWE-agent, AutoCodeRover, and Aider.
Explicit patch generation underperforms compared with implicit patch generation via code repository edits.
CodeR’s ablation without fault localization reduces performance and increases cost, underscoring the value of integrated fault localization.
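
One way to see why pairing BM25 with SBFL can help is a toy example in which coverage alone cannot separate two candidates. The sketch below uses the standard Ochiai formula for SBFL and a standard Okapi BM25 score; the fusion rule (an equal-weight sum of max-normalized scores), the function names, and the toy data are assumptions for illustration, not the combination method used in CodeR.

```python
import math
from collections import Counter


def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document (a token list) against the query tokens."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


def normalize(scores):
    """Scale scores so the largest becomes 1.0 (all zeros stay zero)."""
    top = max(scores.values())
    return {name: (value / top if top else 0.0) for name, value in scores.items()}


def combine(sbfl, bm25, weight=0.5):
    """Assumed fusion rule: weighted sum of normalized scores."""
    sbfl_n, bm25_n = normalize(sbfl), normalize(bm25)
    return {name: weight * sbfl_n[name] + (1 - weight) * bm25_n[name] for name in sbfl}


# Toy data: (failing tests covering the function, passing tests covering it).
coverage = {"parse_date": (2, 1), "format_date": (2, 1), "load_config": (0, 3)}
sbfl = {name: ochiai(ef, ep, total_failed=2) for name, (ef, ep) in coverage.items()}

# Toy "documents": identifier tokens per function; the query is the issue text.
docs = {
    "parse_date": ["parse", "date", "string", "timezone"],
    "format_date": ["format", "date", "output"],
    "load_config": ["load", "config", "file"],
}
issue_tokens = ["parse", "date", "timezone", "error"]
bm25 = bm25_scores(issue_tokens, docs)

combined = combine(sbfl, bm25)
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)
```

In this toy case SBFL gives `parse_date` and `format_date` the same score because the same tests cover both, and the textual match with the issue description breaks the tie.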


This review was generated by AI and reviewed by human editors.