QUICK REVIEW

[Paper Review] CodeR: Issue Resolving with Multi-Agent and Task Graphs

Dong Chen, Shaoxin Lin|arXiv (Cornell University)|Jun 3, 2024
Advanced Graph Neural Networks|Cited by 5
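ONE-SENTENCE SUMMARY

CodeR introduces a multi-agent framework with task-graph planning that resolves GitHub issues automatically, setting a new record of 28.33% resolved on SWE-bench lite with a single submission per issue.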

ABSTRACT

GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues, when submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

RESEARCH MOTIVATION AND GOALS

  • Address the challenge of automated GitHub issue resolving at repository scale.
  • Propose a decoupled, multi-agent system guided by pre-defined task graphs to execute issue-resolution plans.
  • Leverage fault localization, code reproduction, and repository editing to improve patch quality and success rates.
  • Demonstrate that pre-planning with task graphs improves performance over on-the-fly decision-making.

PROPOSED METHOD

  • Define five agent roles (Manager, Reproducer, Fault Localizer, Editor, Verifier) with specialized actions.
  • Introduce a JSON-formatted task-graph planning framework to predefine and strictly execute issue-resolving plans.
  • Use SBFL and BM25-based retrieval to perform multi-source fault localization and guide localization decisions.
  • Reuse and extend actions from SWE-agent and AutoCodeRover, adding new actions and role-specific permissions to enhance planning and execution.
  • Incorporate LLM-generated code reproduction for tests and code edits, and evaluate via SWE-bench lite benchmarks with one submission per issue.
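
A minimal sketch of how a pre-defined task graph might be executed strictly, rather than letting an agent choose its next step on the fly. Only the five role names come from the list above. The JSON schema (`entry`, `roles`, `succeed`/`fail` edges), the `run_plan` helper, and the stub agents are hypothetical illustrations and should not be read as CodeR's actual plan format or implementation.

```python
import json

# Hypothetical plan schema, NOT the paper's exact JSON format.
# Each role maps an outcome ("succeed" / "fail") to the next role.
PLAN_JSON = """
{
  "entry": "Reproducer",
  "roles": {
    "Reproducer":     {"succeed": "FaultLocalizer", "fail": "FaultLocalizer"},
    "FaultLocalizer": {"succeed": "Editor",   "fail": "Manager"},
    "Editor":         {"succeed": "Verifier", "fail": "Manager"},
    "Verifier":       {"succeed": "Manager",  "fail": "Editor"},
    "Manager":        {}
  }
}
"""


def run_plan(plan, agents, max_steps=10):
    """Follow the graph edges strictly; agents only report success or failure."""
    trace = []
    role = plan["entry"]
    for _ in range(max_steps):
        trace.append(role)
        edges = plan["roles"][role]
        if not edges:  # a role with no outgoing edges ends the plan
            break
        succeeded = agents[role]()
        role = edges["succeed" if succeeded else "fail"]
    return trace


def make_stub(outcomes):
    """Stand-in for an LLM-backed agent: replays a fixed list of outcomes."""
    it = iter(outcomes)
    return lambda: next(it)


plan = json.loads(PLAN_JSON)
agents = {
    "Reproducer": make_stub([True]),
    "FaultLocalizer": make_stub([True]),
    "Editor": make_stub([True, True]),
    "Verifier": make_stub([False, True]),  # first patch fails verification
}
trace = run_plan(plan, agents)
print(" -> ".join(trace))
```

In this run the Verifier rejects the first patch, so the graph routes control back to the Editor before the plan ends at the Manager. The agents never decide where to go next; the plan does.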
Figure 1: Multi-Agent framework of CodeR with task graphs.

EXPERIMENTAL RESULTS

Research questions

  • RQ1: Can a multi-agent framework with pre-defined task graphs improve reliability and performance in automated GitHub issue resolving?
  • RQ2: Does integrating fault localization with plan-driven execution outperform reactive, single-agent approaches?
  • RQ3: What is the impact of plan pre-design on patch correctness, cost, and success rate in SWE-bench lite?
  • RQ4: How do different agent roles contribute to resolution effectiveness and resource usage?

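Main findings

| Method | Resolve rate, % (issues resolved) | Avg. requests | Avg. tokens / cost |
|---|---|---|---|
| CodeR (reported) | 28.33 (85) | 30.39 | 299K / $3.09 |
| CodeR (ours) | 27.33 (82) | 30.39 | 299K / $3.09 |
| SWE-agent + GPT-4 (reported) | 18.00 (54) | - | - |
| Aider (reported) | 26.33 (79) | - | - |
| AutoCodeRover | 19.00 (57) | - | - |
| Explicit Patch Generation (RAG + GPT-4) | 2.67 (8) | - | - |
| RAG + Claude 3 Opus | 4.33 (13) | - | - |
| RAG + SWE-Llama 7B | 1.33 (4) | - | - |
| RAG + GPT-3.5 | 0.33 (1) | - | - |
| RAG + Claude 2 | 3.00 (9) | - | - |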
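
Each resolve rate above is the resolved count in parentheses divided by the 300 issues in SWE-bench lite. The short check below recomputes the percentages from the counts; the `resolve_rate` helper and the dictionary layout are illustrative only.

```python
TOTAL_ISSUES = 300  # size of SWE-bench lite

# Resolved counts copied from the parentheses in the table.
resolved = {
    "CodeR (reported)": 85,
    "CodeR (ours)": 82,
    "Aider (reported)": 79,
    "AutoCodeRover": 57,
    "SWE-agent + GPT-4 (reported)": 54,
    "RAG + Claude 3 Opus": 13,
    "RAG + Claude 2": 9,
    "Explicit Patch Generation (RAG + GPT-4)": 8,
    "RAG + SWE-Llama 7B": 4,
    "RAG + GPT-3.5": 1,
}


def resolve_rate(n_resolved, total=TOTAL_ISSUES):
    """Percentage of issues resolved, rounded to two decimals."""
    return round(100 * n_resolved / total, 2)


rates = {method: resolve_rate(n) for method, n in resolved.items()}
for method, rate in rates.items():
    print(f"{method}: {rate:.2f}%")
```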
  • CodeR achieves 28.33% of issues resolved on SWE-bench lite with one submission per issue (85/300).
  • An ablation shows that removing the multi-agent design and task graphs reduces the resolve rate from 22% to 10%.
  • Combining BM25 with SBFL significantly improves fault localization precision over SBFL alone.
  • Plan-based, pre-defined task graphs yield better performance than on-the-fly planning across comparisons with SWE-agent, AutoCodeRover, and Aider.
  • Explicit patch generation underperforms compared with implicit patch generation via code repository edits.
  • Ablating fault localization lowers CodeR's resolve rate and raises its cost, underscoring the value of integrated fault localization.
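
To illustrate why combining BM25 with SBFL can help, here is a small sketch under stated assumptions. The summary does not say which SBFL formula CodeR uses or how the two signals are merged, so the Ochiai formula, the hand-written Okapi BM25 scorer, the max-normalized weighted sum in `combine`, and all function names and toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def ochiai(coverage, failing, passing):
    """SBFL with the Ochiai formula. coverage: {function: set of tests that run it}."""
    scores = {}
    for func, tests in coverage.items():
        ef = len(tests & failing)  # failing tests that execute func
        ep = len(tests & passing)  # passing tests that execute func
        denom = math.sqrt(len(failing) * (ef + ep))
        scores[func] = ef / denom if denom else 0.0
    return scores


def bm25(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 over whitespace tokens. docs: {function: descriptive text}."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[name] = score
    return scores


def combine(sbfl, text, weight=0.5):
    """Assumed merge rule: scale each signal by its max, then take a weighted sum."""
    def norm(scores):
        top = max(scores.values()) or 1.0
        return {k: v / top for k, v in scores.items()}
    s, t = norm(sbfl), norm(text)
    return sorted(s, key=lambda f: weight * s[f] + (1 - weight) * t[f], reverse=True)


# Toy data: one failing test (t1), two passing tests (t2, t3).
coverage = {
    "parse_date": {"t1", "t2", "t3"},
    "format_date": {"t1", "t2"},
    "load_config": {"t3"},
}
docs = {
    "parse_date": "parse date string into datetime handle timezone offset",
    "format_date": "format datetime as string",
    "load_config": "load config file from disk",
}
issue_text = "timezone offset ignored when parsing date string"

sbfl_scores = ochiai(coverage, failing={"t1"}, passing={"t2", "t3"})
bm25_scores = bm25(issue_text, docs)
ranking = combine(sbfl_scores, bm25_scores)
print(ranking)
```

On this toy input, SBFL alone ranks `format_date` first because fewer passing tests cover it, while the issue text clearly points at `parse_date`. The combined ranking puts `parse_date` on top, which is the kind of correction a text-retrieval signal can add to coverage-based localization.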
Figure 2: Task graphs in JSON format.


This review was generated by AI and checked by human editors.