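[Paper Review] CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology
CodePori is an LLM-based multi-agent framework that automatically generates runnable code for large, complex software projects. It achieves strong pass@1 scores on HumanEval and MBPP and received favorable feedback from practitioners.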
Context: Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering (SE). Existing LLM-based multi-agent models have successfully addressed basic dialogue tasks. However, the potential of LLMs for more challenging tasks, such as automated code generation for large and complex projects, has been investigated in only a few existing works.
Objective: This paper aims to investigate the potential of LLM-based agents in the software industry, particularly in enhancing productivity and reducing time-to-market for complex software solutions. Our primary objective is to gain insights into how these agents can fundamentally transform the development of large-scale software.
Methods: We introduce CodePori, a novel system designed to automate code generation for large and complex software projects based on functional and non-functional requirements defined by stakeholders. To assess the proposed system's performance, we utilized the HumanEval benchmark and manually tested the CodePori model, providing 20 different project descriptions as input and then evaluating the code accuracy by manually executing the code.
Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process. The HumanEval benchmark results indicate that CodePori improves code accuracy by 89%. A manual assessment conducted by the first author shows that the CodePori system achieved an accuracy rate of 85%.
Conclusion: Based on the results, our conclusion is that the proposed system demonstrates the transformative potential of LLM-based agents in SE, highlighting their practical applications and opening new opportunities for broader adoption in both industry and academia. Our project is publicly available at https://github.com/GPT-Laboratory/CodePori.
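Motivation and Objectives
- Use a multi-agent LLM system to advance the automation of software development for large, complex projects.
- Show how specialized agents collaborate to generate, review, verify, and test code from natural-language prompts.
- Evaluate CodePori against established benchmarks and practitioner feedback to assess its accuracy, efficiency, and practicality.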
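Proposed Method
- Proposes a multi-agent framework in which each agent specializes in one of design, development, review, verification, or testing.
- Uses a manager agent to decompose the high-level description into modular tasks for the other agents.
- Implements an integrated communication protocol, built on embeddings and LLM APIs (e.g., GPT-4/DaVinci), to generate and refine code.
- Evaluates on the HumanEval and MBPP benchmarks using the pass@k metric, comparing against models such as MetaGPT, ChatDev, AlphaCode, Incoder, CodeGeeX, Codex, and PaLM.
- Involves seven practitioners in assessing practical usability and performance.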
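The division of labor described in the list above can be illustrated with a minimal sketch. This is not CodePori's actual implementation: the `Task` class, the `manager` and `run_pipeline` functions, the one-task-per-line decomposition, and the stage prompts are all hypothetical, and the LLM API is replaced by a deterministic stub (`fake_llm`) so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: any function mapping a prompt string to a response string
# can stand in for a real LLM API call.
LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Deterministic stub used instead of a real model (hypothetical)."""
    first_line = prompt.splitlines()[0]
    return f"# output for: {first_line}"


@dataclass
class Task:
    module: str                      # hypothetical module name
    description: str                 # requirement handled by this task
    code: str = ""                   # latest artifact produced by an agent
    log: List[str] = field(default_factory=list)  # stages completed so far


def manager(description: str) -> List[Task]:
    """Toy manager agent: one modular task per non-empty requirement line."""
    lines = [ln.strip() for ln in description.splitlines() if ln.strip()]
    return [Task(module=f"module_{i}", description=ln)
            for i, ln in enumerate(lines, start=1)]


# Each specialized agent is modeled as (stage name, instruction prefix).
STAGES = [
    ("developed", "Write code for"),
    ("reviewed", "Review and improve the code for"),
    ("verified", "Check the code against the requirement"),
    ("tested", "Write and run tests for"),
]


def run_pipeline(description: str, llm: LLM = fake_llm) -> List[Task]:
    """Pass every task through development, review, verification, testing."""
    tasks = manager(description)
    for task in tasks:
        for stage, instruction in STAGES:
            prompt = f"{instruction}: {task.description}\n{task.code}"
            task.code = llm(prompt)
            task.log.append(stage)
    return tasks


if __name__ == "__main__":
    for t in run_pipeline("User login\n\nExport report as PDF"):
        print(t.module, t.log)
```

Replacing `fake_llm` with a function that calls a real model would leave the control flow unchanged, which is the point of the sketch: the manager only splits the work, and each later stage refines the artifact produced by the previous one.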
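Experimental Results
Research Questions
- RQ1: How can an LLM-based multi-agent model generate code for large and complex projects?
- RQ2: How do the code accuracy and efficiency of the proposed model compare with those of existing models?
Key Findings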
| ID | Practitioner Role | Experience (years) | Overall Performance | Feedback | Suggestion |
|---|---|---|---|---|---|
| P1 | Software Engineer | 5 | Excellent | Impressed with the model’s handling of complexity. | Enhance handling of specific scenarios. |
| P2 | AI Researcher | 7 | Very Good | Found the code accurate and efficient. | Improve the model’s contextual understanding. |
| P3 | Senior Developer | 10 | Good | Praised smooth code integration. | Focus on code optimization. |
| P4 | Data Scientist | 4 | Good | Satisfied with the code’s functionality. | Need more customization options. |
| P5 | Software Architect | 12 | Fair | Noted limitations in domain-specific tasks. | Suggest specialized module creation. |
| P6 | Machine Learning Engineer | 6 | Very Good | Praised code clarity and upkeep. | Enhance error-handling capabilities. |
| P7 | IT Project Manager | 8 | Good | Needs minor adjustments. | Increase the model’s scalability. |
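- CodePori reaches a pass@1 of 87.5% on HumanEval and 86.5% on MBPP, surpassing several existing models.
- The practitioner evaluation shows an overall satisfaction rate of 91% with CodePori.
- CodePori can generate code for projects exceeding 1,000 lines, completing the development cycle in under 20 minutes at a cost of about one US dollar.
- Compared with models such as MetaGPT, ChatDev, AlphaCode, Incoder, CodeGeeX, Codex, and PaLM, CodePori shows higher code accuracy and efficiency on the benchmarks.
- The approach supports generating large-scale software artifacts (e.g., 1,000+ lines) and enables better collaboration among the specialized agents (development, review, verification, testing).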
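The pass@1 figures above are instances of the pass@k metric. This summary does not say how CodePori computed it, so the sketch below assumes the standard unbiased estimator introduced alongside HumanEval: for a problem with n generated samples of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all problems. The function names and the sample data are illustrative only and are not taken from the paper.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up outcomes for four problems with one sample each.
    outcomes = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(benchmark_pass_at_k(outcomes, k=1))
```

With a single sample per problem (n = 1, k = 1), the estimator reduces to the fraction of problems solved on the first attempt, which is the usual reading of a pass@1 score.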
This review was generated by AI and checked by a human editor.