QUICK REVIEW

[Paper Review] CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology

Zeeshan Rasheed, Sami, Malik Abdul | arXiv (Cornell University) | 2024-02-02

Software Engineering Research | Citations: 5

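One-Line Summary

CodePori is an LLM-based multi-agent framework that autonomously generates executable code for large and complex software projects. The paper evaluates it through pass@1 results on HumanEval and MBPP and through an assessment by practitioners.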

ABSTRACT

Context: Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering (SE). Existing LLM-based multi-agent models have successfully addressed basic dialogue tasks. However, the potential of LLMs for more challenging tasks, such as automated code generation for large and complex projects, has been investigated in only a few existing works. Objective: This paper aims to investigate the potential of LLM-based agents in the software industry, particularly in enhancing productivity and reducing time-to-market for complex software solutions. Our primary objective is to gain insights into how these agents can fundamentally transform the development of large-scale software. Methods: We introduce CodePori, a novel system designed to automate code generation for large and complex software projects based on functional and non-functional requirements defined by stakeholders. To assess the proposed system performance, we utilized the HumanEval benchmark and manually tested the CodePori model, providing 20 different project descriptions as input and then evaluated the code accuracy by manually executing the code. Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process. The HumanEval benchmark results indicate that CodePori improves code accuracy by 89%. A manual assessment conducted by the first author shows that the CodePori system achieved an accuracy rate of 85%. Conclusion: Based on the results, our conclusion is that proposed system demonstrates the transformative potential of LLM-based agents in SE, highlighting their practical applications and opening new opportunities for broader adoption in both industry and academia. Our project is publicly available at https://github.com/GPT-Laboratory/CodePori.

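Research Motivation and Objectives

Advance the automation of software development for large, complex projects using a multi-agent LLM system.
Show how specialized agents cooperate to generate, review, verify, and test code from natural-language prompts.
Evaluate the accuracy, efficiency, and practicality of CodePori against established benchmarks and practitioner feedback.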

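Proposed Method

Proposes a multi-agent framework with agents specialized in design, development, review, verification, and testing.
Uses a manager agent to decompose a high-level description into modular tasks.
Generates and refines code through a unified communication protocol built on embeddings and LLM APIs (such as GPT-4 and DaVinci).
Evaluates on the HumanEval and MBPP benchmarks with the pass@k metric and compares against models including MetaGPT, ChatDev, AlphaCode, Incoder, CodeGeeX, Codex, and PaLM.
Involves seven practitioners to assess real-world usability and performance.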
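
The manager-plus-specialists flow described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not CodePori's actual implementation: the names `Task`, `manager`, `run_pipeline`, and `echo_llm` are hypothetical, the manager here simply splits the description by line, and the LLM is replaced by a toy function so the example runs without any API access.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM API call: takes a prompt, returns text.
LLM = Callable[[str], str]

# Agent roles named in the review; the ordering is an assumption.
ROLES = ["design", "develop", "review", "verify", "test"]


@dataclass
class Task:
    description: str
    code: str = ""
    log: List[str] = field(default_factory=list)  # roles that have handled the task


def manager(project: str) -> List[Task]:
    """Decompose a high-level description into modular tasks.

    Assumption: one task per non-empty line. A real manager agent
    would ask an LLM to perform this decomposition.
    """
    return [Task(line.strip()) for line in project.splitlines() if line.strip()]


def run_pipeline(project: str, llm: LLM) -> List[Task]:
    """Pass every task through each specialized agent in turn."""
    tasks = manager(project)
    for task in tasks:
        for role in ROLES:
            prompt = f"[{role}] task: {task.description}\ncurrent code:\n{task.code}"
            task.code = llm(prompt)  # each agent refines the previous output
            task.log.append(role)
    return tasks


def echo_llm(prompt: str) -> str:
    """Toy model: echoes the first prompt line instead of generating code."""
    return f"# handled {prompt.splitlines()[0]}"


if __name__ == "__main__":
    for t in run_pipeline("parse input\n\nwrite report", echo_llm):
        print(t.description, "->", t.log)
```

Swapping `echo_llm` for a function that calls a real model would keep the control flow unchanged, which is the point of passing the model in as a plain callable.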

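Experimental Results

Research Questions

RQ1: How does an LLM-based multi-agent model generate code for large and complex projects?
RQ2: How do the code accuracy and efficiency of the proposed model compare with existing models?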

Key Results

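ID	Practitioner Role	Experience (years)	Overall Performance	Feedback	Suggestion
P1	Software Engineer	5	Excellent	Impressed by the handling of complex models.	Strengthen handling of specific scenarios.
P2	AI Researcher	7	Very good	Confirmed code accuracy and efficiency.	Improve the model's context understanding.
P3	Lead Developer	10	Good	Praised the seamless code integration.	Focus on code optimization.
P4	Data Scientist	4	Good	Satisfied with code functionality.	Provide more customization options.
P5	Software Architect	12	Average	Noted limitations on domain-specific tasks.	Create specialized modules.
P6	Machine Learning Engineer	6	Very good	Praised the clarity and maintainability of the code.	Improve error handling.
P7	IT Project Manager	8	Good	Minor adjustments needed.	Increase the model's scalability.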

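CodePori achieves 87.5% pass@1 on HumanEval and 86.5% on MBPP, outperforming several existing models.
In the practitioner evaluation, overall satisfaction with CodePori's performance was 91%.
CodePori can generate code for projects of more than 1,000 lines, completing a development cycle within 20 minutes at a cost of about 1 USD.
Compared with MetaGPT, ChatDev, AlphaCode, Incoder, CodeGeeX, Codex, and PaLM, it shows better code accuracy and efficiency on the benchmarks.
The approach improves collaboration among the specialized agents (development, review, verification, testing), which helps produce large software artifacts (for example, more than 1,000 lines).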
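
For readers unfamiliar with the pass@k metric behind the HumanEval and MBPP figures, the sketch below shows the unbiased estimator commonly used with HumanEval: with n generated samples per problem, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The review does not say how CodePori's scores were computed, so this is an assumption about the usual procedure; the function names and sample counts are illustrative only.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: samples generated for the problem
    c: samples that passed all tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # in which case any draw of k samples contains a passing one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(pass_at_k(10, 3, 1))
    print(mean_pass_at_k([(10, 3), (10, 10)], 1))
```

With k = 1 the estimator reduces to c / n, so a pass@1 score is simply the average fraction of passing samples per problem.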

This review was generated by AI and checked by a human editor.