QUICK REVIEW

[Paper Review] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

Sirui Hong, Mingchen Zhuge | arXiv (Cornell University) | 2023. 08. 01.

Multi-Agent Systems and Negotiation | Citations: 130

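One-Line Summary

MetaGPT introduces a meta-programming framework that coordinates multi-agent LLM collaboration through SOP-driven structured outputs and executable feedback. It achieves state-of-the-art code generation on benchmarks and robust software development.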

ABSTRACT

Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On collaborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT

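Motivation and Objectives

Improve consistency and accuracy in LLM-based multi-agent problem solving by introducing Standardized Operating Procedures (SOPs).
Reduce cascading hallucinations among agents by decomposing complex software tasks into roles and workflows.
Improve communication efficiency through structured outputs such as documents and diagrams, combined with a publish-subscribe information flow.
Achieve higher-quality code generation by introducing an executable feedback mechanism that runs and debugs code at runtime.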
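
The publish-subscribe information flow named in the objectives can be pictured as a shared pool in which each role sees only the document types it subscribed to. The following is a minimal sketch under stated assumptions, not MetaGPT's actual API: the `Message` and `MessagePool` classes and the document kinds `"PRD"`, `"SystemDesign"`, and `"TaskList"` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A structured output published by one role (hypothetical schema)."""
    sender: str   # role that produced the document
    kind: str     # document type, e.g. "PRD" (illustrative label)
    content: str


@dataclass
class MessagePool:
    """Shared pool: messages are stored once; roles filter them by kind."""
    messages: list = field(default_factory=list)
    subscriptions: dict = field(default_factory=lambda: defaultdict(set))

    def subscribe(self, role: str, kind: str) -> None:
        self.subscriptions[role].add(kind)

    def publish(self, message: Message) -> None:
        self.messages.append(message)

    def inbox(self, role: str) -> list:
        """Return only the messages whose kind the role subscribed to."""
        wanted = self.subscriptions[role]
        return [m for m in self.messages if m.kind in wanted]


pool = MessagePool()
pool.subscribe("Architect", "PRD")
pool.subscribe("Engineer", "SystemDesign")
pool.subscribe("Engineer", "TaskList")

pool.publish(Message("Product Manager", "PRD", "User stories and requirements"))
pool.publish(Message("Architect", "SystemDesign", "Module layout and interfaces"))
pool.publish(Message("Project Manager", "TaskList", "Ordered implementation tasks"))

architect_inbox = [m.kind for m in pool.inbox("Architect")]
engineer_inbox = [m.kind for m in pool.inbox("Engineer")]
print(architect_inbox)  # ['PRD']
print(engineer_inbox)   # ['SystemDesign', 'TaskList']
```

Filtering at read time keeps a single copy of every document in the pool while sparing each role the messages it does not need, which is the sense in which subscription limits information overload.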

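Proposed Method

Defines five specialized agent roles (Product Manager, Architect, Project Manager, Engineer, QA Engineer), each with a task-specific profile and constraints.
Implements a structured, document-based communication protocol with a shared message pool and a subscription mechanism to reduce information overload.
Adopts an SOP-driven software development workflow that sequences work from requirements through design and implementation to testing.
Introduces executable feedback: the Engineer runs unit tests and iteratively debugs the code within a bounded retry limit (at most 3 retries).
Evaluates against AutoGPT, LangChain, AgentVerse, and ChatDev on HumanEval, MBPP, and the new SoftwareDev benchmark, using Pass@k metrics and human- and system-level evaluation.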
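
The executable feedback step (run the tests, feed the error back, retry at most three times) can be sketched as a small loop. This is an illustrative sketch, not the authors' implementation: `run_tests`, `develop`, and `fake_engineer` are hypothetical names, and `fake_engineer` is a stand-in for an LLM call that returns buggy code first and a fix once it receives feedback.

```python
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3  # retry limit stated in the review


def run_tests(code: str, tests: str) -> Optional[str]:
    """Run code and tests in a fresh namespace; return error text or None.

    exec() is used only for illustration; untrusted generated code
    should be run in a sandbox instead.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # any failure becomes feedback for the next try
        return f"{type(exc).__name__}: {exc}"
    return None


def develop(write_code: Callable[[Optional[str]], str],
            tests: str) -> Tuple[str, bool, int]:
    """Generate code, then retry with test feedback up to MAX_RETRIES times."""
    code = write_code(None)  # first attempt has no feedback
    retries = 0
    while True:
        error = run_tests(code, tests)
        if error is None:
            return code, True, retries
        if retries == MAX_RETRIES:
            return code, False, retries
        retries += 1
        code = write_code(error)


def fake_engineer(feedback: Optional[str]) -> str:
    """Hypothetical LLM stand-in: buggy first, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


tests = "assert add(2, 3) == 5, 'add(2, 3) should be 5'\n"
code, passed, retries = develop(fake_engineer, tests)
print(passed, retries)  # True 1
```

Bounding the loop means a generator that never fixes its bug stops after one initial attempt plus three retries instead of running indefinitely.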

Figure 1: The software development SOPs between MetaGPT and real-world human teams. In software engineering, SOPs promote collaboration among various roles. MetaGPT showcases its ability to decompose complex tasks into specific actionable procedures assigned to various roles (e.g., Product Manager,

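Experimental Results

Research Questions

RQ1: How does introducing SOPs and role specialization affect consistency and error rates in multi-agent code generation?
RQ2: Can a publish-subscribe communication protocol built on structured outputs improve task efficiency and reduce hallucinations in LLM-based collaboration?
RQ3: Does executable feedback at runtime substantially improve code quality and executability on standard benchmarks?
RQ4: How does MetaGPT perform on standard code generation benchmarks compared with existing multi-agent frameworks and general-purpose LLMs?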

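Key Results

MetaGPT achieves state-of-the-art Pass@1 scores on HumanEval and MBPP, reaching 85.9% and 87.7%, respectively.
MetaGPT achieved a 100% task completion rate in the experiments.
On the SoftwareDev benchmark, MetaGPT outperforms ChatDev on most metrics, with an executability score of 3.75 and a shorter running time (503 s).
MetaGPT uses more tokens overall (24,613 or 31,255) but incurs a lower human revision cost (0.83) and achieves higher executability through structured SOPs.
Executable feedback yields absolute Pass@1 improvements of 4.2% on HumanEval and 5.4% on MBPP.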
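
For readers unfamiliar with the Pass@k metric behind these scores, the sketch below assumes the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k) per problem, averaged over problems, where n samples are drawn and c of them pass. The sample counts are hypothetical and are not figures from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) pairs for four problems, not data from the paper.
results = [(10, 10), (10, 5), (10, 0), (10, 5)]
score = benchmark_pass_at_k(results, k=1)
print(score)  # 0.5
```

With k = 1 the estimator reduces to c / n per problem, so Pass@1 is the average fraction of generated samples that pass the tests.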


This review was generated by AI and reviewed by a human editor.