QUICK REVIEW

[Paper Review] Agentless: Demystifying LLM-based Software Engineering Agents

Chunqiu Steven Xia, Yinlin Deng | arXiv (Cornell University) | 2024-07-01

Multi-Agent Systems and Negotiation · Citations: 13

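One-Line Summary

Agentless proposes an agentless, two-phase approach (localization and repair) for solving SWE-bench Lite problems with LLMs. It achieves competitive performance at low cost and exposes issues in the benchmark itself.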

ABSTRACT

Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run commands, observe feedback from the environment, and plan for future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless -- an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation, without letting the LLM decide future actions or operate with complex tools. Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance (32.00%, 96 correct fixes) and low cost ($0.70) compared with all existing open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems with exact ground truth patch or insufficient/misleading issue descriptions. As such, we construct SWE-bench Lite-S by excluding such problematic issues to perform more rigorous evaluation and comparison. Our work highlights the current overlooked potential of a simple, interpretable technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.

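Motivation and Goals

Asks whether complex autonomous agents are actually necessary for LLM-based software engineering tasks.
Proposes a simple, agentless two-phase framework (localization and repair) for end-to-end bug fixing and feature addition.
Evaluates the approach on SWE-bench Lite, comparing performance and cost against existing open-source and commercial agents.
Analyzes the limitations of SWE-bench Lite and proposes SWE-bench Lite-S as a more rigorous benchmark.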

Proposed Method

Two-phase workflow: localization followed by repair. The abstract counts patch validation as a separate third phase.
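Localization is a hierarchical process with four steps: (a) build a representation of the repository structure, (b) identify the top-N suspicious files, (c) derive a skeleton of each file containing its class and function declarations, and (d) narrow down to the exact edit locations.
Repair: for each edit location, build a context window around the code and use the LLM to generate multiple candidate patches. The candidates are then filtered with syntax checks and regression tests.
Patches are generated in a simple Search/Replace diff format. This keeps edits minimal and reduces the risk of hallucination.
Patch selection: regression tests filter out failing patches, and majority voting over the normalized patches picks the final patch for submission.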
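Localization step (c) and the repair-phase context window can be sketched with Python's standard `ast` module. This is a minimal illustration under our own assumptions, not Agentless's implementation. The names `file_skeleton` and `context_window` are hypothetical. The skeleton keeps only class headers and function signatures. The window is a fixed number of lines on either side of a 1-indexed edit location.

```python
import ast


def file_skeleton(source: str) -> str:
    """Hypothetical helper: compress a file to class headers and function signatures."""
    tree = ast.parse(source)
    lines: list[str] = []

    def visit(nodes: list, indent: str) -> None:
        for node in nodes:
            if isinstance(node, ast.ClassDef):
                lines.append(f"{indent}class {node.name}:")
                visit(node.body, indent + "    ")  # recurse into methods
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kw = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                # ast.unparse renders the argument list; the body is elided as "..."
                lines.append(f"{indent}{kw} {node.name}({ast.unparse(node.args)}): ...")

    visit(tree.body, "")
    return "\n".join(lines)


def context_window(source: str, line: int, radius: int = 10) -> str:
    """Hypothetical helper: return lines [line - radius, line + radius] around a 1-indexed location."""
    lines = source.splitlines()
    start = max(0, line - 1 - radius)
    end = min(len(lines), line + radius)
    return "\n".join(lines[start:end])


src = "class A:\n    def f(self, x=1):\n        return x\n\ndef g(a, *b):\n    pass\n"
print(file_skeleton(src))
# class A:
#     def f(self, x=1): ...
# def g(a, *b): ...
```

Keeping only signatures lets the model scan many files cheaply before it commits to specific edit locations. The full code is only pulled in, through a window like the one above, once a location is chosen.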
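Search/Replace editing, syntax checking, regression filtering and majority voting over normalized patches can be sketched together as follows. The sketch rests on these assumptions:
- Each candidate is a single (search, replace) pair applied to one file.
- An edit is rejected unless its search text occurs exactly once.
- Normalization is an `ast.parse`/`ast.unparse` round trip, which discards comments and formatting.
- `passes_regression` is a hypothetical callback standing in for running the repository's existing tests.

All names are illustrative and not taken from the Agentless code base.

```python
import ast
from collections import Counter
from typing import Callable, Optional


def apply_search_replace(source: str, search: str, replace: str) -> Optional[str]:
    """Apply one Search/Replace edit; reject it if the search block is missing or ambiguous."""
    if source.count(search) != 1:
        return None
    return source.replace(search, replace, 1)


def normalize(source: str) -> Optional[str]:
    """Syntax check plus normalization: the AST round trip drops comments and spacing."""
    try:
        return ast.unparse(ast.parse(source))
    except SyntaxError:
        return None


def select_patch(
    original: str,
    candidates: list[tuple[str, str]],
    passes_regression: Callable[[str], bool],
) -> Optional[str]:
    """Filter candidates, then return the patched file whose normalized form wins the vote."""
    votes = Counter()
    representative = {}  # normalized form -> first concrete patched file seen
    for search, replace in candidates:
        patched = apply_search_replace(original, search, replace)
        if patched is None:
            continue  # malformed edit
        key = normalize(patched)
        if key is None:
            continue  # fails the syntax check
        if not passes_regression(patched):
            continue  # breaks existing behaviour
        votes[key] += 1
        representative.setdefault(key, patched)
    if not votes:
        return None
    winner, _ = votes.most_common(1)[0]
    return representative[winner]


original = "def add(a, b):\n    return a - b\n"
candidates = [
    ("return a - b", "return a + b"),
    ("return a - b", "return a+b  # fixed"),  # same fix, different formatting
    ("return a - b", "return b - a"),  # wrong fix, caught by the test
    ("return a - b", "return a +"),  # syntax error
]


def passes_regression(src: str) -> bool:
    """Toy stand-in for running the repository's regression tests."""
    ns: dict = {}
    exec(src, ns)
    return ns["add"](2, 3) == 5


print(select_patch(original, candidates, passes_regression))
# def add(a, b):
#     return a + b
```

Normalizing before voting is what lets the two differently formatted `a + b` fixes count as one answer with two votes.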

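Experimental Results

Research Questions

RQ1: Can a non-agent, two-phase approach match or exceed complex autonomous agent systems at solving repository-level software engineering problems?
RQ2: What is the cost-performance trade-off between the agentless design and agent-based approaches on SWE-bench Lite?
RQ3: How does hierarchical localization affect the precision of edit locations and overall patch quality?
RQ4: What problems in SWE-bench Lite distort the evaluation of autonomous software engineering tools, and how can a revised benchmark (SWE-bench Lite-S) make evaluation more rigorous?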

Key Findings

Agentless resolves 27.33% of SWE-bench Lite (82/300 problems) at an average cost of $0.34 per issue. It beats open-source agents on cost while remaining competitive on success rate.
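Hierarchical localization shrinks the context while preserving localization accuracy. It correctly localizes 77.7% of ground-truth files, and each later step supplies a progressively narrower context.
Repair configurations show incremental gains:
- a single sampled patch yields 70 correct fixes at $0.11;
- multiple samples with majority voting yield 78 fixes at $0.34;
- the full pipeline with test filtering yields 82 fixes (the reported Agentless result).
SWE-bench Lite-S is a 252-problem subset that removes three kinds of problems: those whose issue text contains the exact ground-truth patch, those with misleading descriptions, and those lacking sufficient information. Agentless remains competitive in the rankings on this subset.
Detailed analysis exposes three kinds of issues in SWE-bench Lite: poor description quality, solutions already stated in the issue text, and location information given away in the issue. This underscores the need for better benchmark design.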
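The headline numbers can be cross-checked with a few lines of arithmetic. The sketch makes three assumptions:
- SWE-bench Lite's 300 problems are the denominator.
- The dollar figures are average costs per issue over all 300 problems.
- The full pipeline costs the $0.34 per issue quoted for the 27.33% result.

The "cost per fix" value is our derived figure, not one reported in the paper.

```python
TOTAL_PROBLEMS = 300  # SWE-bench Lite size
LITE_S_PROBLEMS = 252  # SWE-bench Lite-S size quoted above


def resolve_rate(fixed: int, total: int = TOTAL_PROBLEMS) -> float:
    """Percentage of benchmark problems resolved."""
    return 100 * fixed / total


def cost_per_fix(avg_cost_per_issue: float, fixed: int, total: int = TOTAL_PROBLEMS) -> float:
    """Derived: total spend over all issues divided by the number of correct fixes."""
    return avg_cost_per_issue * total / fixed


# Repair configurations from the key findings: (label, correct fixes, $ per issue)
configs = [
    ("single sample", 70, 0.11),
    ("multiple samples + majority vote", 78, 0.34),
    ("full pipeline with test filtering", 82, 0.34),  # cost assumed from the 27.33% result
]
for label, fixed, cost in configs:
    print(f"{label}: {resolve_rate(fixed):.2f}% resolved, ${cost_per_fix(cost, fixed):.2f} per fix")

print(f"abstract figure: {resolve_rate(96):.2f}%")  # 32.00%
print(f"problems excluded from Lite-S: {TOTAL_PROBLEMS - LITE_S_PROBLEMS}")  # 48
```

On these assumptions, sampling more patches roughly triples the per-issue cost ($0.11 to $0.34) in exchange for 8 more fixes. Test filtering then adds 4 more at the same quoted per-issue cost, which is the cost-performance trade-off RQ2 asks about.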

This review was generated by AI and reviewed by a human editor.