QUICK REVIEW

[Paper Review] RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

André Silva, Sen Fang | arXiv (Cornell University) | December 25, 2023

Software Testing and Debugging Techniques | Citations: 16

One-Line Summary

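RepairLLaMA presents a repair-adapter approach that combines code representations tailored to program repair with LoRA-based parameter-efficient fine-tuning, achieving state-of-the-art repair performance on Defects4J v2 and HumanEval-Java, including on multi-location bugs.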

ABSTRACT

Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions which have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers state-of-the-art parameter-efficient fine-tuning technique (PEFT) for program repair. This results in RepairLLaMA producing a highly effective 'program repair adapter' for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program repair specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning to converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs, outperforming all baselines.

Motivation and Objectives

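Improve automated program repair (APR) by using code representations tailored to the repair task.
Investigate how input and output representations that carry fault localization signals affect repair performance.
Compare LoRA-based parameter-efficient fine-tuning with full-parameter fine-tuning in the APR setting.
Demonstrate the effectiveness of a repair adapter plugged into a pretrained LLM for fixing Java bugs.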

Proposed Method

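Select an open-source LLM pretrained on code (CodeLLaMA-7B) as the base model.
Design APR-specific input and output code representations that include fault localization signals and the original buggy code.
Adapt the LLM to program repair with LoRA, which keeps fine-tuning lightweight (about 4M parameters).
Curate the fine-tuning dataset from Megadiff, convert each sample into several input/output representation pairs, and apply a length limit (≤1024 tokens).
Evaluate on Defects4J v2 and HumanEval-Java using plausible, AST-match, and exact-match metrics, with baselines (infilling prompts, full-parameter fine-tuning) for comparison.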
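The idea of a representation that carries a fault localization signal plus the original buggy code can be sketched as follows. This is a minimal illustration, not the paper's exact format: the function name build_repair_pair, the "// buggy code" comment marker, the "<FILL_ME>" mask token, and the Java snippet are all assumptions introduced here, and whether this layout matches the paper's IR4 and OR2 precisely should be checked against the paper itself.

```python
from typing import List, Tuple

# Assumed infilling placeholder; this review does not name the actual token.
MASK_TOKEN = "<FILL_ME>"


def build_repair_pair(
    buggy_lines: List[str],
    fixed_chunk: List[str],
    start: int,
    end: int,
) -> Tuple[str, str]:
    """Return (model_input, target_output) for one single-chunk bug.

    start/end are 0-based, end-exclusive indices of the suspicious lines,
    i.e. the fault localization signal.
    """
    if not 0 <= start <= end <= len(buggy_lines):
        raise ValueError("fault localization span is out of range")

    prefix = buggy_lines[:start]
    suspicious = buggy_lines[start:end]
    suffix = buggy_lines[end:]

    # Keep the original buggy code visible to the model, but as comments.
    commented = ["// buggy code"] + ["// " + line.strip() for line in suspicious]

    # The mask marks where the model should generate the replacement chunk.
    model_input = "\n".join(prefix + commented + [MASK_TOKEN] + suffix)
    # The target is only the fixed chunk, not the whole function.
    target_output = "\n".join(fixed_chunk)
    return model_input, target_output


# Hypothetical one-line bug: the comparison operator is inverted.
buggy = [
    "int max(int a, int b) {",
    "    return a < b ? a : b;",
    "}",
]
model_input, target_output = build_repair_pair(
    buggy, ["    return a > b ? a : b;"], start=1, end=2
)
print(model_input)
print(target_output)
```

Emitting only the fixed chunk keeps the target short, which matters under a token length limit such as the ≤1024 tokens mentioned above.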

Figure 1. Overview of RepairLLaMA. The core novelties of RepairLLaMA are the APR specific code representations and the engineering of an effective program repair adapter that is plugged into the underlying LLM.

Experimental Results

Research Questions

RQ1: What is the best code representation for fine-tuning an LLM for program repair?
RQ2: How does parameter-efficient fine-tuning compare with full-parameter fine-tuning for APR?
RQ3: How does RepairLLaMA compare with state-of-the-art ChatGPT-based APR?

Key Results

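Code representations that carry fault localization signals substantially outperform naive representations.
Fine-tuning with repair-specific representations yields considerable gains over the baseline without fine-tuning on both Defects4J v2 and HumanEval-Java.
RepairLLaMA (IR4xOR2) achieves the best results: plausible patches for 195 Defects4J v2 bugs and 118 HumanEval-Java bugs, with 125 AST matches and 124 exact matches on Defects4J v2.
In this APR setting, parameter-efficient fine-tuning with LoRA outperforms full-parameter fine-tuning: the LoRA adapter scores higher than the fully fine-tuned IR4xOR2 model on several metrics.
The repair adapter has only about 4M parameters, roughly 1600 times fewer than the base CodeLLaMA-7B, yet it delivers state-of-the-art repair performance and outperforms GPT-4 in the reported results.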
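The "about 4M parameters" and "roughly 1600 times" figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a LoRA rank of 8 applied to the query and value projections of a 32-layer transformer with hidden size 4096, and a nominal 7 billion base parameters. None of these settings are stated in this review, so treat them as illustrative assumptions rather than the paper's confirmed configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns a low-rank update B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# All four values below are assumptions made for this estimate.
HIDDEN_SIZE = 4096       # assumed width of each attention projection
NUM_LAYERS = 32          # assumed number of transformer layers
RANK = 8                 # assumed LoRA rank
ADAPTED_PER_LAYER = 2    # assumed: query and value projections only

adapter_params = (
    NUM_LAYERS
    * ADAPTED_PER_LAYER
    * lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
)

BASE_PARAMS = 7_000_000_000  # nominal "7B", not an exact parameter count
ratio = BASE_PARAMS / adapter_params

print(adapter_params)  # 4194304
print(round(ratio))    # 1669
```

Under these assumptions the adapter has about 4.2M parameters and the base model is about 1,700 times larger, the same order of magnitude as the figure reported above. The exact ratio depends on the true base parameter count and the actual LoRA configuration.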

Figure 2. Buggy code of the multi-location bug Chart-5 represented in our four different input representations.


This review was generated by AI and checked by a human editor.