QUICK REVIEW

[Paper Review] RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

André Silva, Sen Fang|arXiv (Cornell University)|Dec 25, 2023

Software Testing and Debugging Techniques | Cited by 16

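One-Sentence Summary

RepairLLaMA proposes a program repair adapter approach that combines code-specific representations with LoRA-based parameter-efficient fine-tuning, achieving state-of-the-art repair performance on Defects4J v2 and HumanEval-Java, including on multi-location bugs.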

ABSTRACT

Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions which have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers state-of-the-art parameter-efficient fine-tuning technique (PEFT) for program repair. This results in RepairLLaMA producing a highly effective 'program repair adapter' for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program repair specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning to converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs, outperforming all baselines.

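Research Motivation and Goals

Advance automated program repair (APR) by exploiting domain-specific code representations.
Study how input/output representations that carry fault localization signals affect repair performance.
Evaluate parameter-efficient fine-tuning (LoRA) against full fine-tuning in the APR setting.
Demonstrate that plugging a repair adapter into a pretrained large language model is effective for fixing Java bugs.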

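Proposed Method

Select an open-source, code-pretrained large language model (CodeLLaMA-7B) as the base model.
Design APR-specific input and output code representations that incorporate fault localization signals and the original buggy code.
Train the repair adapter with LoRA, which keeps fine-tuning lightweight (about 4M parameters) while adapting the LLM to program repair.
Curate the fine-tuning dataset (Megadiff) and process it into multiple representation pairs, subject to a length limit of 1024 tokens or fewer.
Evaluate the representation pairs on Defects4J v2 and HumanEval-Java using the plausible, AST-match, and exact-match metrics, and compare against baselines (infilling prompt, full fine-tuning).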
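As a rough illustration of the representation and length-filtering steps, the sketch below builds one input/output pair from a buggy function and a known fix location. It assumes that a fault-localized representation masks the buggy span with an infilling placeholder, keeps the original buggy lines as comments, and uses only the fixed chunk as the target. The `<FILL_ME>` placeholder, the `// buggy code` marker, the helper names, the whitespace token counter, and applying the 1024-token limit to the combined input and output are all assumptions made for illustration; the review does not spell out these details.

```python
MASK = "<FILL_ME>"   # hypothetical infilling placeholder
MAX_TOKENS = 1024    # length limit mentioned in the method description


def build_pair(function_lines, fixed_chunk, start, end):
    """Mask lines [start, end) of a buggy function and return (input, output).

    The buggy lines are kept as comments above the mask so the model still
    sees the original code; the target is only the fixed chunk.
    """
    buggy_chunk = function_lines[start:end]
    commented = ["// buggy code"] + ["// " + line.strip() for line in buggy_chunk]
    input_lines = function_lines[:start] + commented + [MASK] + function_lines[end:]
    return "\n".join(input_lines), "\n".join(fixed_chunk)


def within_limit(source, target, count_tokens=lambda s: len(s.split())):
    """Keep a pair only if input plus output fit the token budget.

    Whitespace splitting stands in for the model tokenizer here.
    """
    return count_tokens(source) + count_tokens(target) <= MAX_TOKENS


buggy = [
    "int max(int a, int b) {",
    "    if (a < b) return a;",
    "    return b;",
    "}",
]
fixed = ["    if (a > b) return a;"]

source, target = build_pair(buggy, fixed, start=1, end=2)
keep = within_limit(source, target)
```

Here `source` holds the function with the faulty line commented out and replaced by the mask, and `target` holds just the corrected line, so the model only has to generate the chunk that changes.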

Figure 1. Overview of RepairLLaMA. The core novelties of RepairLLaMA are the APR specific code representations and the engineering of an effective program repair adapter that is plugged into the underlying LLM.
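To make the "adapter plugged into the underlying LLM" idea concrete, here is a minimal pure-Python sketch of the standard LoRA formulation for a single linear layer: the pretrained weight W stays frozen, and only two small matrices A and B are trained, giving the output W x + (alpha / r) B A x. This is a generic illustration of LoRA, not RepairLLaMA's code; the class name, the toy dimensions, and the initialization scale are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated
        # A starts with small random values, B starts at zero, so the adapted
        # layer initially behaves exactly like the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Only A and B count as trainable parameters."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2)
output = layer.forward([1.0, 1.0])
```

Because B is zero at the start, `output` equals the base layer's result; training then moves only A and B, which is why the adapter can be stored and shipped separately from the base model.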

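Experimental Results

Research Questions

RQ1: What is the best code representation for fine-tuning an LLM for program repair?
RQ2: How does parameter-efficient fine-tuning compare with full-parameter fine-tuning for program repair?
RQ3: How does RepairLLaMA compare with state-of-the-art ChatGPT-based APR?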

Key Findings

Code Representations	Defects4J v2 Plausible	Defects4J v2 AST Match	Defects4J v2 Exact Match	HumanEval-Java Plausible	HumanEval-Java AST Match	HumanEval-Java Exact Match
IR3 x OR2 (baseline, no fine-tuning)	133	71	52	107	81	71
IR1 x OR1	79	31	29	78	54	52
IR1 x OR3	41	17	15	39	21	21
IR1 x OR4	12	2	2	5	2	2
IR2 x OR2	198	122	121	118	77	69
IR3 x OR2	154	87	84	103	68	63
IR4 x OR2 (RepairLLaMA)	195	125	124	118	82	75

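Code representations that carry fault localization signals clearly outperform the naive representations.
Fine-tuning with repair-specific representations yields significant improvements over the baseline (no fine-tuning) on Defects4J v2 and HumanEval-Java.
RepairLLaMA (IR4 x OR2) achieves the best results overall: plausible patches for 195 Defects4J v2 bugs and 118 HumanEval-Java bugs, with 125 AST matches and 124 exact matches on Defects4J v2.
In this APR setting, parameter-efficient fine-tuning with LoRA outperforms full fine-tuning (RepairLLaMA beats full fine-tuning of IR4 x OR2 on several metrics).
The repair adapter has only about 4M parameters, roughly 1600 times smaller than the base CodeLLaMA-7B, yet it achieves state-of-the-art repair performance and surpasses GPT-4 in the reported results.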
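The "about 4M parameters" and "roughly 1600 times smaller" figures can be sanity-checked with a short calculation. A LoRA update of rank r on a d_in by d_out weight matrix adds r * (d_in + d_out) parameters. The sketch below assumes rank 8 applied to two 4096 by 4096 attention projections in each of 32 transformer layers, and an approximate base model size of 6.74 billion parameters. The rank, the choice of adapted matrices, and the base size are assumptions that happen to reproduce the reported numbers; they are not stated in this review.

```python
HIDDEN_SIZE = 4096            # hidden dimension of a 7B LLaMA-style model
NUM_LAYERS = 32               # transformer layers in a 7B LLaMA-style model
RANK = 8                      # assumed LoRA rank
ADAPTED_PER_LAYER = 2         # assumed: two attention projections per layer
BASE_PARAMS = 6_740_000_000   # approximate size of the base model


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA update: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


adapter_params = (
    NUM_LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
)
ratio = BASE_PARAMS / adapter_params

print(f"adapter parameters: {adapter_params:,}")
print(f"base / adapter ratio: {ratio:.0f}")
```

Under these assumptions the adapter has 4,194,304 trainable parameters, which lines up with the "about 4M" and "1600 times smaller" figures quoted above.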

Figure 2. Buggy code of the multi-location bug Chart-5 represented in our four different input representations.


This review was generated by AI and reviewed by a human editor.