QUICK REVIEW

[Paper Review] Break-It-Fix-It: Unsupervised Learning for Program Repair

Michihiro Yasunaga, Percy Liang|arXiv (Cornell University)|Jun 11, 2021

Domain Adaptation and Few-Shot Learning | References: 78 | Cited by: 28

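One-Sentence Summary

Break-It-Fix-It (BIFI) is an unsupervised method that learns a code fixer by alternating between fixing real bad code and generating realistic bad code, guided by a critic. It sets a new state of the art in repair accuracy on GitHub-Python and DeepFix without any labeled data.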

ABSTRACT

We consider repair tasks: given a critic (e.g., compiler) that assesses the quality of an input, the goal is to train a fixer that converts a bad example (e.g., code with syntax errors) into a good one (e.g., code with no syntax errors). Existing works create training data consisting of (bad, good) pairs by corrupting good examples using heuristics (e.g., dropping tokens). However, fixers trained on this synthetically-generated data do not extrapolate well to the real distribution of bad inputs. To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code. Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data. We evaluate BIFI on two code repair datasets: GitHub-Python, a new dataset we introduce where the goal is to repair Python code with AST parse errors; and DeepFix, where the goal is to repair C code with compiler errors. BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python (+28.5%) and 71.7% on DeepFix (+5.6%). Notably, BIFI does not require any labeled data; we hope it will be a strong starting point for unsupervised learning of various repair tasks.

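Motivation and Goals

Learn a code fixer from unlabeled data, using a critic to judge repair quality.
Bridge the distribution gap between synthetic training data and real-world code errors.
Develop an iterative framework in which the breaker and the fixer improve each other to generate realistic paired data.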

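Proposed Method

Initialize with synthetic (bad, good) pairs and train the initial fixer and breaker.
Iteratively apply the fixer to real bad code and keep the outputs that the critic confirms as fixed (real paired data).
Train the breaker on the real paired data so that it generates realistic errors in good code.
Apply the breaker to good code and keep only the outputs that are actually broken (as verified by the critic).
Retrain the fixer on a mix of the real paired data and the breaker-generated pairs, aligning it with the real error distribution.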
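The data flow of one such round can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' implementation: the names `critic`, `bifi_round`, `train_toy_fixer`, and `train_toy_breaker` are hypothetical, the critic is simply Python's own parser (in the spirit of the AST-parse check used for GitHub-Python), and the "models" are trivial suffix rules standing in for the neural fixer and breaker.

```python
import ast
from collections import Counter
from typing import Callable, List, Tuple

Pair = Tuple[str, str]          # (bad code, good code)
Model = Callable[[str], str]    # a fixer maps bad -> good, a breaker maps good -> bad


def critic(code: str) -> bool:
    """Return True when `code` parses as Python, i.e. it counts as 'good'."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def _suffixes(pairs: List[Pair]) -> Counter:
    """Count the trailing text that turns each bad string into its good twin."""
    return Counter(
        good[len(bad):] for bad, good in pairs
        if good.startswith(bad) and good != bad
    )


def train_toy_fixer(pairs: List[Pair]) -> Model:
    """Toy stand-in for fixer training: append the most common missing suffix."""
    counts = _suffixes(pairs)
    suffix = counts.most_common(1)[0][0] if counts else ""
    return lambda bad: bad + suffix


def train_toy_breaker(pairs: List[Pair]) -> Model:
    """Toy stand-in for breaker training: delete a suffix that fixes tended to add."""
    learned = sorted(_suffixes(pairs), key=len, reverse=True)

    def breaker(good: str) -> str:
        for s in learned:
            if good.endswith(s):
                return good[: len(good) - len(s)]
        return good  # nothing to remove; the critic will discard this output

    return breaker


def bifi_round(fixer: Model, real_bad: List[str], good: List[str]):
    """Run one round: fix real bad code, train breaker, break good code, retrain fixer."""
    # Keep only fixer outputs that the critic accepts (real paired data).
    fixer_pairs: List[Pair] = []
    for bad in real_bad:
        candidate = fixer(bad)
        if critic(candidate):
            fixer_pairs.append((bad, candidate))

    # Train the breaker on the real pairs.
    breaker = train_toy_breaker(fixer_pairs)

    # Keep only breaker outputs that the critic rejects (actually broken).
    breaker_pairs: List[Pair] = []
    for g in good:
        candidate = breaker(g)
        if not critic(candidate):
            breaker_pairs.append((candidate, g))

    # Retrain the fixer on the mix of both kinds of pairs.
    new_fixer = train_toy_fixer(fixer_pairs + breaker_pairs)
    return new_fixer, breaker, fixer_pairs, breaker_pairs
```

Calling `bifi_round(lambda s: s + ")", ["print(1", "y = [1, 2"], ["len([1, 2])", "a = 1"])` shows the two critic filters at work: the bad fix `y = [1, 2)` is dropped from the fixer pairs, and the unchanged `a = 1` is dropped from the breaker pairs. The toy rules learn nothing interesting; the point is only where the critic sits in the loop.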

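Experimental Results

Research Questions

RQ1: Can an unsupervised framework learn a code fixer from unlabeled code and a quality critic?
RQ2: Does adversarial-style generation of realistic errors improve repair performance compared with using synthetic data alone?
RQ3: How does BIFI compare with backtranslation for unsupervised code repair?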

Key Findings

Method	Total	Unbalanced Parentheses	Indentation Error	Invalid Syntax
Initial Round-0	62.0%	87.7%	39.4%	70.5%
FixerOnly Round-1	86.8%	93.3%	79.5%	90.9%
FixerOnly Round-2	88.6%	92.4%	83.7%	92.0%
BIFI Round-1	88.0%	94.1%	81.3%	91.6%
BIFI Round-2	90.5%	94.2%	85.9%	93.5%

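On GitHub-Python, BIFI reaches 90.5% repair accuracy at Round-2, up from 62.0% for the initial fixer trained on synthetic data.
On DeepFix, BIFI sets a new state of the art with 71.7% repair accuracy (Round-2), up from DrRepair's 66.1%.
Training on real bad code (FixerOnly, 88.6% at Round-2) and additionally on breaker-generated bad code (BIFI, 90.5%) both improve repair accuracy substantially over synthetic data alone (62.0%).
Backtranslation, which uses neither the critic nor real bad data, falls short of BIFI: BIFI outperforms it by about 10% on GitHub-Python.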
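As a quick sanity check on these findings, the gains can be recomputed from the first numeric column (total accuracy) of the GitHub-Python table. The snippet below is only an arithmetic aid; the dictionary `total` and the helper `gain` are names introduced here for illustration, and the values are copied from the table.

```python
# Total repair accuracy (%) on GitHub-Python, copied from the results table.
total = {
    "Initial Round-0": 62.0,
    "FixerOnly Round-1": 86.8,
    "FixerOnly Round-2": 88.6,
    "BIFI Round-1": 88.0,
    "BIFI Round-2": 90.5,
}


def gain(after: str, before: str) -> float:
    """Absolute difference in percentage points, rounded to one decimal place."""
    return round(total[after] - total[before], 1)


print(gain("BIFI Round-2", "Initial Round-0"))    # 28.5: full BIFI vs. synthetic-only fixer
print(gain("BIFI Round-1", "FixerOnly Round-1"))  # 1.2: effect of adding the breaker, round 1
print(gain("BIFI Round-2", "FixerOnly Round-2"))  # 1.9: effect of adding the breaker, round 2
```

The 28.5-point figure matches the "+28.5%" reported in the abstract, and the two smaller numbers isolate what the breaker contributes beyond training on real bad code alone.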

This review was generated by AI and reviewed by a human editor.