QUICK REVIEW

[Paper Review] Revisiting the Plastic Surgery Hypothesis via Large Language Models

Chunqiu Steven Xia, Yifeng Ding|arXiv (Cornell University)|Mar 18, 2023

Software Engineering Research | Citations: 12

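One-Sentence Summary

This paper introduces FitRepair, an LLM-based automated program repair (APR) approach that exploits the plastic surgery hypothesis through fine-tuning and prompting, achieving state-of-the-art repair results on Defects4j 1.2 and 2.0.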

ABSTRACT

Automated Program Repair (APR) aspires to automatically generate patches for an input buggy program. Traditional APR tools typically focus on specific bug types and fixes through the use of templates, heuristics, and formal specifications. However, these techniques are limited in terms of the bug types and patch variety they can produce. As such, researchers have designed various learning-based APR tools with recent work focused on directly using Large Language Models (LLMs) for APR. While LLM-based APR tools are able to achieve state-of-the-art performance on many repair datasets, the LLMs used for direct repair are not fully aware of the project-specific information such as unique variable or method names. The plastic surgery hypothesis is a well-known insight for APR, which states that the code ingredients to fix the bug usually already exist within the same project. Traditional APR tools have largely leveraged the plastic surgery hypothesis by designing manual or heuristic-based approaches to exploit such existing code ingredients. However, as recent APR research starts focusing on LLM-based approaches, the plastic surgery hypothesis has been largely ignored. In this paper, we ask the following question: How useful is the plastic surgery hypothesis in the era of LLMs? Interestingly, LLM-based APR presents a unique opportunity to fully automate the plastic surgery hypothesis via fine-tuning and prompting. To this end, we propose FitRepair, which combines the direct usage of LLMs with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR. Our experiments on the widely studied Defects4j 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs (substantially outperforming the best-performing baseline by 15 and 8), respectively, demonstrating a promising future of the plastic surgery hypothesis in the era of LLMs.

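Motivation and Objectives

Revisit the plastic surgery hypothesis for APR in the era of Large Language Models.
Develop a fully automated framework that uses project-specific information to guide an LLM toward repairs.
Propose two domain-specific fine-tuning strategies and one prompting strategy to improve patch generation.
Demonstrate effectiveness on Defects4j 1.2 and 2.0, and analyze the contribution of each component through ablation studies.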

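Proposed Method

Implement FitRepair on top of CodeT5, an encoder-decoder LLM pre-trained with masked span prediction (MSP).
Introduce Knowledge-Intensified fine-tuning, which uses aggressive 50% token masking so the model learns project-specific tokens.
Introduce Repair-Oriented fine-tuning, which masks a single contiguous code sequence in each sample to match the repair task.
Propose Relevant-Identifier prompting, which uses information retrieval and static analysis to supply bug-relevant identifiers to the model.
Combine the patches from four model variants (base CodeT5, the two fine-tuned models, and the prompting version), rank them by likelihood, and validate them against the test suite to select plausible and correct patches.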
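The two masking strategies and the final rank-and-validate step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `<mask>` placeholder, the choice to mask individual whitespace-separated tokens, and the `passes_tests` callback are all hypothetical, and the real system operates on CodeT5's own tokenizer and sentinel tokens.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical placeholder; CodeT5 uses its own sentinel tokens instead.
MASK = "<mask>"


def knowledge_intensified_mask(
    tokens: List[str], rate: float = 0.5, rng: Optional[random.Random] = None
) -> List[str]:
    """Mask about `rate` of the tokens at randomly chosen positions.

    Assumption: positions are sampled independently per token; the paper
    only states that the masking rate is an aggressive 50%.
    """
    rng = rng or random.Random(0)
    k = round(len(tokens) * rate)
    masked_positions = set(rng.sample(range(len(tokens)), k))
    return [MASK if i in masked_positions else t for i, t in enumerate(tokens)]


def repair_oriented_mask(
    tokens: List[str], span_len: int, rng: Optional[random.Random] = None
) -> Tuple[List[str], List[str]]:
    """Mask one contiguous span and return (masked input, target span)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    rng = rng or random.Random(0)
    span_len = max(1, min(span_len, len(tokens)))
    start = rng.randrange(len(tokens) - span_len + 1)
    masked = tokens[:start] + [MASK] + tokens[start + span_len:]
    return masked, tokens[start:start + span_len]


def rank_and_validate(
    candidates: List[Tuple[str, float]], passes_tests: Callable[[str], bool]
) -> List[str]:
    """Merge patches from several models, rank by score, keep those passing tests.

    `candidates` holds (patch, log-likelihood) pairs pooled from all model
    variants; duplicates keep their best score.
    """
    best = {}
    for patch, score in candidates:
        best[patch] = max(score, best.get(patch, float("-inf")))
    ranked = sorted(best, key=best.get, reverse=True)
    return [p for p in ranked if passes_tests(p)]
```

In this sketch, a patch that passes the tests is only plausible; deciding whether it is also correct would still require comparing it with the reference fix.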

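Experimental Results

Research Questions

RQ1: How does FitRepair compare with state-of-the-art APR tools on Defects4j 1.2 and 2.0?
RQ2: How do different FitRepair configurations (fine-tuning strategies and prompting) affect repair performance?
RQ3: How well does FitRepair generalize to fixing additional bugs from different projects?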

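Key Findings

FitRepair fixes 89 bugs on Defects4j 1.2 and 44 bugs on Defects4j 2.0, which is 15 and 8 more, respectively, than the best-performing baseline.
An extensive ablation study justifies the design choices and shows the benefit of combining the fine-tuning strategies with the prompting strategy.
The results indicate that incorporating the plastic surgery hypothesis into LLMs substantially improves APR, and that the approach is fully automated and generalizable.
Even partial or imprecise project-specific information supplied through prompts can effectively guide LLMs toward generating correct patches.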
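As a quick sanity check on the headline numbers, the baseline counts implied by the reported margins can be worked out directly. The relative gains below are derived here for illustration and are not figures reported in the abstract; the dictionary names are hypothetical.

```python
# Bugs fixed by FitRepair and its margin over the best baseline (from the abstract).
fixed = {"Defects4j 1.2": 89, "Defects4j 2.0": 44}
margin = {"Defects4j 1.2": 15, "Defects4j 2.0": 8}

# Implied best-baseline counts: fixed minus margin.
baseline = {k: fixed[k] - margin[k] for k in fixed}

# Derived relative improvement over the implied baseline (not reported in the source).
relative_gain = {k: margin[k] / baseline[k] for k in fixed}

for name in fixed:
    print(f"{name}: baseline {baseline[name]}, gain {relative_gain[name]:.1%}")
```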

This review was generated by AI and checked by a human editor.