QUICK REVIEW

[Paper Review] Revisiting the Plastic Surgery Hypothesis via Large Language Models

Chunqiu Steven Xia, Yifeng Ding|arXiv (Cornell University)|2023. 03. 18.

Software Engineering Research | Citations: 12

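One-line summary

This paper introduces FitRepair, an LLM-based automated program repair (APR) approach that exploits the plastic surgery hypothesis through fine-tuning and prompting, and reports state-of-the-art repair results on Defects4j 1.2 and 2.0.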

ABSTRACT

Automated Program Repair (APR) aspires to automatically generate patches for an input buggy program. Traditional APR tools typically focus on specific bug types and fixes through the use of templates, heuristics, and formal specifications. However, these techniques are limited in terms of the bug types and patch variety they can produce. As such, researchers have designed various learning-based APR tools with recent work focused on directly using Large Language Models (LLMs) for APR. While LLM-based APR tools are able to achieve state-of-the-art performance on many repair datasets, the LLMs used for direct repair are not fully aware of the project-specific information such as unique variable or method names. The plastic surgery hypothesis is a well-known insight for APR, which states that the code ingredients to fix the bug usually already exist within the same project. Traditional APR tools have largely leveraged the plastic surgery hypothesis by designing manual or heuristic-based approaches to exploit such existing code ingredients. However, as recent APR research starts focusing on LLM-based approaches, the plastic surgery hypothesis has been largely ignored. In this paper, we ask the following question: How useful is the plastic surgery hypothesis in the era of LLMs? Interestingly, LLM-based APR presents a unique opportunity to fully automate the plastic surgery hypothesis via fine-tuning and prompting. To this end, we propose FitRepair, which combines the direct usage of LLMs with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR. Our experiments on the widely studied Defects4j 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs (substantially outperforming the best-performing baseline by 15 and 8), respectively, demonstrating a promising future of the plastic surgery hypothesis in the era of LLMs.

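Research motivation and goals

Revisit the plastic surgery hypothesis for APR in the era of large language models.
Develop a fully automated framework that uses project-specific information to guide the LLM during repair.
Propose two domain-specific fine-tuning strategies and one prompting technique to improve patch generation.
Demonstrate effectiveness on Defects4j 1.2 and 2.0 and analyze the contribution of each component through an ablation study.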

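Proposed method

Implement FitRepair on top of CodeT5, an encoder-decoder LLM trained with masked span prediction (MSP).
Introduce Knowledge-Intensified fine-tuning, which aggressively masks 50% of the tokens so that the model learns project-specific tokens.
Introduce Repair-Oriented fine-tuning, which masks a single contiguous code sequence per sample to match the repair task.
Propose Relevant-Identifier prompting, which uses information retrieval and static analysis to supply the model with identifiers relevant to the bug.
Combine the patches from four model variants (base CodeT5, the two fine-tuned models, and the prompting version), rank them by likelihood, and validate them against the tests to identify plausible and correct patches.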
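
The two masking schemes and the final ranking step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the T5-style `<extra_id_N>` sentinel format, the merging of adjacent masked tokens into one sentinel, and the de-duplication rule in `rank_patches` are all assumptions made for illustration. Only the 50% masking rate, the single contiguous span, and ranking by likelihood come from the review text.

```python
import random

# Assumed T5-style sentinel format; the exact tokens FitRepair uses are not given in the review.
MASK = "<extra_id_{}>"


def knowledge_intensified_mask(tokens, ratio=0.5, rng=None):
    """Mask about `ratio` of the tokens at random positions.

    Adjacent masked tokens share one sentinel (an assumption of this sketch).
    Returns (source, target) token lists for a masked span prediction sample.
    """
    if rng is None:
        rng = random.Random(0)
    n_mask = int(len(tokens) * ratio)
    masked = set(rng.sample(range(len(tokens)), n_mask))
    source, target = [], []
    sentinel = -1
    prev_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not prev_masked:
                # Start a new masked span with a fresh sentinel.
                sentinel += 1
                source.append(MASK.format(sentinel))
                target.append(MASK.format(sentinel))
            target.append(tok)
            prev_masked = True
        else:
            source.append(tok)
            prev_masked = False
    return source, target


def repair_oriented_mask(tokens, start, length):
    """Mask exactly one contiguous span, mimicking the replacement of a buggy hunk."""
    end = start + length
    source = tokens[:start] + [MASK.format(0)] + tokens[end:]
    target = [MASK.format(0)] + tokens[start:end]
    return source, target


def rank_patches(candidates):
    """Merge (patch, log_likelihood) pairs from several models.

    Duplicates keep their best score; the result is sorted most likely first.
    """
    best = {}
    for patch, score in candidates:
        if patch not in best or score > best[patch]:
            best[patch] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toks = "int x = foo ( a , b ) ;".split()
    print(knowledge_intensified_mask(toks))
    print(repair_oriented_mask(toks, start=3, length=6))
    print(rank_patches([("a", -1.0), ("b", -0.5), ("a", -0.2)]))
```

In this sketch the ranked candidates would then be compiled and run against the project's tests, so that only patches passing all tests are kept as plausible.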

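Experimental results

Research questions

RQ1: How does FitRepair compare with state-of-the-art APR tools on Defects4j 1.2 and 2.0?
RQ2: How do different FitRepair configurations (fine-tuning strategies and prompting) affect repair performance?
RQ3: How well does FitRepair generalize to fixing additional bugs from different projects?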

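Key results

FitRepair fixes 89 bugs on Defects4j 1.2 and 44 bugs on Defects4j 2.0, which is 15 and 8 more fixes, respectively, than the best-performing baseline.
An extensive ablation study supports the design choices and shows the benefit of combining the fine-tuning and prompting strategies.
The approach shows that applying the plastic surgery hypothesis with LLMs can substantially improve APR while remaining fully automated and generalizable.
Even partial or incomplete project-specific information supplied through prompts can effectively guide the LLM toward correct patches.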
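
To make the idea of prompting with partial project information concrete, here is a rough Python sketch of identifier-based prompting. The review only says that information retrieval and static analysis are used, so everything specific here is an assumption: the sub-token overlap score stands in for the retrieval step, no static analysis is performed, and the hint comment format, the `<extra_id_0>` mask token, and all function names are hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def subtokens(name):
    """Split camelCase / snake_case into lowercase parts, e.g. getMaxValue -> {get, max, value}."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return {p.lower() for p in parts}


def rank_identifiers(buggy_line, project_identifiers, top_k=3):
    """Rank project identifiers by sub-token overlap with the buggy line.

    This is a simple stand-in for an information retrieval step. Identifiers
    that already appear in the buggy line, or share nothing with it, are dropped.
    """
    line_idents = set(IDENT.findall(buggy_line))
    line_parts = set()
    for ident in line_idents:
        line_parts |= subtokens(ident)
    scored = []
    for ident in project_identifiers:
        if ident in line_idents:
            continue
        score = len(subtokens(ident) & line_parts)
        if score > 0:
            scored.append((-score, ident))
    # Highest overlap first; ties broken alphabetically for determinism.
    return [ident for _, ident in sorted(scored)[:top_k]]


def build_prompts(prefix, suffix, identifiers, mask="<extra_id_0>"):
    """Create one masked-infilling prompt per identifier (hypothetical hint format)."""
    return [f"{prefix}// hint: use {ident}\n{mask}\n{suffix}" for ident in identifiers]


if __name__ == "__main__":
    hints = rank_identifiers(
        "return getMinValue(range);",
        ["getMaxValue", "computeRangeLength", "name", "getMinValue"],
        top_k=2,
    )
    for prompt in build_prompts("double upper(Range range) {\n", "}\n", hints):
        print(prompt)
```

Each prompt carries a single hint, so even an incomplete list of identifiers yields usable candidates; a hint that turns out to be irrelevant simply produces patches that fail validation.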


This review was generated by AI and reviewed by a human editor.