QUICK REVIEW

[Paper Review] De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding

Aryaz Eghbali, Michael Pradel|arXiv (Cornell University)|2024. 01. 03.

Software Engineering Research · Citations: 8

One-Line Summary

De-Hallucinator reduces LLM hallucinations in code completion by grounding predictions in project-specific API references and iteratively enriching the prompt context.

ABSTRACT

Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks. However, these models are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that predictions by LLMs often resemble the desired code, but they fail to correctly refer to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the model's initial predictions and adds these references into the prompt. Unlike retrieval-augmented generation (RAG), our approach uses the initial prediction(s) by the model to iteratively retrieve increasingly suitable API references. Our evaluation applies the approach to two tasks: predicting API usages in Python and generating tests in JavaScript. We show that De-Hallucinator consistently improves the generated code across five LLMs. In particular, the approach improves the edit distance by 23.3-50.6% and the recall of correctly predicted API usages by 23.9-61.0% for code completion, and improves the number of fixed tests that initially failed because of hallucinations by 63.2%, resulting in a 15.5% increase in statement coverage for test generation.

Motivation and Goals

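Motivates the problem of API hallucination in project-specific code completion.
Proposes grounding LLM predictions in API references from the target project.
Develops an iterative prompting strategy that uses the model's own outputs to retrieve additional context.
Shows that grounding improves API usage prediction across multiple LLMs without retraining the models.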

Proposed Method

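Defines a retrieval-augmented prompting pipeline in which the quality of the context increases step by step.
Indexes the project's API references using CodeQL and embedding-based nearest-neighbor search.
Builds an augmented prompt by prepending the retrieved API references to the original prompt.
Queries the LLM repeatedly with the updated prompt until a fixed point or a maximum number of iterations is reached.
Post-processes the completions to ensure syntactic correctness and to focus on the API usages.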
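The iterative grounding loop described above can be sketched in Python as follows. This is a minimal, hypothetical sketch under stated assumptions, not the authors' implementation: `model` is any callable that maps a prompt string to a completion, API references are plain signature strings, and the paper's embedding-based nearest-neighbor search is replaced here by a simple token-overlap cosine similarity as a stand-in. The query heuristic (every called name in the completion), the helper names `tokenize`, `retrieve`, `extract_queries`, `augment`, and `de_hallucinate`, and the parameters `k` (maximum iterations, mirroring the $k$ in Figure 9) and `n` (references per query) are all illustrative assumptions. CodeQL indexing and post-processing are omitted.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Split identifiers such as get_user_name or getUserName into lowercase subtokens."""
    tokens = []
    for part in re.findall(r"[A-Za-z]+|\d+", text):
        tokens.extend(s.lower() for s in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return tokens


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-token vectors (stand-in for embeddings)."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, api_refs: List[str], n: int = 2) -> List[str]:
    """Return up to n references most similar to the query, skipping zero-similarity ones."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r))), r) for r in api_refs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep index order
    return [r for score, r in scored[:n] if score > 0.0]


def extract_queries(prediction: str) -> List[str]:
    """Hypothetical heuristic: every called name in the completion, e.g. 'db.fetch('."""
    return [m[:-1] for m in re.findall(r"[\w.]+\(", prediction)]


def augment(prompt: str, refs: List[str]) -> str:
    """Prepend the retrieved API references to the prompt as comments."""
    return "".join(f"# API reference: {r}\n" for r in refs) + prompt


def de_hallucinate(model: Callable[[str], str], prompt: str,
                   api_refs: List[str], k: int = 3, n: int = 2) -> str:
    prediction = model(prompt)  # initial, possibly hallucinated, prediction
    refs: List[str] = []
    for _ in range(k):
        new_refs: List[str] = []
        for query in extract_queries(prediction):
            for ref in retrieve(query, api_refs, n):
                if ref not in new_refs:
                    new_refs.append(ref)
        if new_refs == refs:  # fixed point: the context did not change
            break
        refs = new_refs
        prediction = model(augment(prompt, refs))  # re-query with grounded context
    return prediction


# Hypothetical project APIs and a stand-in "LLM" for illustration only.
API_REFS = [
    "def fetch_user_by_id(user_id: int) -> User",
    "def save_order(order: Order) -> None",
    "def send_email(to: str, body: str) -> bool",
]


def fake_model(prompt: str) -> str:
    # Hallucinates get_user() until the real API appears in the prompt.
    if "fetch_user_by_id" in prompt:
        return "user = fetch_user_by_id(42)"
    return "user = get_user(42)"


if __name__ == "__main__":
    print(de_hallucinate(fake_model, "def load(uid):\n    ", API_REFS))
```

In this toy run, the stand-in model first invents `get_user`; its similarity to `fetch_user_by_id` pulls that reference into the prompt, the re-queried model then uses the real API, and the next round retrieves the same reference again, so the loop stops at a fixed point before reaching `k`.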

Experimental Results

Research Questions

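RQ1: How much does De-Hallucinator improve code completion compared with the baseline prompt?
RQ2: How effective is De-Hallucinator at adding the correct API references to the prompt?
RQ3: How do the hyperparameters affect the completions?
RQ4: How efficient is De-Hallucinator, and how much does each step contribute to the runtime?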

Key Findings

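For code completion, De-Hallucinator shows consistent improvements across four state-of-the-art code LLMs: CodeGen, CodeGen 2.5, UniXcoder, and StarCoder+.
Edit distance: 23.28%–50.64% improvement over the baseline.
Normalized edit similarity: 12.12%–27.48% improvement over the baseline.
Recall of correctly predicted API usages: 23.90%–60.98% improvement over the baseline.
Grounding in project-specific APIs anchors predictions in the target codebase, reducing hallucinated or non-existent API usages.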
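The three metrics above can be illustrated with their common textbook definitions. This is a hedged sketch: whether the paper measures edit distance on characters or tokens, how exactly it normalizes edit similarity, and how it extracts API usages for recall are assumptions here. The sketch uses character-level Levenshtein distance, similarity as 1 - d / max length, and recall over a set of called API names. "Improvement" is read as relative change versus the baseline, with the sign flipped for edit distance because lower is better. All function names are illustrative.

```python
from typing import Set


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def api_recall(predicted: Set[str], expected: Set[str]) -> float:
    """Fraction of expected API usages that the prediction contains."""
    return len(predicted & expected) / len(expected) if expected else 1.0


def relative_improvement(baseline: float, improved: float,
                         lower_is_better: bool = False) -> float:
    """Relative change versus the baseline; positive means better."""
    if lower_is_better:
        return (baseline - improved) / baseline
    return (improved - baseline) / baseline


if __name__ == "__main__":
    target = "user = fetch_user_by_id(42)"
    d_base = edit_distance("user = get_user(42)", target)    # hallucinated API
    d_grounded = edit_distance("user = fetch_user_by_id(42)", target)
    print(relative_improvement(d_base, d_grounded, lower_is_better=True))
```

Reporting relative rather than absolute changes makes the ranges comparable across models whose baseline quality differs.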

Figure 9. Relative improvements over the baseline for the maximum number of iterations, $k$.

This review was generated by AI and reviewed by a human editor.