QUICK REVIEW

[Paper Review] De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding

Aryaz Eghbali, Michael Pradel | arXiv (Cornell University) | Jan 3, 2024

Software Engineering Research | Cited by 8

One-Sentence Summary

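De-Hallucinator mitigates large language model hallucinations in code completion by grounding predictions in project-specific API references and iteratively enriching the prompt context.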

ABSTRACT

Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks. However, these models are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that predictions by LLMs often resemble the desired code, but they fail to correctly refer to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the model's initial predictions and adds these references into the prompt. Unlike retrieval-augmented generation (RAG), our approach uses the initial prediction(s) by the model to iteratively retrieve increasingly suitable API references. Our evaluation applies the approach to two tasks: predicting API usages in Python and generating tests in JavaScript. We show that De-Hallucinator consistently improves the generated code across five LLMs. In particular, the approach improves the edit distance by 23.3-50.6% and the recall of correctly predicted API usages by 23.9-61.0% for code completion, and improves the number of fixed tests that initially failed because of hallucinations by 63.2%, resulting in a 15.5% increase in statement coverage for test generation.

Motivation and Goals

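Characterize the problem of API hallucinations in project-specific code completion.
Propose a grounding approach that anchors LLM predictions to API references from the target project.
Develop an iterative prompting strategy that uses the model's own output to retrieve additional context.
Show that grounding improves API usage prediction across multiple LLMs without retraining the models.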

Proposed Method

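Define a retrieval-augmented prompting pipeline whose context quality improves with each iteration.
Index the project's API references with CodeQL and retrieve them via embedding-based nearest-neighbor search.
Construct augmented prompts by prepending the retrieved API references to the prompt.
Query the LLM iteratively with the updated prompt until a fixed point or the maximum number of iterations is reached.
Post-process completions so that they are syntactically correct and focused on API usages.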
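
The pipeline above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-tokens `embed` function is a hypothetical stand-in for a real embedding model, CodeQL extraction is replaced by a hard-coded list of signature strings (`API_REFS`, invented for the example), the `# API reference:` prompt format is an assumption, and `fake_llm` is a hypothetical stub in place of a code LLM. The key idea it shows is that each retrieval query is the model's previous prediction, and iteration stops at a fixed point or after `max_iters` rounds.

```python
import ast
import math
import re
import textwrap
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag of lower-cased
    # alphabetic tokens (snake_case names are split on "_").
    return Counter(tok.lower() for tok in re.findall(r"[A-Za-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, api_refs: List[str], top_n: int = 2) -> List[str]:
    # Nearest-neighbor search: rank API references by similarity to the query.
    q = embed(query)
    return sorted(api_refs, key=lambda ref: cosine(q, embed(ref)), reverse=True)[:top_n]


def build_prompt(code_context: str, refs: List[str]) -> str:
    # Prepend the retrieved references (as comments) to the original code context.
    header = "".join(f"# API reference: {ref}\n" for ref in refs)
    return header + code_context


def first_statement(completion: str) -> str:
    # Post-processing: keep the shortest non-empty prefix of lines that parses
    # as Python, i.e. the first complete statement; "" if no prefix parses.
    lines = completion.splitlines()
    for i in range(1, len(lines) + 1):
        candidate = textwrap.dedent("\n".join(lines[:i]))
        if not candidate.strip():
            continue
        try:
            ast.parse(candidate)
            return candidate
        except SyntaxError:
            continue
    return ""


def de_hallucinate(code_context: str, llm: Callable[[str], str], api_refs: List[str],
                   max_iters: int = 3, top_n: int = 2) -> str:
    # Initial prediction from the unmodified prompt.
    completion = first_statement(llm(code_context))
    for _ in range(max_iters):
        # Unlike plain RAG, the retrieval query is the model's own prediction.
        refs = retrieve(completion, api_refs, top_n)
        new_completion = first_statement(llm(build_prompt(code_context, refs)))
        if new_completion == completion:  # fixed point: the completion did not change
            break
        completion = new_completion
    return completion


# Hypothetical project API references, for illustration only.
API_REFS = [
    "UserStore.load_user_profile(user_id: int) -> Profile",
    "UserStore.delete_user(user_id: int) -> None",
    "OrderStore.list_orders(customer_id: int) -> list",
]


def fake_llm(prompt: str) -> str:
    # Hypothetical stub model: emits the real API only when its reference is in
    # the prompt; otherwise it "hallucinates" a non-existent get_user_profile.
    if "load_user_profile(user_id" in prompt:
        return "profile = store.load_user_profile(user_id)\nprint(profile)"
    return "profile = store.get_user_profile(user_id)\nprint(profile)"


if __name__ == "__main__":
    print(de_hallucinate("store = UserStore()\n", fake_llm, API_REFS))
```

With this stub, the first prediction calls the non-existent `get_user_profile`; using that prediction as the query surfaces `load_user_profile`, the next round emits the correct call, and the round after that reaches a fixed point.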

Experimental Results

Research Questions

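RQ1: How much does De-Hallucinator improve code completion compared with the default prompt?
RQ2: How effectively does De-Hallucinator add the correct API references to the prompt?
RQ3: How do the hyperparameters affect the completions?
RQ4: How efficient is De-Hallucinator, and how much does each step contribute to the runtime?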

Key Findings

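For code completion, De-Hallucinator consistently improves results across four state-of-the-art code LLMs: CodeGen, CodeGen 2.5, UniXcoder, and StarCoder+.
Edit distance: 23.28%–50.64% relative improvement over the baseline.
Normalized edit similarity: 12.12%–27.48% relative improvement over the baseline.
Recall of correctly predicted API usages: 23.90%–60.98% relative improvement over the baseline.
Grounding predictions in the target codebase's project-specific APIs reduces hallucinated (non-existent) API usages.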
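
To make the reported metrics concrete, here is a minimal sketch of common definitions: character-level Levenshtein edit distance, edit similarity normalized by the length of the longer string, recall of expected API calls, and relative improvement over a baseline. These definitions are assumptions; the paper's exact tokenization, normalization, and API-usage extraction may differ, and `api_calls` in particular is a hypothetical regex-based extractor rather than the authors' tooling.

```python
import re
from typing import Set


def edit_distance(a: str, b: str) -> int:
    # Character-level Levenshtein distance using a rolling DP row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    # Normalized to [0, 1]; 1.0 means the strings are identical.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def api_calls(code: str) -> Set[str]:
    # Hypothetical extractor: any identifier immediately followed by "(".
    return set(re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(", code))


def api_recall(predicted: str, expected: str) -> float:
    # Fraction of the expected API calls that also appear in the prediction.
    wanted = api_calls(expected)
    return 1.0 if not wanted else len(wanted & api_calls(predicted)) / len(wanted)


def relative_improvement(baseline: float, new: float, lower_is_better: bool = False) -> float:
    # For edit distance lower is better, so improvement is (baseline - new) / baseline.
    # Raises ZeroDivisionError if baseline is 0.
    delta = baseline - new if lower_is_better else new - baseline
    return delta / baseline


if __name__ == "__main__":
    truth = "profile = store.load_user_profile(user_id)"
    guess = "profile = store.get_user_profile(user_id)"
    print(edit_distance(guess, truth), api_recall(guess, truth))
```

Note that a hallucinated call can score a small edit distance (here the names differ by a few characters) while contributing nothing to API recall, which is why the two metrics are reported separately.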

Figure 9. Relative improvements over the baseline for the maximum number of iterations, $k$.


This review was generated by AI and checked by human editors.