QUICK REVIEW

[Paper Review] De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding

Aryaz Eghbali, Michael Pradel|arXiv (Cornell University)|Jan 3, 2024

Software Engineering Research | Cited by 8

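One-line summary

De-Hallucinator grounds an LLM's predictions in project-specific API references and iteratively enriches the prompt context, which reduces hallucinations in code completion.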

ABSTRACT

Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks. However, these models are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that predictions by LLMs often resemble the desired code, but they fail to correctly refer to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the model's initial predictions and adds these references into the prompt. Unlike retrieval-augmented generation (RAG), our approach uses the initial prediction(s) by the model to iteratively retrieve increasingly suitable API references. Our evaluation applies the approach to two tasks: predicting API usages in Python and generating tests in JavaScript. We show that De-Hallucinator consistently improves the generated code across five LLMs. In particular, the approach improves the edit distance by 23.3-50.6% and the recall of correctly predicted API usages by 23.9-61.0% for code completion, and improves the number of fixed tests that initially failed because of hallucinations by 63.2%, resulting in a 15.5% increase in statement coverage for test generation.

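Motivation and objectives

Motivate the problem of API hallucination in project-specific code completion.
Propose a grounding-based technique that anchors LLM predictions in the target project's API references.
Develop an iterative prompting strategy that uses the model's own output to retrieve additional context.
Show that grounding improves API usage prediction across multiple LLMs without retraining the models.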

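Proposed method

Define a retrieval-augmented prompting pipeline that raises the quality of the context step by step.
Extract the project's API references with CodeQL and index them for embedding-based nearest-neighbor search.
Build an augmented prompt by prepending the retrieved API references to the original prompt.
Query the LLM repeatedly with the updated prompt until the completion reaches a fixed point or the maximum number of iterations is reached.
Post-process the completions to ensure syntactic correctness and to focus on API usages.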
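The retrieve-and-requery loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-overlap cosine similarity stands in for the learned embeddings, the `# API Reference:` prompt header is an assumed format, and `toy_model`, `API_REFS`, and all function names are hypothetical. CodeQL extraction and post-processing are omitted; the API references are given as plain signature strings.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> Counter:
    """Split text into lowercase word pieces (snake_case and camelCase aware)."""
    return Counter(p.lower() for p in re.findall(r"[A-Za-z][a-z0-9]*", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-tokens vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, api_refs: List[str], n: int) -> List[str]:
    """Return the n API references most similar to the query.

    Assumption: token overlap replaces the embedding-based nearest-neighbor search.
    """
    q = tokenize(query)
    ranked = sorted(api_refs, key=lambda ref: cosine(q, tokenize(ref)), reverse=True)
    return ranked[:n]


def build_prompt(refs: List[str], incomplete_code: str) -> str:
    """Prepend the retrieved references as comments (assumed header format)."""
    return "".join(f"# API Reference: {r}\n" for r in refs) + incomplete_code


def iterative_grounding(
    complete: Callable[[str], str],
    incomplete_code: str,
    api_refs: List[str],
    max_iters: int = 3,
    n_refs: int = 2,
) -> str:
    """Re-query the model with references retrieved from its previous completion."""
    completion = complete(incomplete_code)  # initial, ungrounded prediction
    for _ in range(max_iters):
        refs = retrieve(completion, api_refs, n_refs)
        new_completion = complete(build_prompt(refs, incomplete_code))
        if new_completion == completion:  # fixed point: nothing changed
            break
        completion = new_completion
    return completion


# Hypothetical project API signatures and a toy stand-in for an LLM.
API_REFS = [
    "load_user_config(path: str) -> dict",
    "save_report(report: dict, out_dir: str) -> None",
    "parse_args(argv: list) -> Namespace",
]


def toy_model(prompt: str) -> str:
    """Uses the real API only when its reference appears in the prompt."""
    if "load_user_config" in prompt:
        return "cfg = load_user_config(path)"
    return "cfg = read_config_file(path)"  # hallucinated, non-existent API


if __name__ == "__main__":
    print(iterative_grounding(toy_model, "cfg = ", API_REFS))
```

In this toy setting the first completion names a non-existent function, but it is close enough to the real signature that retrieval surfaces `load_user_config`, and the second query uses it. That is the observation the approach relies on: an initial prediction often resembles the desired code even when the API it references does not exist.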

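Experimental results

Research questions

RQ1: How much does De-Hallucinator improve code completion compared with the default prompt?
RQ2: How effective is De-Hallucinator at adding the correct API references to the prompt?
RQ3: How do the hyperparameters affect the completions?
RQ4: How efficient is De-Hallucinator, and how much does each step contribute to the running time?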

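Key findings

De-Hallucinator yields consistent improvements across four state-of-the-art code LLMs: CodeGen, CodeGen 2.5, UniXcoder, and StarCoder+.
Edit distance: improved by 23.28%–50.64% relative to the baseline.
Normalized edit similarity: improved by 12.12%–27.48% relative to the baseline.
Recall of correctly predicted API usages: improved by 23.90%–60.98% relative to the baseline.
Project-specific API grounding reduces hallucinated and non-existent API usages by anchoring predictions in the target codebase.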
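For readers unfamiliar with the three metrics, the sketch below shows one plausible way to compute them. The definitions are assumptions rather than the paper's exact formulas: edit distance is character-level Levenshtein distance, normalized edit similarity is one minus the distance divided by the longer string's length, API recall is the share of ground-truth call names that also appear in the prediction (extracted with a simple regex), and improvement is reported relative to the baseline value. All function names are hypothetical.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or match at no cost
                )
            )
        prev = cur
    return prev[-1]


def normalized_edit_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every character must change."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def api_calls(code: str) -> set:
    """Names that are followed by an opening parenthesis (rough heuristic)."""
    return set(re.findall(r"([A-Za-z_]\w*)\s*\(", code))


def api_recall(prediction: str, ground_truth: str) -> float:
    """Fraction of ground-truth API calls that the prediction also contains."""
    expected = api_calls(ground_truth)
    if not expected:
        return 1.0
    return len(expected & api_calls(prediction)) / len(expected)


def relative_improvement(baseline: float, improved: float, lower_is_better: bool) -> float:
    """Percentage change relative to the baseline, signed so that positive is better."""
    delta = baseline - improved if lower_is_better else improved - baseline
    return delta / baseline * 100.0


if __name__ == "__main__":
    truth = "cfg = load_user_config(path)"
    baseline = "cfg = read_config_file(path)"  # hallucinated API
    grounded = "cfg = load_user_config(path)"
    print(edit_distance(baseline, truth), edit_distance(grounded, truth))
    print(api_recall(baseline, truth), api_recall(grounded, truth))
    # Illustrative numbers only, not results from the paper.
    print(relative_improvement(10.0, 7.5, lower_is_better=True))
```

Edit distance is a lower-is-better metric, while similarity and recall are higher-is-better, which is why the helper takes a `lower_is_better` flag before computing a percentage.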

Figure 9. Relative improvements over the baseline for the maximum number of iterations, $k$.

This review was generated by AI and checked by a human editor.