QUICK REVIEW

[Paper Review] Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions

M. Helena Vasconcelos, Gagan Bansal|arXiv (Cornell University)|Feb 14, 2023

Software Engineering Research | Cited by 17

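One-Sentence Summary

This paper compares two approaches to uncertainty highlighting in AI-assisted code completion. Highlighting the tokens most likely to be edited, rather than those with low generation probability, leads to faster task completion and more targeted edits, whereas highlighting by generation probability shows no clear benefit.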

ABSTRACT

Large-scale generative models enabled the development of AI-powered code completion tools to assist programmers in writing code. However, much like other AI-powered tools, AI-powered code completions are not always accurate, potentially introducing bugs or even security vulnerabilities into code if not properly detected and corrected by a human programmer. One technique that has been proposed and implemented to help programmers identify potential errors is to highlight uncertain tokens. However, there have been no empirical studies exploring the effectiveness of this technique -- nor investigating the different and not-yet-agreed-upon notions of uncertainty in the context of generative models. We explore the question of whether conveying information about uncertainty enables programmers to more quickly and accurately produce code when collaborating with an AI-powered code completion tool, and if so, what measure of uncertainty best fits programmers' needs. Through a mixed-methods study with 30 programmers, we compare three conditions: providing the AI system's code completion alone, highlighting tokens with the lowest likelihood of being generated by the underlying generative model, and highlighting tokens with the highest predicted likelihood of being edited by a programmer. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits, and is subjectively preferred by study participants. In contrast, highlighting tokens according to their probability of being generated does not provide any benefit over the baseline with no highlighting. We further explore the design space of how to convey uncertainty in AI-powered code completion tools, and find that programmers prefer highlights that are granular, informative, interpretable, and not overwhelming.

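Motivation and Objectives

Understand how uncertainty highlighting affects programmers' performance when they use AI-powered code completion.
Compare two notions of uncertainty, generation probability and likelihood of being edited (edit probability), across multiple coding tasks.
Identify programmers' design preferences for how uncertainty should be conveyed.
Investigate the subjective usefulness and cognitive load associated with different uncertainty-highlighting schemes.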

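Proposed Method

Conduct a within-subjects, mixed-methods study in which 30 programmers each complete three coding tasks.
Implement two uncertainty-highlighting conditions: highlights based on generation probability (threshold of approximately 69.4%) and highlights based on an edit model (tokens edited by at least 4 of 6 participants).
Train a closed-world edit model that predicts how likely each token is to be edited, based on participants' edits to Codex-generated completions.
Compare both conditions against a no-highlighting baseline on multiple performance and subjective measures.
Pre-register and analyze nine hypotheses covering time, correctness, token survival, cognitive load, and perceived usefulness.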
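
The two highlighting rules described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `Token` class, the function names, and the `[[...]]` rendering are hypothetical, and the sketch assumes that tokens whose generation probability falls below the threshold are the ones highlighted. The two constants are taken from the method summary above.

```python
from dataclasses import dataclass

# Values taken from the method summary; the constant names are hypothetical.
GENERATION_PROB_THRESHOLD = 0.694  # "approximately 69.4%"
MIN_EDITORS = 4                    # "at least 4 of 6 participants"


@dataclass
class Token:
    text: str
    generation_prob: float  # probability the model assigned to this token (assumed available)
    edit_count: int         # number of participants who edited this token (assumed available)


def highlight_by_generation_prob(tokens, threshold=GENERATION_PROB_THRESHOLD):
    """Flag tokens whose generation probability is below the threshold (assumed direction)."""
    return [t.generation_prob < threshold for t in tokens]


def highlight_by_edit_count(tokens, min_editors=MIN_EDITORS):
    """Flag tokens that at least `min_editors` participants edited."""
    return [t.edit_count >= min_editors for t in tokens]


def render(tokens, flags):
    """Wrap flagged tokens in [[...]] as a plain-text stand-in for a visual highlight."""
    return " ".join(f"[[{t.text}]]" if f else t.text for t, f in zip(tokens, flags))


# Made-up example data, for illustration only.
tokens = [
    Token("return", 0.95, 0),
    Token("x", 0.40, 5),
    Token("+", 0.60, 1),
    Token("1", 0.80, 4),
]
print(render(tokens, highlight_by_generation_prob(tokens)))
print(render(tokens, highlight_by_edit_count(tokens)))
```

In this made-up example the two rules disagree on `+` and `1`, which illustrates why the two notions of uncertainty can direct a programmer's attention to different parts of the same completion.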

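Results

Research Questions

RQ1: Does highlighting uncertainty improve task completion time and correctness in AI-assisted coding?
RQ2: Which notion of uncertainty (generation probability vs. edit probability) is more useful to programmers?
RQ3: What are programmers' design preferences for uncertainty highlighting in code completions?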

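Key Findings

Highlighting the tokens predicted most likely to be edited leads to faster task completion and more targeted edits.
Highlighting tokens by generation probability provides no performance benefit over the no-highlighting baseline.
Edit-model highlights make participants more likely to edit the highlighted tokens and are also subjectively preferred.
Programmers prefer uncertainty highlights that are granular, informative, interpretable, and not overwhelming, and they prefer shading over exact probabilities.
There is some evidence that edit-model highlighting improves correctness across tasks, but, given the sample size, the effect is not always statistically significant.
A simple closed-world edit model trained on program-edit data can serve as an effective probe of uncertainty in code generation.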


This review was generated by AI and checked by a human editor.