QUICK REVIEW

[Paper Review] Dynamic Neural Program Embedding for Program Repair

Ke Wang, Rishabh Singh | arXiv (Cornell University) | Nov 20, 2017

Software Engineering Research | References: 12 | Citations: 54

One-Line Summary

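The paper proposes semantic, dynamic program embeddings learned from runtime execution traces. These embeddings improve program analysis tasks, including error-pattern classification and program repair efficiency, and they outperform syntax-based embeddings.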

ABSTRACT

Neural program embeddings have shown much promise recently for a variety of program analysis tasks, including program synthesis, program repair, fault localization, etc. However, most existing program embeddings are based on syntactic features of programs, such as raw token sequences or abstract syntax trees. Unlike images and text, a program has an unambiguous semantic meaning that can be difficult to capture by only considering its syntax (i.e. syntactically similar programs can exhibit vastly different run-time behavior), which makes syntax-based program embeddings fundamentally limited. This paper proposes a novel semantic program embedding that is learned from program execution traces. Our key insight is that program states expressed as sequential tuples of live variable values not only captures program semantics more precisely, but also offer a more natural fit for Recurrent Neural Networks to model. We evaluate different syntactic and semantic program embeddings on predicting the types of errors that students make in their submissions to an introductory programming class and two exercises on the CodeHunt education platform. Evaluation results show that our new semantic program embedding significantly outperforms the syntactic program embeddings based on token sequences and abstract syntax trees. In addition, we augment a search-based program repair system with the predictions obtained from our semantic embedding, and show that search efficiency is also significantly improved.

Motivation and Objectives

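Demonstrates a fundamental limitation of syntax-based program representations: they struggle to capture program semantics.
Introduces dynamic program embeddings that are learned from program execution traces (variable traces, state traces, and dependency enforcement) in order to capture semantics.
Evaluates the embeddings on predicting common error patterns in students' programming-assignment submissions.
Shows that dynamic embeddings improve the efficiency of search-based program repair when their semantic predictions are used to guide the search.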
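The first two points can be made concrete with a small Python sketch. It uses the standard `sys.settrace` hook to record a state trace, meaning a tuple of the local variables currently bound in the frame, taken before each executed line. It then splits that trace into one value sequence per variable. The two toy functions, `sum_to_n` and `sum_to_n_buggy`, are our own hypothetical example and do not come from the paper. Their bodies contain the same statements, and only the two lines inside the loop are swapped, yet their traces diverge. Collapsing consecutive repeated values is also our assumption about what a variable trace keeps.

```python
import sys
from typing import Callable, Dict, List, Tuple

State = Tuple[Tuple[str, object], ...]


def record_state_trace(func: Callable, *args) -> List[State]:
    """Run func(*args) and snapshot its local variables before each executed line."""
    states: List[State] = []
    target = func.__code__

    def tracer(frame, event, arg):
        # Ignore every frame except the target function's own frame.
        if frame.f_code is not target:
            return None
        if event == "line":
            # A state is the sorted tuple of (name, value) pairs currently bound in the frame.
            states.append(tuple(sorted(frame.f_locals.items())))
        return tracer  # keep tracing this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return states


def variable_traces(states: List[State]) -> Dict[str, List[object]]:
    """Split a state trace into one value sequence per variable, dropping consecutive repeats."""
    traces: Dict[str, List[object]] = {}
    for state in states:
        for name, value in state:
            seq = traces.setdefault(name, [])
            if not seq or seq[-1] != value:
                seq.append(value)
    return traces


def sum_to_n(n):
    total = 0
    i = 0
    while i < n:
        i += 1
        total += i
    return total


def sum_to_n_buggy(n):  # same statements, the two loop-body lines swapped
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


ok = variable_traces(record_state_trace(sum_to_n, 3))
bug = variable_traces(record_state_trace(sum_to_n_buggy, 3))
print(ok["total"], bug["total"])  # [0, 1, 3, 6] [0, 1, 3]
print(ok["i"], bug["i"])          # [0, 1, 2, 3] [0, 1, 2, 3]
```

A token-level view sees two nearly identical programs. The `total` traces, however, expose the different behavior. The raw number of recorded states can vary across Python versions, so the sketch compares collapsed variable traces rather than raw state lists.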

Proposed Method

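Represents program executions as dynamic traces in three variants: variable traces, state traces, and a hybrid that adds dependency enforcement.
Encodes the traces with GRU-based RNNs and produces the embedding via pooling.
Enforces data and control dependencies between variable traces to better capture program semantics.
Trains and evaluates the embeddings on error-pattern prediction, and integrates them into SarfGen to guide repair.
Compares the dynamic embeddings with syntax-based baselines: runtime syntax traces, tokens, and AST encoders.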
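The GRU encoding and pooling steps can be sketched in pure Python. This is a minimal, untrained illustration under stated assumptions and is not the paper's architecture. `GRUCell` and `embed_program` are hypothetical names. One shared GRU encodes each variable trace, and its final hidden state serves as that variable's embedding. Max pooling across variables (our choice) then yields a fixed-size program embedding. Raw numeric values are fed directly as 1-dimensional inputs, whereas a real model would learn value embeddings and train the weights on the prediction task. Dependency enforcement is not modelled here.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class GRUCell:
    """Minimal single-layer GRU cell with random, untrained weights (assumption)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Weights for the update gate (z), reset gate (r) and candidate state (n).
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def _gate(self, g: str, x: Sequence[float], h: Sequence[float]) -> Vector:
        return [a + c + d for a, c, d in zip(_matvec(self.W[g], x), _matvec(self.U[g], h), self.b[g])]

    def step(self, x: Sequence[float], h: Vector) -> Vector:
        z = [_sigmoid(v) for v in self._gate("z", x, h)]
        r = [_sigmoid(v) for v in self._gate("r", x, h)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._gate("n", x, rh)]
        # The update gate interpolates between the previous and the candidate state.
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def encode(self, sequence: Sequence[Sequence[float]]) -> Vector:
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h  # final hidden state summarises the whole trace


def embed_program(traces: List[List[float]], cell: GRUCell) -> Vector:
    """Encode each variable trace with a shared GRU, then max-pool across variables."""
    per_variable = [cell.encode([[float(v)] for v in trace]) for trace in traces]
    return [max(column) for column in zip(*per_variable)]


cell = GRUCell(input_size=1, hidden_size=4)
# Variable traces of `total` and `i` for a small summation loop (toy input).
embedding = embed_program([[0, 1, 3, 6], [0, 1, 2, 3]], cell)
print(embedding)
```

Max pooling makes the embedding independent of how many variables a program has and of their order. That property is convenient when different submissions declare different variables.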

Experimental Results

Research Questions

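RQ1: Do dynamic embeddings learned from program execution traces outperform syntax-based embeddings at predicting error patterns?
RQ2: Do dynamic embeddings improve the efficiency of a search-based program repair system when used to prioritize fixes?
RQ3: Which embedding strategy (variable trace, state trace, or dependency enforcement) best captures program semantics for the repair task?
RQ4: How well do dynamic embeddings generalize across different programming problems and datasets?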

Key Findings

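Accuracy of predicting student error patterns, by embedding:
Programming problem	Variable trace	State trace	Dependency enforcement	Runtime syntax trace	Token	AST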
Print Chessboard	93.9%	95.3%	99.3%	26.3%	16.8%	16.2%
Count Parentheses	92.7%	93.8%	98.8%	25.5%	19.3%	21.7%
Generate Binary Digits	92.1%	94.5%	99.2%	23.8%	21.2%	20.9%
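The gap summarized in the table can be checked with plain arithmetic. The following short Python snippet copies the reported accuracies, with no new data. It computes each embedding's mean across the three problems and compares the weakest semantic result with the strongest syntactic one. The dictionary keys are our own shorthand for the column names.

```python
# Error-pattern prediction accuracy (%) copied from the table above.
ACCURACY = {
    "Print Chessboard": {"variable": 93.9, "state": 95.3, "dependency": 99.3,
                         "runtime_syntax": 26.3, "token": 16.8, "ast": 16.2},
    "Count Parentheses": {"variable": 92.7, "state": 93.8, "dependency": 98.8,
                          "runtime_syntax": 25.5, "token": 19.3, "ast": 21.7},
    "Generate Binary Digits": {"variable": 92.1, "state": 94.5, "dependency": 99.2,
                               "runtime_syntax": 23.8, "token": 21.2, "ast": 20.9},
}
SEMANTIC = ("variable", "state", "dependency")
SYNTACTIC = ("runtime_syntax", "token", "ast")


def mean_by_embedding(table):
    """Average each embedding's accuracy over all problems, rounded to 2 decimals."""
    columns = next(iter(table.values())).keys()
    return {c: round(sum(row[c] for row in table.values()) / len(table), 2) for c in columns}


means = mean_by_embedding(ACCURACY)
worst_semantic = min(row[e] for row in ACCURACY.values() for e in SEMANTIC)
best_syntactic = max(row[e] for row in ACCURACY.values() for e in SYNTACTIC)
print(means)
print(worst_semantic, best_syntactic)  # 92.1 26.3
```

The means come out to about 92.9% (variable), 94.5% (state), and 99.1% (dependency), compared with 25.2% (runtime syntax), 19.1% (token), and 19.6% (AST).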

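Dynamic embeddings substantially outperform syntax-based embeddings at predicting common student error patterns (accuracy above 92% vs. below 27%).
The dependency-enforcement embedding incorporates variable dependencies during encoding. It yields the most semantically aware representation and the highest accuracy on all three problems (98.8% to 99.3%).
Using the dynamic embeddings to guide SarfGen substantially reduces repair time, and the benefit grows as the number of required fixes increases.
Runtime syntax traces perform far worse than the dynamic (semantic) traces (at most 26.3%), which highlights the semantic gap in syntactic representations.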
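The effect described for SarfGen can be illustrated with a toy search. We do not know SarfGen's actual search procedure. This Python sketch assumes only two things: a repair system enumerates sets of candidate fixes of growing size until the tests pass, and an embedding-based predictor supplies a probability for each error pattern. Every name here is hypothetical, including `search_repair`, the error-pattern labels, the fix IDs, and `passes_tests`.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical representation of a candidate fix: (error_pattern, fix_id).
Fix = Tuple[str, str]


def search_repair(
    candidates: Sequence[Fix],
    passes_tests: Callable[[FrozenSet[Fix]], bool],
    max_fixes: int = 3,
    pattern_probs: Optional[Dict[str, float]] = None,
) -> Tuple[Optional[FrozenSet[Fix]], int]:
    """Try fix sets of size 1, 2, ... up to max_fixes; return (first passing set, attempts)."""
    order: List[Fix] = list(candidates)
    if pattern_probs is not None:
        # Guided variant: most likely error patterns first (sort is stable for ties).
        order.sort(key=lambda fix: pattern_probs.get(fix[0], 0.0), reverse=True)
    attempts = 0
    for size in range(1, max_fixes + 1):
        for combo in combinations(order, size):
            attempts += 1
            if passes_tests(frozenset(combo)):
                return frozenset(combo), attempts
    return None, attempts


CANDIDATES: List[Fix] = [
    ("off_by_one", "fix-loop-bound"),
    ("wrong_init", "init-to-one"),
    ("missing_return", "add-return"),
    ("wrong_operator", "minus-to-plus"),
    ("swapped_stmts", "swap-loop-body"),
    ("wrong_output", "fix-print-format"),
]
# The (hypothetical) correct repair needs two fixes.
TRUE_FIX = frozenset({("swapped_stmts", "swap-loop-body"), ("wrong_output", "fix-print-format")})


def passes_tests(fixes: FrozenSet[Fix]) -> bool:
    # Stand-in for applying the fixes and running the assignment's test suite.
    return fixes == TRUE_FIX


# Stand-in for the embedding model's predicted error-pattern probabilities.
PREDICTED = {"swapped_stmts": 0.9, "wrong_output": 0.8, "off_by_one": 0.1}

unguided = search_repair(CANDIDATES, passes_tests)
guided = search_repair(CANDIDATES, passes_tests, pattern_probs=PREDICTED)
print(unguided[1], guided[1])  # 21 7
```

In this toy setup the unguided search needs 21 attempts: 6 single fixes, followed by the 15th pair. The guided search finds the same two-fix repair on attempt 7. The number of k-fix combinations grows combinatorially with k, so a good ordering saves more work as more fixes are required. This is one plausible reading of why the reported gain grows with the number of fixes.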

This review was generated by AI and checked by a human editor.