QUICK REVIEW

[Paper Digest] Representing Partial Programs with Blended Abstract Semantics

Maxwell Nye, Yewen Pu | arXiv (Cornell University) | May 3, 2021

Software Engineering Research | References: 30 | Cited by: 5

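ONE-SENTENCE SUMMARY

This paper proposes a neuro-symbolic method that represents partial programs with learnable, modular neural networks, blending concrete execution state with approximate semantics to make program synthesis more effective. Compared with purely neural approaches, it synthesizes programs containing complex constructs such as loops and higher-order functions more effectively under a fixed search budget.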

ABSTRACT

Synthesizing programs from examples requires searching over a vast, combinatorial space of possible programs. In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next. We introduce a general technique for representing partially written programs in a program synthesis engine. We take inspiration from the technique of abstract interpretation, in which an approximate execution model is used to determine if an unfinished program will eventually satisfy a goal specification. Here we learn an approximate execution model implemented as a modular neural network. By constructing compositional program representations that implicitly encode the interpretation semantics of the underlying programming language, we can represent partial programs using a flexible combination of concrete execution state and learned neural representations, using the learned approximate semantics when concrete semantics are not known (in unfinished parts of the program). We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs, such as loops and higher-order functions, and can be used to synthesize programs more accurately for a given search budget than pure neural approaches in several domains.

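MOTIVATION AND GOALS

Represent partially written programs during synthesis, before they can be fully executed.
Improve search efficiency in program synthesis by predicting promising program paths more accurately.
Support richer language constructs, such as loops and higher-order functions, in the synthesis engine.
Combine symbolic execution with learned semantics to make synthesis more accurate and scalable.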

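PROPOSED METHOD

Draw on the principles of abstract interpretation to build an approximate execution model for incomplete programs.
Train modular neural networks to learn the semantics of programming-language constructs, particularly in incomplete or non-executable code segments.
Build hybrid program representations that combine concrete execution state with learned neural approximations wherever the semantics are unknown.
Use compositional program representations that implicitly encode the language semantics, enabling generalization across program structures.
Integrate the learned semantics into an execution-guided synthesis engine to steer search toward correct, valid programs.
Allow the synthesis engine to reason about partial programs even when they contain unbound variables or incomplete control flow.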
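
The blending idea in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy language (`Const`, `Add`, `Hole`), the vector size `DIM`, the `embed_value` encoder, and the `LinearModule` class are all hypothetical names introduced here, and the "neural" module is an untrained random linear map standing in for a learned network. The sketch only shows the control flow: subtrees without holes are executed concretely, and any subtree containing a hole is handed to a per-construct module that returns a vector.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple, Union

DIM = 4  # size of the abstract representation (arbitrary choice for this sketch)
Vector = List[float]


@dataclass(frozen=True)
class Hole:
    """An unfinished part of the program."""


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Hole, Const, Add]


def has_hole(e: Expr) -> bool:
    if isinstance(e, Hole):
        return True
    if isinstance(e, Add):
        return has_hole(e.left) or has_hole(e.right)
    return False


def concrete_eval(e: Expr) -> int:
    """Ordinary execution; only valid for hole-free expressions."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Add):
        return concrete_eval(e.left) + concrete_eval(e.right)
    raise ValueError("cannot concretely execute a hole")


def embed_value(v: int) -> Vector:
    """Hypothetical stand-in for a learned encoder of concrete values."""
    return [float(v)] + [0.0] * (DIM - 1)


class LinearModule:
    """Untrained stand-in for one learned neural module per language construct."""

    def __init__(self, n_inputs: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_inputs * DIM)]
            for _ in range(DIM)
        ]

    def __call__(self, *inputs: Vector) -> Vector:
        flat = [x for vec in inputs for x in vec]
        return [
            math.tanh(sum(w * x for w, x in zip(row, flat)))
            for row in self.weights
        ]


HOLE_EMBEDDING: Vector = [0.1] * DIM   # assumption: a fixed vector for holes
ADD_MODULE = LinearModule(n_inputs=2, seed=0)


def blended_eval(e: Expr) -> Tuple[str, Union[int, Vector]]:
    """Return ("concrete", value) when executable, else ("abstract", vector)."""
    if not has_hole(e):
        return ("concrete", concrete_eval(e))
    if isinstance(e, Hole):
        return ("abstract", HOLE_EMBEDDING)
    # An Add node with a hole somewhere beneath it: use the approximate module.
    args = []
    for child in (e.left, e.right):
        kind, value = blended_eval(child)
        args.append(embed_value(value) if kind == "concrete" else value)
    return ("abstract", ADD_MODULE(*args))
```

For example, `blended_eval(Add(Const(2), Const(3)))` takes the concrete path, while `blended_eval(Add(Add(Const(1), Const(2)), Hole()))` executes the inner sum concretely, embeds the result, and passes it together with the hole vector to `ADD_MODULE`. In a trained system the resulting vector would presumably be compared against the specification to guide search; that part is omitted here.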

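EXPERIMENTAL RESULTS

Research Questions

RQ1: Can the learned neural model approximate the semantics of incomplete programs well enough to support accurate synthesis?
RQ2: How does combining concrete execution with learned semantics improve synthesis performance relative to purely neural or purely symbolic methods?
RQ3: To what extent can this hybrid representation support synthesis of programs containing complex constructs such as loops and higher-order functions?
RQ4: Does the method maintain or improve search efficiency and accuracy when computational resources are limited?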

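Key Findings

The hybrid neuro-symbolic representation lets the synthesis engine reason effectively about partial programs that could not otherwise be executed.
The method supports advanced language constructs such as loops and higher-order functions more reliably than purely neural approaches.
For a given search budget, the method achieves higher synthesis accuracy than purely neural approaches in several domains.
The learned neural components provide useful semantic approximations in unbound or incomplete regions of code, which improves search guidance.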
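
To make "for a given search budget" concrete, the following Python sketch runs a best-first search over partial programs and stops after a fixed number of expansions. It is an illustration under stated assumptions, not the search procedure from the paper: the prefix-token program encoding, the `expand`, `evaluate`, `score`, and `search` functions, and the leaf set are all hypothetical, and the hand-written `score` heuristic merely stands in for a learned model that would rank partial programs from their blended representations.

```python
import heapq
import itertools
import math
from typing import Iterator, Optional, Tuple

HOLE = "?"
LEAVES = (1, 2, 3)  # toy grammar: positive integer leaves and binary "+"
Program = Tuple[object, ...]  # prefix notation, e.g. ("+", 3, "?")


def expand(tokens: Program) -> Iterator[Program]:
    """Fill the leftmost hole with every production of the toy grammar."""
    i = tokens.index(HOLE)
    for leaf in LEAVES:
        yield tokens[:i] + (leaf,) + tokens[i + 1:]
    yield tokens[:i] + ("+", HOLE, HOLE) + tokens[i + 1:]


def evaluate(tokens: Program) -> int:
    """Execute a complete (hole-free) prefix expression."""
    def go(i: int) -> Tuple[int, int]:
        tok = tokens[i]
        if tok == "+":
            left, i = go(i + 1)
            right, i = go(i)
            return left + right, i
        return tok, i + 1
    return go(0)[0]


def score(tokens: Program, target: int) -> float:
    """Hand-written stand-in for a learned scorer; lower is more promising."""
    known = sum(t for t in tokens if isinstance(t, int))
    holes = tokens.count(HOLE)
    lower_bound = known + holes  # every hole contributes at least 1
    if lower_bound > target:
        return math.inf          # can no longer reach the target
    return target - lower_bound


def search(target: int, budget: int) -> Optional[Program]:
    """Best-first search that expands at most `budget` partial programs."""
    tie = itertools.count()  # tie-breaker so programs are never compared
    frontier = [(score((HOLE,), target), next(tie), (HOLE,))]
    expansions = 0
    while frontier and expansions < budget:
        _, _, prog = heapq.heappop(frontier)
        if HOLE not in prog:
            if evaluate(prog) == target:
                return prog
            continue
        expansions += 1
        for child in expand(prog):
            s = score(child, target)
            if s != math.inf:
                heapq.heappush(frontier, (s, next(tie), child))
    return None
```

Calling `search(target=5, budget=50)` returns a complete program whose value is 5, while `search(target=5, budget=0)` returns `None` because no expansion is allowed. A better scorer finds a solution in fewer expansions, which is the sense in which accuracy is compared at a fixed budget.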

This digest was generated by AI and reviewed by human editors.