QUICK REVIEW

[Paper Review] Learning to superoptimize programs

Rudy Bunel, Alban Desmaison | arXiv (Cornell University) | Nov 6, 2016

Software Engineering Research | References: 17 | Cited by: 17

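One-Sentence Summary

This paper proposes a learning-based approach to code superoptimization that improves on stochastic search by using reinforcement learning to learn an adaptive proposal distribution. The proposal distribution is optimized for expected improvement with the REINFORCE algorithm, and the method significantly outperforms the state of the art (e.g., Stoke), reaching better optimization quality in fewer iterations on both the Hacker's Delight and the automatically generated benchmarks.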

ABSTRACT

Code super-optimization is the task of transforming any given program to a more efficient version while preserving its input-output behaviour. In some sense, it is similar to the paraphrase problem from natural language processing where the intention is to change the syntax of an utterance without changing its semantics. Code optimization has been the subject of years of research that has resulted in the development of rule-based transformation strategies that are used by compilers. More recently, however, a class of stochastic-search-based methods has been shown to outperform these strategies. This approach involves repeated sampling of modifications to the program from a proposal distribution, which are accepted or rejected based on whether they preserve correctness, and the improvement they achieve. These methods, however, neither learn from past behaviour nor do they try to leverage the semantics of the program under consideration. Motivated by this observation, we present a novel learning based approach for code super-optimization. Intuitively, our method works by learning the proposal distribution using unbiased estimators of the gradient of the expected improvement. Experiments on benchmarks comprising automatically generated as well as existing ("Hacker's Delight") programs show that the proposed method is able to significantly outperform state-of-the-art approaches for code super-optimization.
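The phrase "unbiased estimators of the gradient of the expected improvement" refers to the score-function (REINFORCE) identity. The sketch below uses generic notation that is assumed for illustration and does not come from this review: R_0 is the input program, T a sampled search trajectory of proposed moves a_t, q_θ the learned proposal distribution, r(T) the improvement achieved, c the cost function, β an inverse temperature and b a constant baseline. The acceptance rule shown is a plain Metropolis test of the kind used by Stoke, assumed here; the final step also assumes that only the proposal probabilities depend on θ. The paper's exact reward and factorization may differ.

```latex
% Notation assumed for illustration; not taken from the review.
\[
  \alpha(R \to R') = \min\!\Big(1,\; \exp\big(-\beta\,[\,c(R') - c(R)\,]\big)\Big)
\]
\[
  \mathcal{L}(\theta) = \mathbb{E}_{T \sim p_\theta(\cdot \mid R_0)}\big[\, r(T) \,\big]
\]
\[
  \nabla_\theta \mathcal{L}(\theta)
    = \mathbb{E}_{T \sim p_\theta}\big[\, r(T)\, \nabla_\theta \log p_\theta(T) \,\big]
    \approx \frac{1}{N} \sum_{k=1}^{N} \big( r(T^{(k)}) - b \big)
      \sum_{t=1}^{n} \nabla_\theta \log q_\theta\big(a^{(k)}_t \mid R^{(k)}_{t-1}\big)
\]
```

Subtracting a constant b leaves the expectation unchanged, because the score function has zero mean, while typically reducing the variance of the estimate.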

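Motivation and Goals

Address the limitation of the fixed, non-adaptive proposal distributions used in existing stochastic code superoptimizers such as the Stoke framework.
Improve the efficiency and quality of superoptimization by learning a proposal distribution that adapts to the semantics and structure of the input program.
Show that a learned proposal distribution achieves better optimization results in less time than uniform or rule-based proposal strategies.
Evaluate the method on diverse benchmarks, including hand-curated programs from Hacker's Delight and automatically generated programs with greater structural diversity.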

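Proposed Method

Frame superoptimization as a reinforcement learning problem whose goal is to learn a proposal distribution that maximizes the expected improvement in program efficiency.
Use the REINFORCE algorithm to estimate the gradient of the expected improvement with respect to the parameters of the proposal distribution, enabling end-to-end learning.
Model the proposal distribution either as a simple unconditional bias or as a neural network conditioned on program features, so that it can adapt to the syntactic and semantic structure of the input program.
Use a Markov chain Monte Carlo (MCMC) sampling procedure that accepts or rejects each proposed program transformation based on the improvement it achieves and whether it preserves correctness.
Training data consist of input programs and their optimization trajectories, supporting either supervised pretraining or self-supervised learning through repeated MCMC sampling.
Evaluate performance with a cost function that measures program efficiency, and track optimization quality through the relative score improvement over baseline methods.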
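The loop described above (MCMC search driven by a learned proposal, trained with REINFORCE on the improvement) can be sketched in Python. This is a minimal toy under stated assumptions, not the authors' implementation: the instruction set `OPCODES`, the test inputs `TESTS`, the cost weights, the pair-rewrite move and the helper names `run`, `cost`, `mcmc` and `reinforce` are all hypothetical. The proposal is the unconditional "bias" variant (a softmax over opcode logits `theta`); each move rewrites a uniformly chosen pair of adjacent instructions with two opcodes drawn from that softmax, and acceptance is a plain Metropolis test that ignores the proposal ratio.

```python
import math
import random

# Hypothetical toy setting: a program is a list of opcodes acting on one
# integer register, and its behaviour is checked only on TESTS.
OPCODES = ["inc", "dec", "dbl", "nop"]
TESTS = [0, 1, 2, 5, -3]


def run(program, x):
    """Interpret a straight-line toy program on a single integer register."""
    for op in program:
        if op == "inc":
            x += 1
        elif op == "dec":
            x -= 1
        elif op == "dbl":
            x *= 2
    return x


def cost(program, expected, penalty=2):
    """Assumed cost: non-nop count plus a penalty per failing test.

    Returns (total cost, number of failing tests).
    """
    wrong = sum(run(program, x) != y for x, y in zip(TESTS, expected))
    length = sum(op != "nop" for op in program)
    return length + penalty * wrong, wrong


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mcmc(reference, theta, steps, beta, rng):
    """Metropolis search; new opcodes are drawn from softmax(theta).

    Returns the cost of the best *correct* program seen and the opcode
    indices that were sampled (the only theta-dependent choices here).
    """
    expected = [run(reference, x) for x in TESTS]
    current = list(reference)
    cur_cost, _ = cost(current, expected)
    best = cur_cost  # the reference program is correct by definition
    probs = softmax(theta)
    sampled = []
    for _ in range(steps):
        pos = rng.randrange(len(current) - 1)  # uniform over adjacent pairs
        ks = rng.choices(range(len(OPCODES)), weights=probs, k=2)
        sampled.extend(ks)
        candidate = current[:pos] + [OPCODES[k] for k in ks] + current[pos + 2:]
        new_cost, wrong = cost(candidate, expected)
        # Accept with probability min(1, exp(-beta * delta)).
        if rng.random() < math.exp(-beta * max(0, new_cost - cur_cost)):
            current, cur_cost = candidate, new_cost
            if wrong == 0:
                best = min(best, cur_cost)
    return best, sampled


def reinforce(reference, epochs=50, chains=8, steps=50, lr=0.05, beta=1.0, seed=0):
    """Bias-only REINFORCE on the expected improvement; returns opcode logits."""
    rng = random.Random(seed)
    theta = [0.0] * len(OPCODES)  # all-zero logits = uniform proposal
    start = sum(op != "nop" for op in reference)
    for _ in range(epochs):
        runs = [mcmc(reference, theta, steps, beta, rng) for _ in range(chains)]
        rewards = [start - best for best, _ in runs]
        baseline = sum(rewards) / chains  # batch-mean baseline for variance
        probs = softmax(theta)
        grad = [0.0] * len(theta)
        for (_, sampled), r in zip(runs, rewards):
            for k in sampled:
                # d/d theta[j] of log softmax(theta)[k] = 1[j == k] - probs[j]
                for j in range(len(theta)):
                    grad[j] += (r - baseline) * ((j == k) - probs[j])
        theta = [t + lr * g / chains for t, g in zip(theta, grad)]
    return theta
```

For example, `softmax(reinforce(["inc", "dec", "dbl", "inc", "dec", "nop"]))` returns the learned opcode distribution for a program whose `inc`/`dec` pairs are redundant. A conditioned variant, in the spirit of the MLP described above, would replace the fixed `theta` with a function of program features; that extension is not shown here.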

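Experimental Results

Research Questions

RQ1: Can a learned proposal distribution outperform fixed, non-adaptive proposal distributions in stochastic code superoptimization?
RQ2: Does conditioning the proposal distribution on program features lead to faster convergence and higher-quality optimization?
RQ3: How does the method compare with state-of-the-art superoptimizers such as Stoke on diverse program benchmarks?
RQ4: Does the method generalize across different kinds of programs, including those with low structural diversity and those with large structural variation?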

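Key Findings

On the Hacker's Delight benchmark, a simple unconditional bias model outperforms the uniform proposal distribution used in Stoke, with a mean relative score of 63.56% versus 78.15% for the baseline (lower is better).
On the more challenging automatically generated benchmark, a multilayer perceptron (MLP) conditioned on program features achieves a mean relative score of 62.27%, significantly better than the 78.15% baseline.
With only 100 iterations, the learned proposal distribution outperforms the uniform proposal distribution run for 400 iterations, indicating faster convergence.
Across multiple optimization runs, the learned proposal distribution reduces the average program cost more reliably and consistently than the uniform baseline.
The learned proposal distribution runs at 20,000 operations per second, compared with 60,000 for the uniform baseline, indicating a reasonable trade-off between speed and quality.
Overall, the results show that learning the proposal distribution is feasible and effective, especially when the model is conditioned on program structure.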
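The relative scores above can be reproduced from per-program costs with a small helper. This sketch assumes a definition that the review does not spell out: the relative score is the cost of the best program found, expressed as a percentage of the input program's cost, so lower is better. The function names are hypothetical.

```python
def relative_score(initial_cost, best_cost):
    """Assumed definition: best cost as a percentage of the initial cost."""
    return 100.0 * best_cost / initial_cost


def mean_relative_score(pairs):
    """Average relative score over (initial_cost, best_cost) pairs; lower is better."""
    scores = [relative_score(c0, c1) for c0, c1 in pairs]
    return sum(scores) / len(scores)
```

For example, two programs whose costs drop from 10 to 5 and from 8 to 8 have relative scores of 50% and 100%, so `mean_relative_score([(10, 5), (8, 8)])` returns `75.0`.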

This review was generated by AI and checked by human editors.