QUICK REVIEW

[Paper Review] Learning to superoptimize programs

Rudy Bunel, Alban Desmaison | arXiv (Cornell University) | Nov 6, 2016

Software Engineering Research | 17 references | 17 citations

TL;DR

This paper proposes a learning-based approach to code super-optimization that improves stochastic search by learning an adaptive proposal distribution using reinforcement learning. By leveraging the REINFORCE algorithm to optimize the proposal distribution based on expected improvement, the method significantly outperforms state-of-the-art techniques like Stoke, achieving better optimization quality in fewer iterations on both Hacker’s Delight and automatically generated benchmarks.

ABSTRACT

Code super-optimization is the task of transforming any given program to a more efficient version while preserving its input-output behaviour. In some sense, it is similar to the paraphrase problem from natural language processing where the intention is to change the syntax of an utterance without changing its semantics. Code-optimization has been the subject of years of research that has resulted in the development of rule-based transformation strategies that are used by compilers. More recently, however, a class of stochastic search based methods have been shown to outperform these strategies. This approach involves repeated sampling of modifications to the program from a proposal distribution, which are accepted or rejected based on whether they preserve correctness, and the improvement they achieve. These methods, however, neither learn from past behaviour nor do they try to leverage the semantics of the program under consideration. Motivated by this observation, we present a novel learning based approach for code super-optimization. Intuitively, our method works by learning the proposal distribution using unbiased estimators of the gradient of the expected improvement. Experiments on benchmarks comprising of automatically generated as well as existing ("Hacker's Delight") programs show that the proposed method is able to significantly outperform state of the art approaches for code super-optimization.

Motivation & Objective

To address the limitation of fixed, non-adaptive proposal distributions in stochastic code super-optimization, such as in the Stoke framework.
To improve the efficiency and quality of super-optimization by learning a proposal distribution that adapts to the semantics and structure of the input program.
To demonstrate that a learned proposal distribution can achieve better optimization results faster than uniform or rule-based proposal strategies.
To evaluate the method on diverse benchmarks, including manually curated programs from 'Hacker’s Delight' and automatically generated programs with higher structural diversity.

Proposed method

The method formulates super-optimization as a reinforcement learning problem, where the goal is to learn a proposal distribution that maximizes expected improvement in program efficiency.
It uses the REINFORCE algorithm to estimate gradients of the expected improvement with respect to the proposal distribution parameters, enabling end-to-end learning.
The proposal distribution is modeled either as a neural network conditioned on program features, which lets it adapt to the syntactic and semantic structure of the input program, or, in the simplest variant, as a learned bias that does not depend on the input program.
The approach employs a Markov Chain Monte Carlo (MCMC) sampling procedure where proposed program transformations are accepted or rejected based on improvement and correctness.
Training data consists of input programs and their corresponding optimization traces, enabling supervised pre-training or self-supervised learning via repeated MCMC sampling.
The method is evaluated using a cost function that measures program efficiency, with performance tracked via relative score improvements over baseline methods.
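
The steps above can be sketched on a toy problem. The following is a minimal illustration, not the paper's implementation: the four-opcode instruction set, the test inputs, the cost function, and every function name are hypothetical. The proposal is a per-opcode softmax bias that ignores the input program, and acceptance uses the plain Metropolis rule without a Hastings correction, a simplification that keeps the acceptance step independent of the learned parameters. The reward is the improvement of the best cost found in a chain, and the REINFORCE update uses the score function of the sampled opcodes with a mean-reward baseline.

```python
import math
import random

# Toy instruction set and test inputs; all names here are hypothetical.
OPCODES = ["nop", "add1", "sub1", "dbl"]
TEST_INPUTS = [-3, 0, 1, 7]
# Unoptimized reference program computing (x + 1) * 2 in six instructions.
REFERENCE = ["add1", "sub1", "add1", "add1", "sub1", "dbl"]


def run(program, x):
    """Interpret a straight-line toy program on the integer input x."""
    for op in program:
        if op == "add1":
            x += 1
        elif op == "sub1":
            x -= 1
        elif op == "dbl":
            x *= 2
    return x


def cost(program, reference):
    """Correctness penalty (failed test cases) plus a length-based term."""
    wrong = sum(run(program, x) != run(reference, x) for x in TEST_INPUTS)
    length = sum(op != "nop" for op in program)
    return 10.0 * wrong + length


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def mcmc_chain(reference, theta, steps, rng, beta=1.0):
    """Run one chain; return (best cost seen, how often each opcode was proposed)."""
    probs = softmax(theta)
    current = list(reference)
    cur_cost = best = cost(current, reference)
    counts = [0] * len(OPCODES)
    for _ in range(steps):
        pos = rng.randrange(len(current))  # position chosen uniformly
        k = rng.choices(range(len(OPCODES)), weights=probs)[0]  # learned part
        counts[k] += 1
        candidate = current[:pos] + [OPCODES[k]] + current[pos + 1:]
        delta = cost(candidate, reference) - cur_cost
        # Plain Metropolis acceptance (assumption: no Hastings correction).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current, cur_cost = candidate, cur_cost + delta
            best = min(best, cur_cost)
    return best, counts


def train(reference, iterations=100, chains=8, steps=30, lr=0.01, seed=0):
    """REINFORCE on the opcode logits; reward = improvement of the best cost."""
    rng = random.Random(seed)
    theta = [0.0] * len(OPCODES)  # starts as the uniform proposal
    start = cost(reference, reference)
    for _ in range(iterations):
        probs = softmax(theta)
        results = [mcmc_chain(reference, theta, steps, rng) for _ in range(chains)]
        rewards = [start - best for best, _ in results]
        baseline = sum(rewards) / chains  # variance-reduction baseline
        grad = [0.0] * len(OPCODES)
        for (_, counts), r in zip(results, rewards):
            n = sum(counts)
            for j in range(len(OPCODES)):
                # d/dtheta_j of sum_t log q(k_t) = counts[j] - n * probs[j]
                grad[j] += (r - baseline) * (counts[j] - n * probs[j]) / chains
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent
    return theta


if __name__ == "__main__":
    learned = train(REFERENCE)
    print(dict(zip(OPCODES, softmax(learned))))
```

Because any program that fails a test costs at least 10 while the six-instruction reference costs 6, the best cost tracked in each chain always belongs to a program that passes every test input. A proposal conditioned on program features would replace the fixed `theta` with the output of a small network, but the update rule would keep the same shape.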

Experimental results

Research questions

RQ1: Can a learned proposal distribution outperform fixed, non-adaptive proposal distributions in stochastic code super-optimization?
RQ2: Does conditioning the proposal distribution on program features lead to faster convergence and higher-quality optimizations?
RQ3: How does the performance of the learned method compare to state-of-the-art super-optimizers like Stoke on diverse program benchmarks?
RQ4: Can the method generalize across different program types, including those with limited structural diversity and those with high structural variation?

Key findings

On the Hacker’s Delight benchmark, a simple unconditioned bias model outperformed the uniform proposal distribution used in Stoke, reaching an average relative score of 63.56% against 78.15% for the baseline (lower is better).
On the more complex, automatically generated benchmark, the multi-layer perceptron (MLP) conditioned on program features reached an average relative score of 62.27%, significantly lower (better) than the 78.15% baseline.
With only 100 iterations, the learned proposal distribution achieved better results than the uniform proposal with 400 iterations, demonstrating faster convergence.
The learned proposal distribution reduced the average program cost more robustly and consistently than the uniform baseline across multiple optimization runs.
The learned proposal ran at a throughput of about 20,000 operations per second, compared to 60,000 for the uniform baseline. This is roughly a threefold slowdown per operation, which is a reasonable trade-off given the gain in optimization quality.
The results show that learning the proposal distribution is feasible and effective, especially when the model is conditioned on program structure, leading to superior optimization outcomes.
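
The figures above can be put side by side with a little arithmetic. This is a back-of-the-envelope sketch, not a result from the paper. It assumes that a lower relative score is better, that one "operation" corresponds to one search iteration, and that the throughput figures apply to the runs being compared. The function names are hypothetical.

```python
def relative_reduction(baseline_score, learned_score):
    """Fraction by which the learned proposal lowers the baseline's relative score."""
    return (baseline_score - learned_score) / baseline_score


def break_even_iteration_ratio(learned_ops_per_s, baseline_ops_per_s):
    """Learned search wins on wall-clock time only if it needs fewer than this
    fraction of the baseline's iterations (assumes one operation per iteration)."""
    return learned_ops_per_s / baseline_ops_per_s


bias_gain = relative_reduction(78.15, 63.56)   # unconditioned bias vs uniform
mlp_gain = relative_reduction(78.15, 62.27)    # MLP vs uniform
break_even = break_even_iteration_ratio(20_000, 60_000)
reported_ratio = 100 / 400                     # iterations: learned vs uniform

print(f"bias model lowers the score by {bias_gain:.1%}")
print(f"MLP lowers the score by {mlp_gain:.1%}")
print(f"break-even iteration ratio: {break_even:.3f}, reported: {reported_ratio:.3f}")
```

Under these assumptions, the reported 100-versus-400 iteration comparison falls below the one-third break-even ratio implied by the throughput gap, so the learned proposal would still come out ahead on wall-clock time.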

This review was created by AI and reviewed by human editors.