QUICK REVIEW

[Paper Review] MLGO: a Machine Learning Guided Compiler Optimizations Framework

Mircea Trofin, Yundi Qian|arXiv (Cornell University)|Jan 13, 2021

Machine Learning and Data Classification|References: 25|Cited by: 35

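One-Sentence Summary

MLGO integrates machine learning into the LLVM compiler, replacing the hand-written inlining-for-size heuristic with policies trained by reinforcement learning and evolution strategies. It achieves up to 7% reduction in native code size and generalizes well across targets.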

ABSTRACT

Leveraging machine-learning (ML) techniques for compiler optimizations has been widely studied and explored in academia. However, the adoption of ML in general-purpose, industry strength compilers has yet to happen. We propose MLGO, a framework for integrating ML techniques systematically in an industrial compiler -- LLVM. As a case study, we present the details and results of replacing the heuristics-based inlining-for-size optimization in LLVM with machine learned models. To the best of our knowledge, this work is the first full integration of ML in a complex compiler pass in a real-world setting. It is available in the main LLVM repository. We use two different ML algorithms: Policy Gradient and Evolution Strategies, to train the inlining-for-size model, and achieve up to 7% size reduction, when compared to state of the art LLVM -Oz. The same model, trained on one corpus, generalizes well to a diversity of real-world targets, as well as to the same set of targets after months of active development. This property of the trained models is beneficial to deploy ML techniques in real-world settings.

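Motivation and Goals

Motivate the use of machine learning in place of hand-crafted optimization heuristics in industrial-strength compilers.
Demonstrate a full integration of ML into LLVM through a pilot project: inlining for size.
Develop and compare two ML training methods for policy optimization: reinforcement learning and evolution strategies.
Show that a model trained on one corpus generalizes to diverse real-world targets and stays robust over time.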

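Proposed Method

Formulate inlining-for-size as a reinforcement learning problem: a sequential decision process over call sites.
Train the policy with policy gradient (PPO) and with evolution strategies to maximize the reduction in native code size.
Introduce a simplified feature space of only 11 numeric features so that the state representation stays tractable.
Warm-start reinforcement learning with behavioral cloning, which imitates LLVM's existing inlining heuristic and speeds up training.
Embed the trained ML policy in LLVM as a release-mode model, while supporting experimentation in development mode.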
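
The formulation above can be illustrated with a minimal sketch of an evolution-strategies loop over a linear inlining policy. Everything in it is an assumption made for illustration: the toy cost model in `episode_size`, the randomly generated call sites, the linear policy, and the hyperparameters are hypothetical and are not taken from the paper or from LLVM. Only the overall shape follows the summary: 11 numeric features per call site, one inline/no-inline decision per call site, and a reward based on final size.

```python
import random

NUM_FEATURES = 11  # the summary mentions 11 numeric features per call site


def make_call_sites(rng, n):
    """Generate hypothetical call sites as (features, size_delta_if_inlined)."""
    sites = []
    for _ in range(n):
        feats = [rng.uniform(-1.0, 1.0) for _ in range(NUM_FEATURES)]
        # Toy ground truth (an assumption): inlining shrinks the binary
        # when the first feature is positive and grows it otherwise.
        delta = -10.0 * feats[0]
        sites.append((feats, delta))
    return sites


def policy_inlines(weights, feats):
    """Linear policy: inline when the weighted feature sum is positive."""
    return sum(w * f for w, f in zip(weights, feats)) > 0.0


def episode_size(weights, sites, base_size=1000.0):
    """Visit the call sites in order and return the final (toy) binary size."""
    size = base_size
    for feats, delta in sites:
        if policy_inlines(weights, feats):
            size += delta
    return size


def train_es(sites, iterations=30, population=40, sigma=0.1, lr=0.05, seed=0):
    """Basic ES: perturb the weights, score each candidate, follow the estimate."""
    rng = random.Random(seed)
    weights = [0.0] * NUM_FEATURES
    for _ in range(iterations):
        noises = [[rng.gauss(0.0, 1.0) for _ in range(NUM_FEATURES)]
                  for _ in range(population)]
        rewards = []
        for noise in noises:
            candidate = [w + sigma * n for w, n in zip(weights, noise)]
            rewards.append(-episode_size(candidate, sites))  # smaller size is better
        mean_r = sum(rewards) / population
        std_r = (sum((r - mean_r) ** 2 for r in rewards) / population) ** 0.5
        if std_r == 0.0:
            continue  # all candidates scored the same; no signal this round
        for i in range(NUM_FEATURES):
            grad_i = sum((r - mean_r) / std_r * noise[i]
                         for r, noise in zip(rewards, noises))
            weights[i] += lr * grad_i / (population * sigma)
    return weights
```

A typical use would be to build a set of call sites with `make_call_sites(random.Random(1), 200)`, train with `train_es(sites)`, and compare `episode_size` for the trained weights against the all-zero weights, which never inline. In a real setting the reward would come from compiling and measuring the binary rather than from a closed-form toy model.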

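Experimental Results

Research Questions

RQ1: In a real-world LLVM pipeline, can an ML-based policy outperform LLVM's hand-tuned inlining-for-size heuristic?
RQ2: Does a policy trained on one corpus generalize, without retraining, to other codebases and to future revisions of the same targets?
RQ3: What are the trade-offs between reinforcement learning and evolution strategies when training an inlining policy in a compiler setting?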

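Key Findings

On the pilot inlining-for-size problem, the ML policy achieves up to 7% size reduction compared with state-of-the-art LLVM -Oz.
A single model trained on one corpus generalizes well to diverse real-world targets, and to the same targets after months of active development.
Two training paradigms are explored, reinforcement learning (policy gradient with PPO) and evolution strategies, each with different trade-offs.
Warm-starting reinforcement learning with behavioral cloning of LLVM's existing inlining heuristic speeds up training.
The framework separates policy training from production use: a fixed policy is embedded in the compiler for deterministic, release-mode operation.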
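
The warm-start finding above can be sketched as behavioral cloning: fit a policy to reproduce the decisions of an existing heuristic before any reinforcement learning takes place. The heuristic here (`toy_heuristic`, a threshold on a single feature) and the logistic-regression policy are hypothetical stand-ins chosen for brevity; they are not LLVM's inliner or the model architecture used in the paper.

```python
import math
import random

NUM_FEATURES = 11  # matches the 11 numeric features mentioned in the summary


def toy_heuristic(feats, threshold=0.0):
    """Hypothetical hand-written rule: inline (1) when feature 0 is below a threshold."""
    return 1 if feats[0] < threshold else 0


def predict_prob(weights, bias, feats):
    """Probability that the cloned policy chooses to inline."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def behavior_clone(samples, epochs=50, lr=0.5):
    """Fit logistic regression with SGD on (features, heuristic decision) pairs."""
    weights = [0.0] * NUM_FEATURES
    bias = 0.0
    for _ in range(epochs):
        for feats, label in samples:
            # Gradient of the log loss with respect to the pre-sigmoid score.
            err = predict_prob(weights, bias, feats) - label
            bias -= lr * err
            for i in range(NUM_FEATURES):
                weights[i] -= lr * err * feats[i]
    return weights, bias


def agreement(weights, bias, samples):
    """Fraction of samples where the cloned policy matches the heuristic."""
    hits = sum((predict_prob(weights, bias, f) >= 0.5) == bool(y)
               for f, y in samples)
    return hits / len(samples)
```

To try it, label randomly generated feature vectors with `toy_heuristic`, call `behavior_clone` on the resulting pairs, and check `agreement` on the same data. The cloned weights would then serve as the starting point for RL training instead of a random initialization, and at deployment time the final weights stay fixed, so the same inputs always yield the same decision.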


This review was generated by AI and reviewed by a human editor.