QUICK REVIEW

[Paper Digest] Learning to Optimize Join Queries With Deep Reinforcement Learning

Sanjay Krishnan, Zongheng Yang | arXiv (Cornell University) | Aug 9, 2018

Optimization and Search Problems | References: 50 | Cited by: 82

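One-Sentence Summary

The paper proposes DQ, a deep reinforcement learning optimizer that learns a join search strategy to replace fixed heuristics and, on Calcite, PostgreSQL, and SparkSQL, produces plans competitive with the native optimizers while running significantly faster after learning.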

ABSTRACT

Exhaustive enumeration of all possible join orders is often avoided, and most optimizers leverage heuristics to prune the search space. The design and implementation of heuristics are well-understood when the cost model is roughly linear, and we find that these heuristics can be significantly suboptimal when there are non-linearities in cost. Ideally, instead of a fixed heuristic, we would want a strategy to guide the search space in a more data-driven way---tailoring the search to a specific dataset and query workload. Recognizing the link between classical Dynamic Programming enumeration methods and recent results in Reinforcement Learning (RL), we propose a new method for learning optimized join search strategies. We present our RL-based DQ optimizer, which currently optimizes select-project-join blocks. We implement three versions of DQ to illustrate the ease of integration into existing DBMSes: (1) A version built on top of Apache Calcite, (2) a version integrated into PostgreSQL, and (3) a version integrated into SparkSQL. Our extensive evaluation shows that DQ achieves plans with optimization costs and query execution times competitive with the native query optimizer in each system, but can execute significantly faster after learning (often by orders of magnitude).
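
The link between dynamic programming enumeration and reinforcement learning mentioned in the abstract can be written as a pair of Bellman-style recursions. This is a sketch in assumed notation, not a quotation from the paper: G is the current query graph, c is a candidate join, J(c) is the cost of that join, G' is the graph after applying c, V(G) is the lowest achievable remaining cost, and Q(G, c) is the cost of taking c and then planning optimally.

```latex
\begin{align}
V(G) &= \min_{c} \bigl[ J(c) + V(G') \bigr], \qquad V(G) = 0 \text{ when no joins remain in } G, \\
Q(G, c) &= J(c) + \min_{c'} Q(G', c'), \qquad V(G) = \min_{c} Q(G, c).
\end{align}
```

Dynamic programming computes V exactly by enumeration. Q-learning instead fits an approximation of Q from sampled transitions, so planning reduces to repeatedly choosing the join with the lowest predicted Q.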

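Motivation and Objectives

Explain the limitations of fixed heuristics when join costs are non-linear and the plan space is large.
Propose a reinforcement learning method that learns a join search strategy within a traditional optimizer framework.
Demonstrate that DQ integrates into multiple DBMSes with minimal code changes.
Show that, under non-linear cost models, data-driven planning can match or exceed heuristic baselines in both planning time and plan quality.
Highlight how the optimal substructure of join optimization enables data-efficient training.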

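Proposed Method

Formulate join optimization as a Markov decision process in which the state is the query graph and each action is a join.
Use deep Q-learning to approximate the Q-function Q(G,c), which scores a join c on a given query graph G.
Featurize query graphs and joins with one-hot attribute indicators, and include selection predicates and physical operator choices.
Collect training data efficiently from the native optimizer by exploiting optimal substructure, producing (G,c,J(c),G') samples.
Use the DQ optimizer as a learned replacement for the traditional plan search module in Calcite, PostgreSQL, and SparkSQL.
Train offline on workloads from the Join Order Benchmark and TPC-DS with minimal overhead, with optional epsilon-greedy exploration.
Justify choosing Q-learning over policy-based methods: it produces a score for every join in a subplan, which enables top-k planning.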
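
The formulation above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the three-relation schema, the row counts, the `featurize` layout, and the hand-written `toy_q` function (standing in for the trained neural network) are all assumptions made for the example. Join predicates are ignored, so any pair of subplans may be joined.

```python
from itertools import combinations

# Hypothetical schema: each attribute gets one slot in the one-hot vector.
ATTRIBUTES = ["a.id", "b.id", "b.a_id", "c.id", "c.b_id"]
REL_ATTRS = {
    "a": {"a.id"},
    "b": {"b.id", "b.a_id"},
    "c": {"c.id", "c.b_id"},
}
# Hypothetical row counts, used only by the toy scoring function.
ROWS = {"a": 1000, "b": 100, "c": 10}


def attrs_of(subplan):
    """Attributes visible in a subplan (a frozenset of relation names)."""
    return set().union(*(REL_ATTRS[r] for r in subplan))


def one_hot(attrs):
    return [1 if a in attrs else 0 for a in ATTRIBUTES]


def featurize(graph, left, right):
    """Encode (G, c): attributes in the whole graph, then in each join side."""
    all_attrs = set().union(*(attrs_of(s) for s in graph))
    return one_hot(all_attrs) + one_hot(attrs_of(left)) + one_hot(attrs_of(right))


def toy_q(graph, left, right):
    """Stand-in for a learned Q(G, c); lower is better.

    A trained model would consume featurize(graph, left, right) instead.
    """
    size = 1
    for r in left | right:
        size *= ROWS[r]
    return size


def greedy_plan(relations, q=toy_q):
    """Repeatedly apply the join with the lowest predicted Q-value."""
    graph = [frozenset([r]) for r in relations]
    order = []
    while len(graph) > 1:
        # Score every candidate join in the current graph and take the best.
        left, right = min(combinations(graph, 2), key=lambda c: q(graph, *c))
        graph = [s for s in graph if s not in (left, right)] + [left | right]
        order.append((sorted(left), sorted(right)))
    return order


print(greedy_plan(["a", "b", "c"]))
```

Because every candidate join receives a score, the same loop could keep the k best candidates at each step instead of only the minimum, which is the top-k planning the last point refers to.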

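Experimental Results

Research Questions

RQ1: Can a deep reinforcement learning model learn data-driven join search strategies that perform well across different cost models and workloads?
RQ2: How does integrating an RL-based search strategy affect planning time and plan quality in Calcite, PostgreSQL, and SparkSQL compared with the native optimizers?
RQ3: Is the approach more robust than fixed heuristics to non-linear cost behavior, such as exceeding memory limits or reusing hash tables?
RQ4: How much data efficiency can be gained by training on optimal-substructure data from the native optimizer?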

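Key Findings

In some scenarios, DQ plans more than 200x faster than exhaustive dynamic programming enumeration.
In each system, DQ produces plans competitive with those of the native optimizer.
Under non-linear cost models, DQ can substantially improve plan quality over fixed heuristics (1.7x to 3x better in simulated cost, relative to six baselines).
Exploiting the optimal subplans of the native optimizer generates training data efficiently, yielding a large training dataset from a single plan.
Integrating DQ into PostgreSQL and SparkSQL requires fewer than 300 lines of code changes per system.
Faster planning lets DQ search a broader plan space (for example, bushy plans and Cartesian products), with potential execution-time benefits.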
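
The point about obtaining many training samples from a single plan can be illustrated with a small sketch. It assumes a plan is given as nested pairs of relation names and uses a made-up `join_cost` function and made-up row counts in place of a real cost model. The traversal shown here is one plausible way to exploit optimal substructure, not necessarily the paper's exact procedure.

```python
# A plan is either a relation name (str) or a (left, right) pair of plans.
ROWS = {"a": 1000, "b": 100, "c": 10, "d": 5}  # hypothetical row counts


def relations(plan):
    """Set of base relations covered by a plan."""
    if isinstance(plan, str):
        return frozenset([plan])
    return relations(plan[0]) | relations(plan[1])


def join_cost(left, right):
    """Hypothetical stand-in for the cost model J(c)."""
    cost = 1
    for r in left | right:
        cost *= ROWS[r]
    return cost


def joins_bottom_up(plan):
    """Yield the (left, right) relation sets of each join, children first."""
    if isinstance(plan, str):
        return
    yield from joins_bottom_up(plan[0])
    yield from joins_bottom_up(plan[1])
    yield relations(plan[0]), relations(plan[1])


def samples_from_plan(plan):
    """Replay one plan as a sequence of (G, c, J(c), G') transitions."""
    graph = frozenset(frozenset([r]) for r in relations(plan))
    out = []
    for left, right in joins_bottom_up(plan):
        next_graph = (graph - {left, right}) | {left | right}
        out.append((graph, (left, right), join_cost(left, right), next_graph))
        graph = next_graph
    return out


def samples_with_substructure(plan):
    """Also replay every subplan that contains a join.

    Under optimal substructure, each such subplan is itself optimal for
    the relations it covers, so it can be treated as a query of its own.
    """
    out = samples_from_plan(plan)
    if not isinstance(plan, str):
        for child in plan:
            if not isinstance(child, str):
                out += samples_with_substructure(child)
    return out


plan = ((("c", "d"), "b"), "a")
print(len(samples_from_plan(plan)), len(samples_with_substructure(plan)))
```

For this four-relation left-deep plan, replaying only the full plan gives three transitions, while also replaying its subplans gives six, and the gap widens as the number of relations grows.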

This digest was generated by AI and reviewed by human editors.