QUICK REVIEW

[Paper Review] GenRewrite: Query Rewriting via Large Language Models

Jie Liu, Barzan Mozafari|arXiv (Cornell University)|Mar 14, 2024

Data Quality and Management|Cited by 5

One-Sentence Summary

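GenRewrite is a holistic system that rewrites SQL queries for better performance using large language models (LLMs), natural-language rewrite rules (NLR2s), and a counterexample-guided correction loop, achieving broad coverage and significant speedups on the TPC-DS and JOB benchmarks.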

ABSTRACT

Query rewriting is an effective technique for refining poorly written queries before they reach the query optimizer. However, manual rewriting is not scalable, as it is prone to errors and requires deep expertise. Traditional query rewriting algorithms fall short too: rule-based approaches fail to generalize to new query patterns, while synthesis-based methods struggle with complex queries. Fortunately, Large Language Models (LLMs) already possess broad knowledge and advanced reasoning capabilities, making them a promising solution for tackling these longstanding challenges. In this paper, we present GenRewrite, the first holistic system that leverages LLMs for query rewriting beyond traditional rules. We introduce the notion of Natural Language Rewrite Rules (NLR2s), which serve as hints for the LLM and also as a means of knowledge transfer from rewriting one query to another, allowing GenRewrite to become smarter and more effective over time. We present a novel counterexample-guided technique that iteratively corrects the syntactic and semantic errors in the rewritten query, significantly reducing the LLM costs and the manual effort required for verification. Across the standard TPC-DS and JOB benchmarks and their SQLStorm-generated variants, GenRewrite consistently optimizes more queries at every speedup threshold than all baselines. At the ≥2x threshold on TPC-DS, GenRewrite improves 25 queries (1.35x more than LLM-driven baselines and 2.6x more than LLM-enhanced rule-based baselines), and the gap widens further on TPC-DS (SQLStorm). On JOB and its SQLStorm variant, where queries are simpler, absolute gains are smaller, but GenRewrite still leads by a notable margin.
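
To make the counterexample idea concrete: a rewrite is wrong if some database makes the original and rewritten queries return different results. The sketch below is a minimal illustration, assuming Python's built-in sqlite3 as a stand-in test engine and a single hand-built test database. The function name `find_counterexample` and its feedback strings are hypothetical, not GenRewrite's API. It reports errors raised by the engine and compares outputs as multisets of rows. Agreement on one database does not prove equivalence; it only means no counterexample was found there.

```python
import sqlite3
from collections import Counter
from typing import Optional


def find_counterexample(schema_sql: str, rows_sql: str,
                        original: str, rewritten: str) -> Optional[str]:
    """Run both queries on one small test database.

    Returns None when the outputs agree on this database, otherwise a short
    feedback string (hypothetical format) that could be passed back to an LLM.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)  # build the test schema
        conn.executescript(rows_sql)    # load the hand-picked test rows
        expected = Counter(conn.execute(original).fetchall())
        try:
            actual = Counter(conn.execute(rewritten).fetchall())
        except sqlite3.Error as exc:
            # The engine rejected the rewrite: a syntactic or binding error.
            return f"engine error: {exc}"
        if expected != actual:
            # Same database, different multiset of rows: a semantic error.
            return f"result mismatch: expected {dict(expected)}, got {dict(actual)}"
        return None
    finally:
        conn.close()


# Hypothetical test data: duplicate keys in s expose an unsafe
# IN-subquery -> JOIN rewrite, which returns the value 1 twice instead of once.
SCHEMA = "CREATE TABLE t(a INTEGER); CREATE TABLE s(b INTEGER);"
ROWS = "INSERT INTO t VALUES (1), (2); INSERT INTO s VALUES (1), (1);"
ORIGINAL = "SELECT a FROM t WHERE a IN (SELECT b FROM s)"
print(find_counterexample(SCHEMA, ROWS, ORIGINAL,
                          "SELECT t.a FROM t JOIN s ON t.a = s.b"))
# prints: result mismatch: expected {(1,): 1}, got {(1,): 2}
```

The mismatch message is the kind of concrete feedback a correction step can hand back to the LLM, for example prompting it to add `DISTINCT` or keep a semi-join.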

Motivation and Goals

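Motivate the need for scalable, automated query rewriting that goes beyond pattern-based rules and manual effort.
Propose GenRewrite, a holistic system that uses LLMs guided by natural-language rewrite rules (NLR2s) to generate, correct, and evaluate rewrites.
Introduce a counterexample-guided iterative correction method that fixes syntactic and semantic errors in rewritten queries.
Transfer knowledge across queries through an NLR2 repository and a utility-scoring mechanism that prioritizes hints.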
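
The repository-plus-utility idea can be sketched as a small data structure. This is a minimal sketch under stated assumptions. The names `NLR2`, `NLR2Repository`, `select`, and `credit` are hypothetical. Relevance is approximated by keyword overlap with the SQL text. The utility update, which credits each used hint with speedup − 1, is an illustrative rule and not the paper's formula.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NLR2:
    text: str              # the natural-language rewrite hint
    keywords: frozenset    # hypothetical relevance triggers (upper-case SQL tokens)
    utility: float = 0.0   # running credit earned by past rewrites


@dataclass
class NLR2Repository:
    rules: list = field(default_factory=list)

    def add(self, rule: NLR2) -> None:
        self.rules.append(rule)

    def select(self, sql: str, k: int = 2) -> list:
        """Return up to k relevant hints, highest utility first."""
        tokens = set(re.findall(r"[A-Z_]+", sql.upper()))
        relevant = [r for r in self.rules if r.keywords & tokens]
        # sorted() is stable, so ties keep insertion order.
        return sorted(relevant, key=lambda r: r.utility, reverse=True)[:k]

    def credit(self, used: list, speedup: float) -> None:
        """Illustrative update: reward hints only when the rewrite got faster."""
        for r in used:
            r.utility += max(speedup - 1.0, 0.0)
```

For `SELECT a FROM t WHERE a IN (SELECT b FROM s) OR a > 5`, a hint keyed on `IN` and one keyed on `OR` are both relevant, while a hint keyed on `JOIN` is not. After `credit([or_hint], speedup=1.8)`, the `OR` hint ranks first. Keeping utilities across queries is what lets a hint learned on one query be offered first on the next.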

Proposed Method

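Define natural-language rewrite rules (NLR2s) as human-readable hints generated by the LLM, used to guide rewriting and to enable knowledge transfer.
Maintain an NLR2 repository and use utility scores to select only the hints relevant to a given query.
Apply a three-stage loop: suggest rewrites, correct them toward equivalence, and evaluate both equivalence and performance.
Use feedback-driven, counterexample-guided refinement to iteratively fix syntactic and semantic errors in rewrites.
Estimate performance through actual execution or the database's cost model, and update NLR2 utility scores accordingly.
Run under a user-specified or default time budget (30 seconds per query), targeting recurring workloads.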
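
The three-stage loop under a time budget can be sketched with the LLM, the equivalence check, and the cost source passed in as plain functions. This is a minimal sketch, assuming hypothetical callables `suggest(query, feedback)`, `check(query, candidate)` (returns counterexample text, or None if none is found), and `cost(sql)` (measured runtime or an optimizer estimate). `max_fixes` and `max_rounds` are illustrative caps; only the 30-second default comes from the summary. The NLR2 utility update is omitted for brevity.

```python
import time
from typing import Callable, Optional


def rewrite_loop(query: str,
                 suggest: Callable[[str, str], str],
                 check: Callable[[str, str], Optional[str]],
                 cost: Callable[[str], float],
                 budget_s: float = 30.0,
                 max_fixes: int = 3,
                 max_rounds: int = 5) -> str:
    """Suggest -> correct -> evaluate; return the cheapest verified query."""
    deadline = time.monotonic() + budget_s
    best, best_cost = query, cost(query)
    for _ in range(max_rounds):
        if time.monotonic() >= deadline:
            break  # time budget exhausted
        # Stage 1: ask the (hypothetical) LLM for a rewrite.
        candidate = suggest(query, "")
        # Stage 2: counterexample-guided correction.
        cex = check(query, candidate)
        fixes = 0
        while cex is not None and fixes < max_fixes:
            candidate = suggest(query, cex)  # feed the counterexample back
            cex = check(query, candidate)
            fixes += 1
        if cex is not None:
            continue  # still not verified: discard this candidate
        # Stage 3: evaluate performance and keep the best so far.
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best
```

Falling back to the original query when no candidate is both verified and cheaper keeps the loop safe to run on recurring workloads.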

Experimental Results

Research Questions

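RQ1: Can LLMs be used effectively for query rewriting, beyond traditional rule-based or synthesis-based methods?
RQ2: How can rewriting knowledge be transferred across queries to improve coverage over time?
RQ3: Does counterexample-guided iterative refinement reduce incorrect rewrites and LLM cost while preserving equivalence and speedups?
RQ4: How does NLR2-guided prompting affect rewrite quality and overall performance on complex benchmarks such as TPC-DS?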

Key Findings

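GenRewrite rewrites 25 of the 99 TPC-DS queries to achieve a speedup of at least 2x.
At that threshold, it improves 1.35x more queries than LLM-driven baselines and 2.6x more than LLM-enhanced rule-based baselines.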
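NLR2s enable knowledge transfer and better hint selection, reducing unnecessary or conflicting guidance to the LLM.
The counterexample-guided technique significantly reduces semantic and syntactic errors in rewritten queries.
The system emphasizes human-readable explanations of rewrites to aid verification and understanding.
By focusing on general, schema-agnostic NLR2s, GenRewrite's framework supports reusing rewrites across workloads.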
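
The headline numbers count queries whose speedup (original runtime divided by rewritten runtime) meets a threshold such as 2x. A minimal sketch; the function name `count_at_threshold` and the timings in the usage example are made up for illustration and are not results from the paper.

```python
def count_at_threshold(original_s: dict, rewritten_s: dict,
                       threshold: float = 2.0) -> int:
    """Count queries with original/rewritten runtime >= threshold."""
    return sum(1 for q, t0 in original_s.items()
               if q in rewritten_s and t0 / rewritten_s[q] >= threshold)


# Hypothetical runtimes in seconds (speedups 5x, 1.33x, 2x).
orig = {"q1": 10.0, "q2": 4.0, "q3": 6.0}
new = {"q1": 2.0, "q2": 3.0, "q3": 3.0}
print({th: count_at_threshold(orig, new, th) for th in (1.2, 2.0, 5.0)})
# prints: {1.2: 3, 2.0: 2, 5.0: 1}
```

Sweeping the threshold this way gives the per-threshold comparison against baselines that the abstract describes.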

This review was generated by AI and reviewed by human editors.