QUICK REVIEW

[Paper Review] GenRewrite: Query Rewriting via Large Language Models

Jie Liu, Barzan Mozafari|arXiv (Cornell University)|Mar 14, 2024

Data Quality and Management | Cited by 5

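One-sentence summary

GenRewrite introduces a holistic system that combines large language models (LLMs) guided by Natural Language Rewrite Rules (NLR2s) with a counterexample-guided loop to rewrite SQL queries for better performance, achieving substantial coverage and speedups on TPC benchmarks.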

ABSTRACT

Query rewriting is an effective technique for refining poorly written queries before they reach the query optimizer. However, manual rewriting is not scalable, as it is prone to errors and requires deep expertise. Traditional query rewriting algorithms fall short too: rule-based approaches fail to generalize to new query patterns, while synthesis-based methods struggle with complex queries. Fortunately, Large Language Models (LLMs) already possess broad knowledge and advanced reasoning capabilities, making them a promising solution for tackling these longstanding challenges. In this paper, we present GenRewrite, the first holistic system that leverages LLMs for query rewriting beyond traditional rules. We introduce the notion of Natural Language Rewrite Rules (NLR2s), which serve as hints for the LLM while also a means of knowledge transfer from rewriting one query to another, allowing GenRewrite to become smarter and more effective over time. We present a novel counterexample-guided technique that iteratively corrects the syntactic and semantic errors in the rewritten query, significantly reducing the LLM costs and the manual effort required for verification. Across the standard TPC-DS and JOB benchmarks and their SQLStorm-generated variants, GenRewrite consistently optimizes more queries at every speedup threshold than all baselines. At the >=2x threshold on TPC-DS, GenRewrite improves 25 queries (1.35x more than LLM-driven baselines and 2.6x more than LLM-enhanced rule-based baselines), and the gap widens further on TPC-DS (SQLStorm); on JOB and its SQLStorm variant, where queries are simpler, absolute gains are smaller but GenRewrite still leads by a notable margin.

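Motivation and objectives

Motivate the need for scalable, automated query rewriting that goes beyond pattern-based rules and manual effort.
Propose GenRewrite, a holistic system that combines LLMs with Natural Language Rewrite Rules (NLR2s) to generate, correct, and evaluate rewrites.
Introduce a counterexample-guided iterative correction method that fixes syntactic and semantic errors in rewritten queries.
Enable knowledge transfer across queries through an NLR2 repository and a utility-scoring mechanism that prioritizes hints.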

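Proposed method

Define NLR2s as human-readable hints, produced by the LLM, that guide rewriting and enable knowledge transfer.
Maintain an NLR2 repository and use utility scores to select only the hints relevant to a given query.
Apply a three-phase loop: suggest rewrites, correct them for equivalence, and evaluate equivalence and performance.
Iteratively fix syntactic and semantic errors in rewrites using feedback-driven, counterexample-guided refinement.
Estimate performance through actual execution or the database cost model, and update NLR2 utility accordingly.
Operate under a user-specified or default time budget (30 seconds per query) to optimize recurring workloads.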
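
The suggest, correct, and evaluate loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Nlr2`, `Nlr2Repository`, and `rewrite_query` are hypothetical, the LLM and database calls are passed in as plain callables, hint selection is a simple top-k by utility (relevance to the specific query is omitted), and the utility update rule (add the speedup minus 1) is an assumption made only for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Nlr2:
    """A natural-language rewrite rule (hint) with a running utility score."""
    text: str
    utility: float = 0.0


@dataclass
class Nlr2Repository:
    """Hypothetical hint store; the real system's selection logic is richer."""
    rules: list = field(default_factory=list)

    def top_hints(self, k: int) -> list:
        # Assumption: pick the k hints with the highest utility.
        return sorted(self.rules, key=lambda r: r.utility, reverse=True)[:k]

    def update(self, hints: list, speedup: float) -> None:
        # Assumption: reward hints by the speedup beyond 1x; a failed
        # rewrite (speedup 0.0) therefore lowers their utility.
        for hint in hints:
            hint.utility += speedup - 1.0


def rewrite_query(
    query: str,
    repo: Nlr2Repository,
    suggest: Callable[[str, list], str],
    find_counterexample: Callable[[str, str], Optional[str]],
    correct: Callable[[str, str, str], str],
    measure_speedup: Callable[[str, str], float],
    budget_s: float = 30.0,
    k: int = 3,
    max_rounds: int = 5,
    max_fixes: int = 3,
) -> tuple:
    """Return (best_query, best_speedup) found within the time budget."""
    deadline = time.monotonic() + budget_s
    best_query, best_speedup = query, 1.0
    for _ in range(max_rounds):
        if time.monotonic() >= deadline:
            break
        hints = repo.top_hints(k)
        # Phase 1: suggest a rewrite using the selected hints.
        candidate = suggest(query, hints)
        # Phase 2: correct the candidate while a counterexample exists.
        cex = find_counterexample(query, candidate)
        fixes = 0
        while cex is not None and fixes < max_fixes:
            candidate = correct(query, candidate, cex)
            cex = find_counterexample(query, candidate)
            fixes += 1
        # Phase 3: evaluate only candidates with no known counterexample.
        speedup = measure_speedup(query, candidate) if cex is None else 0.0
        repo.update(hints, speedup)
        if speedup > best_speedup:
            best_query, best_speedup = candidate, speedup
    return best_query, best_speedup
```

In this sketch, `find_counterexample` returning `None` only means that no differing result was found; it is not a proof of equivalence. In practice, `measure_speedup` could wrap either real execution or a cost-model estimate, matching the two options named in the list above.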

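Experimental results

Research questions

RQ1: Can LLMs be used effectively for query rewriting beyond traditional rule-based and synthesis-based approaches?
RQ2: How can rewriting knowledge be transferred across queries to improve coverage over time?
RQ3: Does counterexample-guided iterative refinement reduce incorrect rewrites and LLM cost while preserving equivalence and speedup?
RQ4: How do NLR2-guided hints affect rewrite quality and overall performance on complex benchmarks such as TPC-DS?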

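Key findings

GenRewrite rewrote 22 of the 99 TPC-DS queries with a speedup of more than 2x.
The approach improves coverage by 2.5x–3.2x over state-of-the-art traditional rewriting, and its coverage is 2.1x higher than that of an out-of-the-box LLM.
NLR2s improve knowledge transfer and hint selection, reducing unnecessary or conflicting instructions to the LLM.
The counterexample-guided technique substantially reduces semantic and syntactic errors in rewritten queries.
The system emphasizes human-readable explanations of rewrites to aid verification and understanding.
By focusing on general, schema-independent NLR2s, the GenRewrite framework enables rewrites to be reused across workloads.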
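
The coverage figures above count how many queries reach a given speedup threshold. A minimal sketch of that calculation is shown below, assuming speedup is defined as original latency divided by rewritten latency and that the threshold is inclusive; the function names and the latencies are hypothetical and are not taken from the paper.

```python
def speedup(original_s: float, rewritten_s: float) -> float:
    """Speedup of a rewrite: original latency over rewritten latency."""
    return original_s / rewritten_s


def coverage_at(latencies: dict, threshold: float) -> int:
    """Count queries whose speedup meets the threshold (inclusive)."""
    return sum(
        1
        for original_s, rewritten_s in latencies.values()
        if speedup(original_s, rewritten_s) >= threshold
    )


# Hypothetical (original, rewritten) latencies in seconds, for illustration only.
example_latencies = {
    "q1": (10.0, 4.0),  # 2.5x
    "q2": (8.0, 8.0),   # 1.0x, no improvement
    "q3": (6.0, 3.0),   # 2.0x
}
print(coverage_at(example_latencies, 2.0))  # prints 2
```

Dividing two such counts gives a relative coverage figure of the kind reported above, for example one system's count at 2x over a baseline's count at 2x.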

This review was generated by AI and checked by a human editor.