QUICK REVIEW

[Paper Review] DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

Mohammadreza Pourreza, Davood Rafiei|arXiv (Cornell University)|Apr 21, 2023

Topic Modeling | Citations: 56

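One-Line Summary

DIN-SQL decomposes text-to-SQL into schema linking, classification/decomposition, SQL generation with NatSQL, and self-correction. Using in-context learning, it substantially improves execution accuracy, surpasses the previous SOTA on the Spider holdout test set, and sets a new SOTA on BIRD.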

ABSTRACT

There is currently a significant gap between the performance of fine-tuned models and prompting approaches using Large Language Models (LLMs) on the challenging task of text-to-SQL, as evaluated on datasets such as Spider. To improve the performance of LLMs in the reasoning process, we study how decomposing the task into smaller sub-tasks can be effective. In particular, we show that breaking down the generation problem into sub-problems and feeding the solutions of those sub-problems into LLMs can be an effective approach for significantly improving their performance. Our experiments with three LLMs show that this approach consistently improves their simple few-shot performance by roughly 10%, pushing the accuracy of LLMs towards SOTA or surpassing it. On the holdout test set of Spider, the SOTA, in terms of execution accuracy, was 79.9 and the new SOTA at the time of this writing using our approach is 85.3. Our approach with in-context learning beats many heavily fine-tuned models by at least 5%. Additionally, when evaluated on the BIRD benchmark, our approach achieved an execution accuracy of 55.9%, setting a new SOTA on its holdout test set.

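Motivation and Objectives

Close the gap between prompt-based LLMs and fine-tuned models on text-to-SQL benchmarks such as Spider and BIRD.
Propose a decomposed, multi-module prompting method for SQL generation that improves the reasoning ability of LLMs.
Investigate how the module-specific prompts (schema linking, classification/decomposition, NatSQL-based generation, self-correction) affect performance.
Demonstrate the effectiveness of self-correction prompting and of adaptive prompt selection based on query difficulty.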

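Proposed Method

Decomposes the NL2SQL task into four prompting-based modules: schema linking, classification/decomposition, SQL generation with a NatSQL intermediate representation, and a self-correction step.
Uses few-shot demonstrations drawn from the training set for each module and for each query class.
Uses a three-way classification (easy, non-nested complex, nested complex) to select a specialized prompt for SQL generation.
Adopts NatSQL as an intermediate representation to ease NL-to-SQL translation of non-nested complex queries.
Applies a generic or gentle self-correction module after generation to fix minor SQL errors.
Evaluates with GPT-4 and the CodeX family using greedy decoding (temperature 0) and specific max-token limits for generation and for correction.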
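The four-module flow described above can be sketched in Python to make the control flow concrete. This is a minimal, hypothetical sketch, not the authors' implementation: the `LLM` callable type, every function name, the class labels, the short prompt wording, and the `fake_llm` stub are assumptions made for illustration. The real DIN-SQL prompts are long few-shot prompts built from training examples. The sketch shows only how the pieces connect: schema links feed the classifier, the predicted class selects the generation instruction, and a generic or gentle self-correction pass runs last.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to generated text.
# In the paper this role is played by GPT-4 or CodeX with greedy decoding.
LLM = Callable[[str], str]

# Labels for the three-way difficulty classification (names are assumptions).
QUERY_CLASSES = ("EASY", "NON-NESTED", "NESTED")

# Assumed mapping reflecting the review: generic correction for CodeX, gentle for GPT-4.
DEFAULT_CORRECTION_STYLE = {"codex": "generic", "gpt-4": "gentle"}


@dataclass
class PipelineTrace:
    """Intermediate outputs of each module, kept for inspection."""
    schema_links: str
    query_class: str
    draft_sql: str
    final_sql: str


def schema_linking(llm: LLM, question: str, schema: str) -> str:
    # Module 1: ask the model which tables, columns, and values the question refers to.
    return llm(f"# Schema: {schema}\n# Question: {question}\n# Schema links:")


def classify(llm: LLM, question: str, schema_links: str) -> str:
    # Module 2: label the query; an unrecognized label falls back to the most general class.
    label = llm(f"# Question: {question}\n# Schema links: {schema_links}\n# Label:")
    label = label.strip().upper()
    return label if label in QUERY_CLASSES else "NESTED"


def generate_sql(llm: LLM, question: str, schema_links: str, query_class: str) -> str:
    # Module 3: choose a class-specific instruction (the paper uses class-specific few-shot prompts).
    if query_class == "EASY":
        instruction = "Write the SQL query directly."
    elif query_class == "NON-NESTED":
        instruction = "Write a NatSQL intermediate representation, then the SQL query."
    else:
        instruction = "Answer the sub-questions first, then write the final SQL query."
    return llm(f"# {instruction}\n# Question: {question}\n# Schema links: {schema_links}\n# SQL:")


def self_correct(llm: LLM, question: str, sql: str, style: str) -> str:
    # Module 4: "generic" tells the model the query is buggy; "gentle" only asks it to check.
    if style == "generic":
        instruction = "The SQL query below has bugs. Fix it."
    elif style == "gentle":
        instruction = "Check the SQL query below for issues; return it unchanged if it is correct."
    else:
        raise ValueError(f"unknown correction style: {style!r}")
    return llm(f"# {instruction}\n# Question: {question}\n# SQL: {sql}\n# Fixed SQL:").strip()


def din_sql_sketch(llm: LLM, question: str, schema: str, style: str = "gentle") -> PipelineTrace:
    # Run the four modules in order, feeding each output into the next step.
    links = schema_linking(llm, question, schema)
    query_class = classify(llm, question, links)
    draft = generate_sql(llm, question, links, query_class)
    final = self_correct(llm, question, draft, style)
    return PipelineTrace(links, query_class, draft, final)


def fake_llm(prompt: str) -> str:
    # Offline stub returning canned answers keyed on the prompt's final line.
    if prompt.endswith("# Schema links:"):
        return "[singer.name, singer.age]"
    if prompt.endswith("# Label:"):
        return "easy"
    return "SELECT name FROM singer WHERE age > 30"


if __name__ == "__main__":
    trace = din_sql_sketch(
        fake_llm,
        "What are the names of singers older than 30?",
        "singer(singer_id, name, age)",
        DEFAULT_CORRECTION_STYLE["gpt-4"],
    )
    print(trace)
```

Because the model is injected as a plain callable, the pipeline can be exercised offline with a stub like `fake_llm`, or pointed at a real model client configured for temperature 0 to mirror the greedy decoding used in the evaluation.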

Figure 1 : Statistics of simple few-shot failures using CodeX Davinci (Op refers to operators, Cond refers to conditions, and cols refers to columns)

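Experimental Results

Research Questions

RQ1: Does decomposing text-to-SQL into schema linking, classification/decomposition, and an intermediate representation improve LLM performance over simple few-shot prompting?
RQ2: How much do adaptive prompting (based on query difficulty) and self-correction affect execution accuracy and exact-match accuracy on Spider and BIRD?
RQ3: How does this approach compare with fine-tuned SOTA methods on cross-domain text-to-SQL benchmarks?
RQ4: Can incorporating schema linking and the NatSQL intermediate representation mitigate common failure modes such as joins, nesting, and schema ambiguity?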

Key Findings

Model	EX, execution accuracy (%, Spider holdout)	EM, exact-match accuracy (%, Spider holdout)
DIN-SQL + GPT-4 (Ours)	85.3	60
DIN-SQL + CodeX Davinci (Ours)	78.2	57
RESDSQL-3B + NatSQL (DB content used) (Li et al., 2023a)	79.9	72
Graphix-3B+PICARD (DB content used) (Li et al., 2023b)	77.6	74
SHiP+PICARD (DB content used) (Zhao et al., 2022)	76.6	73.1
N-best Rerankers + PICARD (DB content used) (Zeng et al., 2022)	75.9	72.2
RASAT+PICARD (DB content used) (Qi et al., 2022)	75.5	70.9
T5-3B+PICARD (DB content used) (Scholak et al., 2021)	75.1	71.9
RATSQL+GAP+NatSQL (DB content used) (Gan et al., 2021)	73.3	68.7
RYANSQL v2 + BERT (Choi et al., 2021)	-	60.6
SmBoP + BART (Rubin and Berant, 2020)	-	60.5
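As a quick arithmetic check on the table, the snippet below recomputes two gaps using only the numbers listed: how far each DIN-SQL variant sits above the best EX among the fine-tuned systems in the table, and how far it sits below the best listed EM. The dictionary simply transcribes the table (citations dropped); the helper names `ex_margin` and `em_gap` are hypothetical.

```python
# (EX, EM) on the Spider holdout test set, transcribed from the results table.
# EX is None where the table reports "-".
SPIDER_HOLDOUT = {
    "DIN-SQL + GPT-4": (85.3, 60.0),
    "DIN-SQL + CodeX Davinci": (78.2, 57.0),
    "RESDSQL-3B + NatSQL": (79.9, 72.0),
    "Graphix-3B+PICARD": (77.6, 74.0),
    "SHiP+PICARD": (76.6, 73.1),
    "N-best Rerankers + PICARD": (75.9, 72.2),
    "RASAT+PICARD": (75.5, 70.9),
    "T5-3B+PICARD": (75.1, 71.9),
    "RATSQL+GAP+NatSQL": (73.3, 68.7),
    "RYANSQL v2 + BERT": (None, 60.6),
    "SmBoP + BART": (None, 60.5),
}


def baselines() -> dict:
    # Every listed system that is not a DIN-SQL variant.
    return {k: v for k, v in SPIDER_HOLDOUT.items() if not k.startswith("DIN-SQL")}


def ex_margin(system: str) -> float:
    # EX points above the best baseline EX (negative means below it).
    best = max(ex for ex, _ in baselines().values() if ex is not None)
    return round(SPIDER_HOLDOUT[system][0] - best, 1)


def em_gap(system: str) -> float:
    # EM points below the best baseline EM.
    best = max(em for _, em in baselines().values())
    return round(best - SPIDER_HOLDOUT[system][1], 1)


if __name__ == "__main__":
    for name in ("DIN-SQL + GPT-4", "DIN-SQL + CodeX Davinci"):
        print(name, ex_margin(name), em_gap(name))
```

With these values, DIN-SQL + GPT-4 leads the best listed fine-tuned EX (79.9, RESDSQL-3B + NatSQL) by 5.4 points, in line with the abstract's "at least 5%" margin, yet trails the best listed EM (74.0, Graphix-3B+PICARD) by 14.0 points. DIN-SQL + CodeX Davinci sits 1.7 EX points below RESDSQL-3B + NatSQL.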

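With GPT-4, DIN-SQL reaches 85.3% execution accuracy on the Spider holdout test set, a new SOTA at the time of writing.
With CodeX Davinci, it reaches 78.2% execution accuracy and 57% exact-match accuracy on the Spider holdout test set.
On BIRD, it reaches 55.9% execution accuracy on the holdout test set with GPT-4, setting a new SOTA (50.72% execution accuracy on the development set with GPT-4).
Decomposed prompting consistently improves performance over simple few-shot prompting by roughly 10% across LLMs.
Ablations show that every module contributes to performance, with schema linking giving notable gains, particularly on non-easy queries.
Self-correction prompting (generic for CodeX, gentle for GPT-4) reduces SQL errors and improves execution accuracy.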

Figure 2 : An overview of the proposed methodology including all four modules

This review was generated by AI and checked by a human editor.