QUICK REVIEW

[Paper Review] EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Tianshu Zhang, Kun Qian|arXiv (Cornell University)|Mar 11, 2026

Natural Language Processing Techniques|Citations: 0

One-Line Summary

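EvoSchema is a text-to-SQL benchmark and training paradigm for evaluating and improving robustness to realistic schema changes, particularly at the table level. It relies on a comprehensive taxonomy of schema variations and augments training data with diverse schema designs.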

ABSTRACT

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to performance degradation for models trained on static schemas. Existing work either mainly focuses on simply paraphrasing some syntactic or semantic mappings among NLQ, DB and SQL, or lacks a comprehensive and controllable way to investigate model robustness under schema evolution, which is insufficient when facing the increasingly complex and rich database schema changes in reality, especially in the LLM era. To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy, encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Through EvoSchema, we conduct an in-depth evaluation spanning different open-source and closed-source LLMs, revealing that table-level perturbations have a significantly greater impact on model performance compared to column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. Training on EvoSchema's diverse schema designs forces the model to distinguish schema differences for the same question and so avoid learning spurious patterns; the resulting models demonstrate remarkable robustness on average compared to those trained on unperturbed data. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.

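Motivation and Objectives

Motivates text-to-SQL systems that remain robust as database schemas evolve in real-world applications.
Introduces a comprehensive schema evolution taxonomy covering ten perturbation types at the column and table levels.
Builds EvoSchema by perturbing schemas from the BIRD dataset to simulate realistic evolution scenarios.
Evaluates open-source and closed-source LLMs under schema evolution to characterize their robustness gaps.
Proposes a training paradigm that augments data with diverse schema designs to improve robustness.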
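The training paradigm in the last item amounts to showing the model the same question against several schema designs. The sketch below illustrates that idea only; the `Example` record, the `augment` helper and the `add_distractor_table` perturbation (with its `audit_log` table) are hypothetical names, not code from the paper. Adding an unrelated table is used as the example perturbation because it leaves the gold SQL valid without any rewriting.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class Example:
    question: str                        # natural language question, kept fixed
    schema: dict[str, tuple[str, ...]]   # table name -> column names
    gold_sql: str


Perturbation = Callable[[Example], Example]


def add_distractor_table(ex: Example) -> Example:
    """Hypothetical table-level 'add' perturbation.

    An unrelated table does not change which tables the question needs,
    so the gold SQL is reused unchanged.
    """
    schema = dict(ex.schema)  # copy so the original example is not mutated
    schema.setdefault("audit_log", ("log_id", "created_at"))  # hypothetical table
    return replace(ex, schema=schema)


def augment(examples: list[Example], perturbations: list[Perturbation]) -> list[Example]:
    """Keep every original example and add one variant per perturbation,
    so the same question is paired with several schema designs."""
    augmented: list[Example] = []
    for ex in examples:
        augmented.append(ex)
        augmented.extend(perturb(ex) for perturb in perturbations)
    return augmented


if __name__ == "__main__":
    seed = Example(
        question="How many schools are there?",
        schema={"schools": ("school_id", "county")},
        gold_sql="SELECT COUNT(*) FROM schools",
    )
    for ex in augment([seed], [add_distractor_table]):
        print(sorted(ex.schema), ex.gold_sql)
```

Keeping the unperturbed example alongside its variants is a design choice of this sketch; it lets the model see that an identical question maps to different schema contexts.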

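Proposed Method

Defines a schema evolution taxonomy of ten perturbation types (column level: add, remove, rename, split, merge; table level: add, remove, rename, split, merge).
Synthesizes EvoSchema by perturbing seed examples from the BIRD dataset while keeping the natural language questions (NLQs) fixed, and adjusts the gold SQL to match each perturbed schema.
Uses a hybrid data generation framework that combines heuristics with GPT models (GPT-3.5, GPT-4) to produce realistic column and table perturbations.
Applies human verification and cross-validation by SQL experts to ensure that the modified schemas and gold SQL are correct.
Introduces two evaluation metrics, Table Match F1 and Column Match F1, to measure robustness at the table and column levels.
Trains on data augmented with diverse schema designs, so the model learns to distinguish schema changes and is less prone to learning spurious patterns.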
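The review names Table Match F1 and Column Match F1 without defining them. A common reading, assumed here and not confirmed by the review, is a set-level F1 between the schema items referenced by the predicted SQL and by the gold SQL. The hypothetical `match_f1` helper below takes those sets as input; extracting them from SQL is left out.

```python
def match_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-level F1 between schema items referenced by predicted and gold SQL.

    Pass table names for a Table Match F1 sketch, or qualified column names
    (e.g. "schools.county") for a Column Match F1 sketch. Names are compared
    case-insensitively.
    """
    pred = {item.lower() for item in predicted}
    ref = {item.lower() for item in gold}
    if not pred and not ref:
        return 1.0  # assumed convention: nothing referenced on either side
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # The prediction joins an extra table: precision 1/2, recall 1/1, F1 = 2/3.
    print(match_f1({"schools", "districts"}, {"schools"}))
```

Unlike whole-query execution accuracy, a score like this gives partial credit, which is what makes it useful for seeing whether a model picks the right tables or columns after a schema change.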

Figure 1. The left (a) is the overview of the framework to collect EvoSchema dataset. The top right (b) is a column-level schema evolution example; the bottom right (c) is a table-level schema evolution example.
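Figure 1(b) and the method description imply that a column-level change must be applied to the schema and the gold SQL together while the question stays fixed. A minimal sketch of one such change, a column rename, is shown below; `rename_column` is a hypothetical helper, and its plain regex substitution is a simplification, not the paper's generation pipeline (which combines heuristics with GPT models).

```python
import re


def rename_column(schema: dict[str, list[str]], gold_sql: str,
                  table: str, old: str, new: str) -> tuple[dict[str, list[str]], str]:
    """Hypothetical column-level 'rename' perturbation.

    The question is not touched; the schema and the gold SQL are changed
    together so that the gold SQL stays valid against the new schema.
    """
    if old not in schema.get(table, []):
        raise KeyError(f"{table}.{old} is not in the schema")
    new_schema = {t: list(cols) for t, cols in schema.items()}  # copy, leave input intact
    new_schema[table] = [new if col == old else col for col in new_schema[table]]
    # Naive whole-word substitution: assumes `old` names no column in another
    # table and does not occur inside string literals or quoted identifiers.
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    new_sql = pattern.sub(lambda _match: new, gold_sql)
    return new_schema, new_sql


if __name__ == "__main__":
    schema = {"schools": ["school_id", "county"]}
    sql = "SELECT COUNT(*) FROM schools WHERE county = 'Alameda'"
    print(rename_column(schema, sql, "schools", "county", "county_name"))
```

The replacement is passed as a function rather than a string so that backslashes in a new name are inserted literally instead of being read as regex escapes.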

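Experimental Results

Research Questions

RQ1: How sensitive are current text-to-SQL models to different types of schema evolution, namely column-level and table-level perturbations?
RQ2: Does training on diverse schema designs improve model robustness across perturbation types?
RQ3: What metrics effectively quantify robustness under schema evolution in text-to-SQL?

Key Findings

Table-level perturbations have a larger impact on model performance than column-level perturbations.
The two fine-grained metrics, Table Match F1 and Column Match F1, reveal differences in robustness across perturbation types.
Training on data augmented with diverse schema designs (perturbation training) improves robustness, with clear gains across a range of perturbed evaluation sets.
Models trained on EvoSchema perturbations gain up to 33 points on specific schema-perturbation evaluations compared with models trained only on unperturbed data.
Benchmarking across open-source and closed-source LLMs highlights relative differences in sensitivity to schema changes and informs the design of more robust text-to-SQL systems.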

Figure 2. An overview of different perturbation types of EvoSchema. The top is an unperturbed example in BIRD (Li et al., 2024d); the middle is the column-level perturbation; the bottom is the table-level perturbation. “Remove Col in SQL”: remove columns that appear in gold SQL; “Remove Tables”:
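The Figure 2 caption defines “Remove Col in SQL” as removing columns that appear in the gold SQL. The sketch below covers only that schema-side step; the truncated caption does not say how the expected output is relabeled afterwards, so that part is omitted. `remove_cols_in_sql` is a hypothetical helper, and its token matching is a deliberate simplification rather than a SQL parser.

```python
import re


def remove_cols_in_sql(schema: dict[str, list[str]], gold_sql: str) -> dict[str, list[str]]:
    """Sketch of 'Remove Col in SQL': drop schema columns that appear in the gold SQL.

    Assumption: a column "appears" if its name matches an identifier-like
    token in the SQL text. This naive scan also matches words inside string
    literals and SQL keywords (a column named `count` would match COUNT(*)),
    ignores which table a column belongs to, and misses quoted names that
    contain spaces; a real SQL parser would be needed for those cases.
    """
    tokens = {tok.lower() for tok in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", gold_sql)}
    return {table: [col for col in cols if col.lower() not in tokens]
            for table, cols in schema.items()}


if __name__ == "__main__":
    schema = {"schools": ["school_id", "county", "city"]}
    sql = "SELECT city FROM schools WHERE county = 'Alameda'"
    print(remove_cols_in_sql(schema, sql))
```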

This review was generated by AI and checked by a human editor.