QUICK REVIEW

[Paper Review] EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Tianshu Zhang, Kun Qian|arXiv (Cornell University)|Mar 11, 2026

Natural Language Processing Techniques | Cited by 0

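One-Sentence Summary

EvoSchema is a benchmark and training paradigm that evaluates and improves text-to-SQL robustness under realistic schema changes, especially at the table level, by combining a comprehensive perturbation taxonomy with schema-design augmentation during training.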

ABSTRACT

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to performance degradation for models trained on static schemas. Existing work either mainly focuses on simply paraphrasing some syntactic or semantic mappings among NLQ, DB and SQL, or lacks a comprehensive and controllable way to investigate the model robustness issue under the schema evolution, which is insufficient when facing the increasingly complex and rich database schema changes in reality, especially in the LLM era. To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy, encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Through EvoSchema, we conduct an in-depth evaluation spanning different open-source and closed-source LLMs, revealing that table-level perturbations have a significantly greater impact on model performance compared to column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. The models trained on EvoSchema's diverse schema designs can force the model to distinguish the schema difference for the same questions to avoid learning spurious patterns, which demonstrate remarkable robustness compared to those trained on unperturbed data on average. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.

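Motivation and Objectives

Make text-to-SQL systems robust to evolving database schemas in real-world applications.
Introduce a comprehensive schema evolution taxonomy covering ten perturbation types at the column and table levels.
Create EvoSchema by perturbing the schemas of the BIRD benchmark to simulate realistic evolution scenarios.
Evaluate how open-source and closed-source LLMs differ in robustness under schema evolution.
Propose a training paradigm that improves robustness by augmenting the training data with diverse schema designs.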

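Proposed Method

Define a schema evolution taxonomy of ten perturbation types (five column-level: add, remove, rename, split, merge; five table-level: add, remove, rename, split, merge).
Synthesize EvoSchema by perturbing seed examples from the BIRD dataset, keeping the NLQ fixed and adjusting the gold SQL accordingly.
Use a hybrid data generation framework that combines heuristics with GPT models (GPT-3.5, GPT-4) to create realistic column and table perturbations.
Apply manual verification and cross-validation with SQL experts to ensure that the modified schemas and gold SQL are correct.
Introduce two evaluation metrics, Table Match F1 and Column Match F1, to measure robustness at the table and column levels.
Train models on data with augmented schema designs, forcing them to distinguish schema differences and reducing the learning of spurious patterns.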
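
As an illustration of the "keep the NLQ fixed, adjust the gold SQL" step, the following is a minimal sketch of a column-rename perturbation. It is not the authors' generation pipeline: the `rename_column` helper, the `student` schema, the question, and the rows are all hypothetical, and a plain regex stands in for the heuristic and GPT-based rewriting described above.

```python
import re
import sqlite3


def rename_column(schema_sql, gold_sql, old, new):
    """Rename a column in both the schema DDL and the gold SQL.

    Hypothetical helper: a whole-word regex substitution is only safe for
    toy examples where the column name is not reused as a table or alias.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return pattern.sub(new, schema_sql), pattern.sub(new, gold_sql)


def run(schema_sql, query):
    """Build an in-memory database from the schema and execute the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "Ann", 19), (2, "Bo", 22), (3, "Cy", 25)],
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# Hypothetical seed example (not taken from BIRD).
question = "How many students are older than 20?"
schema = "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
gold = "SELECT COUNT(*) FROM student WHERE age > 20"

# The question stays fixed; only the schema and the gold SQL change.
new_schema, new_gold = rename_column(schema, gold, "age", "years_old")

original_result = run(schema, gold)
perturbed_result = run(new_schema, new_gold)
print(new_gold)
print(original_result == perturbed_result)
```

Executing both versions against the same rows is one cheap sanity check that the perturbed gold SQL still answers the unchanged question; it complements, rather than replaces, the manual verification step.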

Figure 1. The left (a) is the overview of the framework to collect EvoSchema dataset. The top right (b) is a column-level schema evolution example; the bottom right (c) is a table-level schema evolution example.
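
To make the table-level case concrete, here is a minimal sketch of a table split. The tables, columns, and rows are hypothetical and are not taken from the figure or from BIRD. The point it illustrates is that a table-level change can alter the join structure of the gold SQL, not just an identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original (unperturbed) schema: one table holds every attribute.
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO student VALUES (1, 'Ann', 'Oslo'), (2, 'Bo', 'Rome'), (3, 'Cy', 'Oslo');
""")
gold_before = "SELECT name FROM student WHERE city = 'Oslo' ORDER BY name"
rows_before = conn.execute(gold_before).fetchall()

# Table-level perturbation: split `student` into two tables.
conn.executescript("""
CREATE TABLE student_core (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student_address (
    student_id INTEGER REFERENCES student_core(id),
    city TEXT
);
INSERT INTO student_core SELECT id, name FROM student;
INSERT INTO student_address SELECT id, city FROM student;
DROP TABLE student;
""")

# The same question now needs a JOIN across the two new tables.
gold_after = (
    "SELECT c.name FROM student_core AS c "
    "JOIN student_address AS a ON c.id = a.student_id "
    "WHERE a.city = 'Oslo' ORDER BY c.name"
)
rows_after = conn.execute(gold_after).fetchall()

# The old gold SQL no longer runs against the evolved schema.
try:
    conn.execute(gold_before)
    old_query_still_runs = True
except sqlite3.OperationalError:
    old_query_still_runs = False

print(rows_before == rows_after, old_query_still_runs)
```

A model that memorized the original single-table query would fail outright here, whereas a column rename leaves the query shape intact.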

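Experimental Results

Research Questions

RQ1: How sensitive are current text-to-SQL models to different types of schema evolution (column-level versus table-level perturbations)?
RQ2: Does training on diverse schema designs improve model robustness across perturbation types?
RQ3: Which metrics effectively quantify robustness to schema evolution in text-to-SQL?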

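Key Findings

Table-level perturbations degrade model performance significantly more than column-level perturbations.
The two fine-grained metrics, Table Match F1 and Column Match F1, expose differences in robustness across perturbation types.
Augmenting the training data with diverse schema designs (perturbation training) improves robustness, with notable gains across the perturbed evaluation sets.
Models trained with EvoSchema perturbations improve by up to 33 points on some schema-perturbation evaluations compared with models trained only on unperturbed data.
Benchmarking open-source and closed-source LLMs reveals relative differences in their sensitivity to schema changes and offers guidance for designing more robust text-to-SQL systems.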
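
This summary does not spell out how Table Match F1 is computed. The sketch below assumes it is the set-overlap F1 between the tables referenced in the predicted SQL and those in the gold SQL; that definition, the `referenced_tables` and `set_f1` helpers, and the example queries are assumptions for illustration. The regex is a naive stand-in for a real SQL parser.

```python
import re


def referenced_tables(sql):
    """Collect names that follow FROM or JOIN (naive regex, not a SQL parser)."""
    names = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", sql, flags=re.IGNORECASE)
    return {name.lower() for name in names}


def set_f1(predicted, gold):
    """F1 between two sets; two empty sets count as a perfect match."""
    if not predicted or not gold:
        return float(predicted == gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted queries after a table split.
gold_sql = (
    "SELECT c.name FROM student_core AS c "
    "JOIN student_address AS a ON c.id = a.student_id WHERE a.city = 'Oslo'"
)
pred_sql = "SELECT name FROM student_core WHERE city = 'Oslo'"

table_f1 = set_f1(referenced_tables(pred_sql), referenced_tables(gold_sql))
print(round(table_f1, 3))
```

Here the prediction finds one of the two gold tables, so precision is 1.0 and recall is 0.5. A Column Match F1 could be sketched the same way over the sets of referenced columns, though extracting columns reliably needs a proper parser.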

Figure 2. An overview of different perturbation types of EvoSchema. The top is an unperturbed example in BIRD (Li et al., 2024d); the middle is the column-level perturbation; the bottom is the table-level perturbation. “Remove Col in SQL”: remove columns that appear in gold SQL; “Remove Tables”:


This review was generated by AI and reviewed by a human editor.