QUICK REVIEW

[Paper Review] Memo-SQL: Structured Decomposition and Experience-Driven Self-Correction for Training-Free NL2SQL

Zerui Yang, Weichuan Wang | arXiv (Cornell University) | 2026-01-15

Advanced Database Systems and Queries | Citations: 0

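One-Line Summary

The paper proposes Memo-SQL, a training-free NL2SQL framework built on structured decomposition, retrieval augmentation, and error-aware self-correction. It achieves state-of-the-art results on BIRD among open, zero-fine-tuning methods while using markedly less compute than prior test-time scaling (TTS) methods.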

ABSTRACT

Existing NL2SQL systems face two critical limitations: (1) they rely on in-context learning with only correct examples, overlooking the rich signal in historical error-fix pairs that could guide more robust self-correction; and (2) test-time scaling approaches often decompose questions arbitrarily, producing near-identical SQL candidates across runs and diminishing ensemble gains. Moreover, these methods suffer from a stark accuracy-efficiency trade-off: high performance demands excessive computation, while fast variants compromise quality. We present Memo-SQL, a training-free framework that addresses these issues through two simple ideas: structured decomposition and experience-aware self-correction. Instead of leaving decomposition to chance, we apply three clear strategies, entity-wise, hierarchical, and atomic sequential, to encourage diverse reasoning. For correction, we build a dynamic memory of both successful queries and historical error-fix pairs, and use retrieval-augmented prompting to bring relevant examples into context at inference time, no fine-tuning or external APIs required. On BIRD, Memo-SQL achieves 68.5% execution accuracy, setting a new state of the art among open, zero-fine-tuning methods, while using over 10 times fewer resources than prior TTS approaches.

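Motivation and Objectives

Motivate a training-free NL2SQL approach that exploits historical error-fix signals, not only correct demonstrations.
Formulate NL2SQL as a structured divide-and-conquer task to encourage diverse reasoning paths.
Introduce an experience-driven, retrieval-augmented self-correction memory that refines outputs at inference time.
Demonstrate state-of-the-art execution accuracy on BIRD among open, zero-fine-tuning methods.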

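Proposed Method

Apply three decomposition strategies (entity-wise, hierarchical, atomic sequential) to generate diverse sub-questions in parallel.
Use a ReAct+Reflect loop to reason over each sub-question, generate sub-SQL, observe the execution result, and reflect to correct errors.
Maintain a dynamic error-fix memory whose entries hold the question, correct SQL, erroneous SQL, error type, and fix hint, and retrieve similar failures to support in-context refinement.
Generate three end-to-end SQL candidates (CTE, flat JOIN, nested) via few-shot in-context prompting and select among them by self-consistency scoring.
Implement two-stage memory retrieval for self-refinement: fetch the top-k retrieved error-fix pairs, deduplicate them by error type, then apply a critic-refine loop until consensus is reached.
Evaluate on the BIRD, SPIDER, and CHESS-SDS benchmarks, compare against training-free open methods and baselines, and analyze the trade-off between efficiency and accuracy.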
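
The error-fix memory and two-stage retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ErrorFixEntry` field names, the `retrieve` function, and the sample records are hypothetical, and token-overlap (Jaccard) similarity stands in for whatever retriever the paper actually uses. The critic-refine loop is omitted because it requires a language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorFixEntry:
    # Fields mirror the memory record listed above; the names are hypothetical.
    question: str
    correct_sql: str
    wrong_sql: str
    error_type: str
    fix_hint: str


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (an assumed stand-in for the real retriever)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(memory, question, k=3):
    """Stage 1: keep the top-k most similar entries.
    Stage 2: keep only the best-ranked entry for each error type."""
    ranked = sorted(memory, key=lambda e: jaccard(e.question, question), reverse=True)
    seen, deduped = set(), []
    for entry in ranked[:k]:
        if entry.error_type not in seen:
            seen.add(entry.error_type)
            deduped.append(entry)
    return deduped


# Illustrative records only; not taken from the paper or from BIRD.
MEMORY = [
    ErrorFixEntry(
        "How many schools are in each county",
        "SELECT county, COUNT(*) FROM schools GROUP BY county",
        "SELECT county, COUNT(*) FROM schools",
        "missing_group_by",
        "Add GROUP BY for the non-aggregated column.",
    ),
    ErrorFixEntry(
        "How many students are in each school",
        "SELECT school_id, COUNT(*) FROM students GROUP BY school_id",
        "SELECT school_id, COUNT(*) FROM students",
        "missing_group_by",
        "Add GROUP BY for the non-aggregated column.",
    ),
    ErrorFixEntry(
        "List the names of schools in Alameda county",
        "SELECT name FROM schools WHERE county = 'Alameda'",
        "SELECT name FROM school WHERE county = 'Alameda'",
        "wrong_table_name",
        "Check the table name against the schema.",
    ),
]

hits = retrieve(MEMORY, "How many teachers are in each school", k=3)
for h in hits:
    print(h.error_type, "->", h.fix_hint)
```

Deduplicating by error type keeps the prompt short: two retrieved failures that teach the same lesson contribute only one in-context example, which leaves room for a different kind of failure.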

Figure 1: Conceptual comparison between a standard user-in-the-loop NL2SQL workflow and our proposed self-correction framework. In the standard approach, the model generates an initial SQL query, which is then revised through explicit user feedback, a process that typically stores only correct examp

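Experimental Results

Research Questions

RQ1: Can a training-free NL2SQL system achieve competitive accuracy by explicitly structuring question decomposition?
RQ2: Does leveraging past error-fix experience through retrieval-augmented in-context learning improve self-correction beyond static demonstrations?
RQ3: How do structured decomposition and iterative correction affect the execution accuracy and efficiency of NL2SQL?
RQ4: How well does Memo-SQL generalize across datasets (BIRD, SPIDER, CHESS-SDS) when using an error-fix memory built from a different dataset?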

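Key Findings

Memo-SQL achieves state-of-the-art execution accuracy on BIRD dev-new among open, zero-fine-tuning NL2SQL methods.
It reduces computational overhead by more than an order of magnitude relative to leading TTS approaches.
Retrieval-augmented self-correction draws on both success and failure history, improving robustness over static demonstrations.
The three decomposition strategies encourage diverse reasoning paths and make it possible to synthesize the best of N candidates.
Across model scales, Memo-SQL maintains high accuracy while substantially improving efficiency.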
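
To make the candidate-selection idea concrete, here is a small sketch of execution-based voting over three SQL forms (CTE, flat JOIN, nested). It is an assumption-laden illustration: this review does not spell out the paper's self-consistency scoring, so majority voting over execution results is used as one common variant. The schema, data, and the names `execute` and `select_by_consistency` are hypothetical, and the nested candidate is deliberately flawed to show a dissenting vote.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run a candidate; return an order-insensitive result, or None on SQL error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def select_by_consistency(conn, candidates):
    """Return the name of the first candidate whose result is the most common one."""
    results = {name: execute(conn, sql) for name, sql in candidates.items()}
    votes = Counter(r for r in results.values() if r is not None)
    if not votes:
        return None
    winner, _ = votes.most_common(1)[0]
    return next(name for name, r in results.items() if r == winner)


# Toy database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 80), (2, 1, 50), (3, 2, 40), (4, 3, 200);
""")

# Question: which customers have a total order amount above 100?
CANDIDATES = {
    "cte": """
        WITH totals AS (
            SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
        )
        SELECT c.name FROM customers c JOIN totals t ON t.customer_id = c.id
        WHERE t.total > 100
    """,
    "flat_join": """
        SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name HAVING SUM(o.amount) > 100
    """,
    # Deliberately wrong: filters single orders instead of per-customer totals.
    "nested": """
        SELECT name FROM customers
        WHERE id IN (SELECT customer_id FROM orders WHERE amount > 100)
    """,
}

chosen = select_by_consistency(conn, CANDIDATES)
print(chosen)
```

Here the CTE and flat JOIN candidates agree, so the flawed nested candidate is outvoted. Voting only helps when candidates are genuinely diverse, which is the role the three decomposition strategies play in the framework.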

Figure 2: Overview of the Memo-SQL framework, which integrates problem decomposition, ReAct+Reflection reasoning, and self-correction to generate accurate SQL queries. The pipeline begins with preprocessing and multi-strategy question decomposition. Each sub-problem is solved via an iterative ReAct+


This review was generated by AI and reviewed by a human editor.