[Paper Review] TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models
TiInsight is a SQL-based automated cross-domain EDA system that uses large language models to generate hierarchical data context, clarify and decompose questions, convert them to SQL via TiSQL, and visualize results with TiChart through a GUI.
SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated a paradigm shift from manual to automated data exploration. However, existing methods generally lack the ability to perform cross-domain analysis, and the exploration of LLMs' capabilities remains insufficient. This paper presents TiInsight, a SQL-based automated cross-domain exploratory data analysis system. First, TiInsight offers a user-friendly GUI that lets users explore data using natural language queries. Second, TiInsight offers a robust cross-domain exploratory data analysis pipeline: hierarchical data context (i.e., HDC) generation, question clarification and decomposition, text-to-SQL (i.e., TiSQL), and data visualization (i.e., TiChart). Third, we have implemented and deployed TiInsight in the production environment of PingCAP and demonstrated its capabilities using representative datasets. The demo video is available at https://youtu.be/JzYFyYd-emI.
Motivation & Objective
- Motivate the need for automated, cross-domain exploratory data analysis using SQL and LLMs in real-world settings.
- Develop an end-to-end system (HDC, question clarification, TiSQL, TiChart) that lowers the barrier to data exploration without SQL expertise.
- Provide a user-friendly GUI and production-ready deployment to demonstrate practical EDA workflows.
- Address limitations of state-of-the-art text-to-SQL and table-to-chart approaches in cross-domain, unclear-intent scenarios.
Proposed method
- Propose Hierarchical Data Context (HDC) generation to summarize and organize database schema for cross-domain exploration.
- Introduce a question clarification and decomposition module to resolve ambiguous intents and split tasks into sub-questions.
- Develop TiSQL, a text-to-SQL component that uses a two-stage mapping (coarse-to-fine) with a map-reduce framework and a self-refinement chain (EXPLAIN and EXECUTE) to correct SQL errors.
- Create TiChart, a rule-based data visualization tool augmented with LLMs to recommend suitable chart types for complex tasks.
- Provide a web UI that supports data import, NL interaction, bookmarking, and multi-LLM switching.
- Demonstrate the end-to-end pipeline in PingCAP's production environment using two real-world datasets (Financial and Bird), and publish supporting demo resources.
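The self-refinement chain described for TiSQL (check with EXPLAIN, then EXECUTE, and repair on failure) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: SQLite stands in for the production database, and `self_refine`, `Repair`, and `stub_repair` are hypothetical names, with `stub_repair` standing in for an LLM call that receives the failing SQL and the error message.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical signature: (failing_sql, error_message) -> repaired_sql.
Repair = Callable[[str, str], str]


def self_refine(conn: sqlite3.Connection, sql: str, repair: Repair,
                max_rounds: int = 3) -> Tuple[str, List[tuple]]:
    """Return (final_sql, rows), repairing the query after each failure."""
    for _ in range(max_rounds):
        try:
            # Stage 1: EXPLAIN compiles the statement without running it,
            # so unknown tables or columns surface cheaply.
            conn.execute(f"EXPLAIN {sql}").fetchall()
            # Stage 2: EXECUTE the query for real.
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as err:
            # Feed the error text back to the (assumed) repair step.
            sql = repair(sql, str(err))
    raise RuntimeError(f"no valid SQL after {max_rounds} rounds: {sql}")


# Toy database (assumption: a single table named loans).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (amount INTEGER)")
conn.executemany("INSERT INTO loans VALUES (?)", [(100,), (200,)])


def stub_repair(sql: str, error: str) -> str:
    # A real system would send the SQL and error text to an LLM.
    return sql.replace("amout", "amount")


final_sql, rows = self_refine(conn, "SELECT SUM(amout) FROM loans", stub_repair)
```

Here the misspelled column fails at the EXPLAIN stage, the stub corrects it, and the second round passes both stages. Bounding the loop with `max_rounds` is a design choice of this sketch so that a repair step that never converges cannot loop forever.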
Experimental results
Research questions
- RQ1: Can TiInsight generate accurate SQL from natural language questions across different domains using an HDC-guided context?
- RQ2: How effectively can question clarification and decomposition handle unclear user intents in exploratory data tasks?
- RQ3: Does the two-stage TiSQL, with its map-reduce framework and self-refinement chain, improve accuracy and efficiency over traditional prompt-based methods?
- RQ4: Can TiChart reliably recommend appropriate visualizations for complex, multi-subtask explorations?
- RQ5: How practical and performant is TiInsight in a production environment such as PingCAP's across representative datasets?
Key findings
- TiInsight demonstrates an end-to-end EDA workflow from NL input to SQL execution and visualization via a GUI.
- HDC generation accelerates cross-domain understanding by summarizing schema, tables, and relationships to guide SQL generation.
- TiSQL combines coarse-to-fine schema filtering with a map-reduce prompt process and a self-refinement chain to reduce SQL errors.
- TiChart blends rule-based heuristics and LLMs to pick suitable visualization types for multi-subtask explorations.
- The system is deployed in PingCAP's production environment and demonstrated on two real-world datasets (Financial and Bird), showing practical deployment potential.
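The idea of pairing rule-based heuristics with an LLM for chart selection, as attributed to TiChart above, might look roughly like the sketch below. The specific rules, chart names, and the `llm_fallback` callable are all assumptions made for illustration; the review does not list TiChart's actual rules.

```python
from datetime import date
from typing import Callable, Optional, Sequence


def _is_number(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_iso_date(value: object) -> bool:
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def recommend_chart(
    rows: Sequence[Sequence[object]],
    llm_fallback: Optional[Callable[[Sequence[Sequence[object]]], str]] = None,
) -> str:
    """Pick a chart type from result shape; defer unclear cases to a fallback."""
    if not rows:
        return "table"
    width = len(rows[0])
    # Hypothetical rule: a single numeric cell is shown as a plain number.
    if width == 1 and len(rows) == 1 and _is_number(rows[0][0]):
        return "number"
    if width == 2:
        first = [r[0] for r in rows]
        second = [r[1] for r in rows]
        if all(map(_is_number, second)):
            if all(map(_is_iso_date, first)):
                return "line"      # date vs. measure
            if all(map(_is_number, first)):
                return "scatter"   # measure vs. measure
            if all(isinstance(v, str) for v in first):
                return "bar"       # category vs. measure
    # No rule matched: ask the (assumed) LLM-backed fallback, else show a table.
    if llm_fallback is not None:
        return llm_fallback(rows)
    return "table"
```

For example, `recommend_chart([("2024-01-01", 10), ("2024-02-01", 12)])` takes the date-versus-measure branch, while a three-column result falls through to `llm_fallback` when one is supplied. Keeping the cheap rules first means the LLM is only consulted for results the rules cannot classify.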
This review was created by AI and reviewed by human editors.