[Paper Review] MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering
MATA is a model-agnostic multi-agent TableQA framework that orchestrates CoT, PoT, and text-to-SQL reasoning with lightweight tools to improve accuracy and efficiency across open-source and proprietary LLMs.
Recent advances in Large Language Models (LLMs) have significantly improved table understanding tasks such as Table Question Answering (TableQA), yet challenges remain in ensuring reliability, scalability, and efficiency, especially in resource-constrained or privacy-sensitive environments. In this paper, we introduce MATA, a multi-agent TableQA framework that leverages multiple complementary reasoning paths and a set of tools built with small language models. MATA generates candidate answers through diverse reasoning styles for a given table and question, then refines or selects the optimal answer with the help of these tools. Furthermore, it incorporates an algorithm designed to minimize expensive LLM agent calls, enhancing overall efficiency. MATA maintains strong performance with small, open-source models and adapts easily across various LLM types. Extensive experiments on two benchmarks of varying difficulty with ten different LLMs demonstrate that MATA achieves state-of-the-art accuracy and highly efficient reasoning while avoiding excessive LLM inference. Our results highlight that careful orchestration of multiple reasoning pathways yields scalable and reliable TableQA. The code is available at https://github.com/AIDAS-Lab/MATA.
Research Motivation and Objectives
- Motivate reliable TableQA in resource-constrained and privacy-sensitive settings.
- Propose a model-agnostic multi-agent architecture that combines CoT, PoT, and text-to-SQL reasoning.
- Introduce lightweight tools (Scheduler, Confidence Checker, Format Matcher) to optimize inference and ensure concise outputs.
- Demonstrate strong, model-agnostic performance across open-source and proprietary LLMs on varied benchmarks.
Proposed Method
- Introduce three reasoning paths (CoT, PoT, text-to-SQL) processed by dedicated agents.
- Use lightweight tools (Scheduler, Confidence Checker, Format Matcher) to manage reasoning flow and output formatting.
- Employ a three-stage process: candidate generation, debugging-assisted refinement, and confidence-based final selection.
- Incorporate a debugging loop for PoT and text-to-SQL outputs, using Python and SQL debuggers to correct code before execution.
- Train Scheduler and Confidence Checker on a large synthetic TableQA dataset built from WikiTQ, TabMWP, and TabFact with multiple LLMs.
- Evaluate on Penguins in a Table and TableBench with ten LLMs, using EM, fuzzy matching, and token-level F1.
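The staged flow listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Candidate`, `run_with_debugging`, and `answer_question`, the `threshold` default, and the retry-on-exception stand-in for the Python/SQL debuggers are all hypothetical. Here `order` plays the role of the Scheduler's output, `score` stands in for the Confidence Checker, and `judge` is a fallback selector used only when no candidate is confident enough.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    path: str    # reasoning path that produced the answer, e.g. "cot", "pot", "sql"
    answer: str  # candidate answer as a string


# Hypothetical interfaces (assumptions for this sketch):
# an agent maps (table, question) to an answer string,
# a scorer maps (question, answer) to a confidence in [0, 1],
# a judge picks one answer from the collected candidates.
Agent = Callable[[str, str], str]
Scorer = Callable[[str, str], float]
Judge = Callable[[str, List[Candidate]], str]


def run_with_debugging(agent: Agent, table: str, question: str,
                       max_retries: int = 2) -> Optional[str]:
    """Call an agent, retrying if it raises.

    The retry is only a stand-in for a debugger that repairs generated code.
    """
    for _ in range(max_retries + 1):
        try:
            return agent(table, question)
        except Exception:
            continue
    return None


def answer_question(table: str, question: str, agents: Dict[str, Agent],
                    order: List[str], score: Scorer, judge: Judge,
                    threshold: float = 0.9) -> str:
    """Try reasoning paths in the given order and stop early when confident."""
    candidates: List[Candidate] = []
    for path in order:
        result = run_with_debugging(agents[path], table, question)
        if result is None:
            continue  # this path failed even after retries
        candidates.append(Candidate(path, result))
        if score(question, result) >= threshold:
            return result  # confident candidate: skip remaining paths and the judge
    if not candidates:
        return ""
    return judge(question, candidates)
```

With a scorer that returns a high confidence for the first candidate, only the first agent in `order` is called and `judge` is never invoked, which is the kind of saving the review attributes to the Scheduler and Confidence Checker.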
Experimental Results
Research Questions
- RQ1: Can a model-agnostic multi-agent framework achieve state-of-the-art accuracy across diverse LLMs in TableQA tasks?
- RQ2: Does orchestrating multiple reasoning paths (CoT, PoT, text-to-SQL) improve reliability and efficiency compared to single-path methods?
- RQ3: How do lightweight tools and an optimized scheduling strategy affect inference costs and answer quality?
- RQ4: What is the impact of ablating individual components (Confidence Checker [CC], JA, Scheduler [Sch], Format Matcher [FM]) on overall performance and efficiency?
Key Findings
- MATA achieves state-of-the-art accuracy across open-source LLMs on two benchmarks with diverse table sizes and question types.
- On TableBench, MATA outperforms the strongest baseline by up to 40.1% EM, 46.7% fuzzy matching, and 33.1% F1.
- The Confidence Checker is the most critical module, with ablations showing large accuracy drops when removed.
- Using the Scheduler reduces LLM agent calls by 14.6% on Penguins in a Table and 7.6% on TableBench, improving efficiency.
- The Confidence Checker can skip further inference when a candidate meets the confidence threshold, cutting unnecessary JA invocations by 95.8% (Penguins in a Table) and 60.6% (TableBench).
- Ablation results indicate that a balanced use of reasoning paths yields the best trade-off between diversity, accuracy, and cost, especially on complex tasks.
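For readers unfamiliar with the three reported metrics, the sketch below shows one common way to compute exact match (EM), a fuzzy string similarity, and token-level F1. It is an illustration only: the normalization (lowercasing and whitespace collapsing) and the use of `difflib.SequenceMatcher` for fuzzy matching are assumptions, and the benchmarks' official scorers may differ.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def fuzzy_match(pred: str, gold: str) -> float:
    """Character-level similarity ratio in [0, 1] from the standard library."""
    return SequenceMatcher(None, normalize(pred), normalize(gold)).ratio()


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over whitespace tokens."""
    p = normalize(pred).split()
    g = normalize(gold).split()
    if not p or not g:
        return float(p == g)  # both empty counts as a match
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "new york city" against the gold answer "new york" scores 0 on EM but 0.8 on token-level F1 (precision 2/3, recall 1), which is why the three metrics can diverge on the same output.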
This review was generated by AI and checked by a human editor.