QUICK REVIEW

[Paper Review] TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models

Zhizhao Luo, Zhaojing Luo|arXiv (Cornell University)|Feb 15, 2026

Topic Modeling | Citations: 0

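One-Line Summary

TabTracer uses execution-guided Monte Carlo Tree Search with versioned table states and step-level verification to improve complex table reasoning with LLMs, achieving higher accuracy and lower token cost than baselines.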

ABSTRACT

Large language models (LLMs) have emerged as powerful tools for natural language table reasoning, where there are two main categories of methods. Prompt-based approaches rely on language-only inference or one-pass program generation without step-level verification. Agent-based approaches use tools in a closed loop, but verification is often local and backtracking is limited, allowing errors to propagate and increasing cost. Moreover, they rely on chain- or beam-style trajectories that are typically combinatorially redundant, leading to high token costs. In this paper, we propose TabTracer, an agentic framework that coordinates multi-step tool calls over intermediate table states, with explicit state tracking for verification and rollback. First, it enforces step-level verification with typed operations and lightweight numeric and format checks to provide reliable rewards and suppress hallucinations. Second, execution-feedback Monte Carlo Tree Search maintains a search tree of candidate table states and uses backpropagated reflection scores to guide UCB1 selection and rollback via versioned snapshots. Third, it reduces redundancy with budget-aware pruning, deduplication, and state hashing with a monotonicity gate to cut token cost. Comprehensive evaluation on TabFact, WikiTQ, and CRT datasets shows that TabTracer outperforms state-of-the-art baselines by up to 6.7% in accuracy while reducing token consumption by 59–84%.
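The abstract mentions UCB1 selection guided by backpropagated reflection scores. A minimal sketch of that generic mechanism follows, assuming the textbook UCB1 formula with exploration constant sqrt(2) and rewards in [0, 1]. The `Node` class, its field names, and the constant are illustrative assumptions, not details taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One candidate table state in the search tree (hypothetical structure)."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Textbook UCB1: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1(ch, parent.visits))


def backpropagate(node: Node, reward: float) -> None:
    """Add a reward (e.g. a reflection score) to the node and all its ancestors."""
    current: Optional[Node] = node
    while current is not None:
        current.visits += 1
        current.total_reward += reward
        current = current.parent
```

With two children under one root, a single high reward on the first child still leaves the unvisited second child to be selected next; once both have been visited equally, the child with the higher mean reward wins.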

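Motivation and Objectives

Promote robust LLM reasoning over semi-structured tables by curbing hallucinations and the propagation of early errors.
Introduce an agentic framework that coordinates multi-step tool calls across intermediate table states.
Provide step-level verification, backtracking, and budget-aware pruning to reduce token cost and search redundancy.
Demonstrate superior accuracy on the TabFact, WikiTQ, and CRT datasets while reducing token consumption.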

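Proposed Method

Proposes TabTracer as an agentic framework comprising a Reasoning Layer (budgeted MCTS), an Execution Layer (typed dataframe tools), and a Storage Layer (versioned table snapshots).
Enforces step-level verification through typed table operators (SelectColumns, FilterRows, GenExeCode), with pre- and post-execution checks and versioned intermediate tables.
Uses execution-feedback Monte Carlo Tree Search to maintain a tree of candidate table states, backpropagate reflection scores, and enable rollback via versioned snapshots.
Applies budget-aware pruning, state hashing, and a monotonicity gate to suppress near-identical expansions and bound token usage.
Guides MCTS with a reflection-based reward signal, with a fallback scorer that uses cached metadata for robust evaluation.
Shows that TabTracer improves accuracy by up to 6.7% over state-of-the-art baselines while reducing token consumption by 59–84%.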
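To make the interplay of typed operators, step checks, and versioned snapshots more concrete, here is a minimal sketch using plain Python lists of dicts as tables. The operator names echo SelectColumns and FilterRows from the list above, but the signatures, the specific checks, and the `VersionedStore` class are assumptions made for illustration; the paper's actual interfaces may differ.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


class StepError(ValueError):
    """Raised when a pre- or post-check rejects a step."""


def select_columns(table: Table, columns: List[str]) -> Table:
    # Pre-check: the input must be non-empty and contain every requested column.
    if not table:
        raise StepError("empty input table")
    missing = [c for c in columns if c not in table[0]]
    if missing:
        raise StepError(f"unknown columns: {missing}")
    return [{c: row[c] for c in columns} for row in table]


def filter_rows(table: Table, predicate: Callable[[Row], bool]) -> Table:
    result = [row for row in table if predicate(row)]
    # Post-check (an assumption of this sketch): an empty result counts as a failed step.
    if not result:
        raise StepError("filter removed every row")
    return result


class VersionedStore:
    """Append-only list of snapshots; version i is the table after step i."""

    def __init__(self, initial: Table) -> None:
        self._versions: List[Table] = [initial]

    @property
    def head(self) -> int:
        return len(self._versions) - 1

    def current(self) -> Table:
        return self._versions[-1]

    def apply(self, op: Callable[[Table], Table]) -> int:
        """Run op on the latest snapshot; commit only if no check raised."""
        new_table = op(self.current())  # StepError propagates, nothing is committed
        self._versions.append(new_table)
        return self.head

    def rollback(self, version: int) -> None:
        """Discard every snapshot after the given version."""
        if not 0 <= version <= self.head:
            raise IndexError(f"no such version: {version}")
        del self._versions[version + 1:]
```

Because a failing operator raises before anything is appended, a rejected step leaves the store at its previous version, and `rollback` lets a search procedure return to any earlier snapshot without recomputing it.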

Figure 1. Prompt-based and agent-based outputs fail to complete the aggregation, while TabTracer (our approach) slices the table to count songs per date and aggregate by month (Nov=9 vs Jan=3).
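The two-step aggregation named in the Figure 1 caption can be sketched in a few lines. The rows below are entirely hypothetical: the dates and song names are invented, and the per-date counts were chosen only so that the monthly totals match the caption (Nov=9, Jan=3). The final assertion is a stand-in for a lightweight step-level numeric check, not the paper's actual verifier.

```python
from collections import Counter

# Hypothetical rows: one (date, song) pair per song; not data from the paper.
rows = (
    [("Nov 3", f"song {i}") for i in range(4)]
    + [("Nov 17", f"song {i}") for i in range(5)]
    + [("Jan 12", f"song {i}") for i in range(3)]
)

# Step 1: count songs per date.
per_date = Counter(date for date, _ in rows)

# Step 2: aggregate the per-date counts by month (first token of the date).
per_month = Counter()
for date, count in per_date.items():
    per_month[date.split()[0]] += count

# Step-level check: no rows may be lost or duplicated between steps.
assert sum(per_month.values()) == len(rows)
```

Splitting the task into an intermediate per-date table and a final per-month table gives each step a result that can be checked on its own before the next step runs.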

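Experimental Results

Research Questions

RQ1: To what extent can step-level verification and execution-grounded rewards reduce numerical hallucinations in LLM-based table reasoning?
RQ2: Can backtracking via execution-feedback MCTS improve robustness to early errors in table reasoning tasks?
RQ3: Can budget-aware pruning and state reuse cut token cost in complex table reasoning without sacrificing accuracy?
RQ4: How large are TabTracer's empirical gains over baselines on standard table reasoning benchmarks (TabFact, WikiTQ, CRT)?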

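Key Findings

TabTracer outperforms state-of-the-art baselines by up to 6.7% in accuracy on the TabFact, WikiTQ, and CRT datasets.
Token consumption falls by 59–84% relative to the baselines.
Step-level verification and typed operators suppress numerical hallucinations and prevent errors from propagating between steps.
Execution-feedback MCTS uses versioned snapshots for rollback and sub-path replacement, enabling reliable backtracking.
Budget-aware pruning and state hashing reduce redundant expansions and sustain progress under a fixed token budget.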
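State hashing and budget-aware pruning can be illustrated with a small admission filter: a candidate table state is expanded only if its content hash is new and its estimated token cost fits the remaining budget. Everything here is an assumption made for illustration, including the `Expander` class, the canonical JSON encoding, and the choice to ignore row order. The paper's monotonicity gate is not modeled because the review does not describe it.

```python
import hashlib
import json
from typing import Dict, List, Set

Table = List[Dict[str, object]]


def state_hash(table: Table) -> str:
    """Hash a table by content, ignoring row order and key order."""
    canonical = sorted(json.dumps(row, sort_keys=True, default=str) for row in table)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


class Expander:
    """Admits a candidate state only if it is new and the budget allows it."""

    def __init__(self, token_budget: int) -> None:
        self.remaining = token_budget
        self.seen: Set[str] = set()

    def admit(self, table: Table, token_cost: int) -> bool:
        if token_cost > self.remaining:
            return False  # budget-aware pruning: expansion is too expensive
        key = state_hash(table)
        if key in self.seen:
            return False  # duplicate state already reached by another path
        self.seen.add(key)
        self.remaining -= token_cost
        return True
```

Ignoring row order treats two paths that produce the same rows in a different order as one state, which suits counting and filtering but would be the wrong choice for order-sensitive steps such as sorting or top-k selection.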

Figure 2. The reasoning layer includes planning and reflection, the execution layer issues atomic dataframe tools, and the versioned storage layer preserves snapshots for fallback and retry.


This review was generated by AI and checked by a human editor.