QUICK REVIEW

[Paper Review] Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA

Fengyu Li, Junhao Zhu|arXiv (Cornell University)|Feb 26, 2026

Topic Modeling | Cited by 0

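One-Sentence Summary

This paper proposes Operation-R1, a framework that uses reinforcement learning with verifiable rewards (RLVR), together with a self-supervised reward and robustness mechanisms, to train lightweight large language models (LLMs) to generate data-preparation pipelines for table question answering (table QA) in a single inference step.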

ABSTRACT

Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. However, these solutions rely on multiple LLM calls, resulting in prohibitive latencies and computational costs. We propose Operation-R1, the first framework that trains lightweight LLMs (e.g., Qwen-4B/1.7B) via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step. To train such an LLM, we first introduce a self-supervised rewarding mechanism to automatically obtain fine-grained pipeline-wise supervision signals for LLM training. We also propose variance-aware group resampling to mitigate training instability. To further enhance robustness of pipeline generation, we develop two complementary mechanisms: operation merge, which filters spurious operations through multi-candidate consensus, and adaptive rollback, which offers runtime protection against information loss in data transformation. Experiments on two benchmark datasets show that, with the same LLM backbone, Operation-R1 achieves average absolute accuracy gains of 9.55 and 6.08 percentage points over multi-step preparation baselines, with 79% table compression and a 2.2× reduction in monetary cost.

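Research Motivation and Goals

Replace the multi-step, high-latency data-preparation pipelines used in table QA with a single-inference LLM solution to improve efficiency.
Develop a training framework for lightweight LLMs that generate an effective data-preparation pipeline tailored to each question-table pair.
Provide verifiable, fine-grained supervision and stability mechanisms for training the pipeline generator.
Ensure robustness during preprocessing through consensus and adaptive rollback mechanisms.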

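Proposed Method

Frame data preparation as a QA-oriented pipeline generation problem.
Train lightweight LLMs using reinforcement learning with verifiable rewards (RLVR) and ORPO (Operation-wise Group Relative Policy Optimization).
Introduce a self-supervised, cell-oriented QA reward that gives fine-grained feedback for each operation.
Apply variance-aware group resampling to stabilize training under fine-grained rewards.
Implement Operation Merge, which combines multiple candidate pipelines through an operation tree and voting.
At online inference time, generate multiple pipelines, select the best-supported one by voting, and execute its structural and semantic operations accordingly.
Introduce adaptive rollback, which recovers from information loss by rolling back to an earlier table state when needed.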
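
The two inference-time mechanisms in the list above, consensus voting over candidate pipelines and rollback to an earlier table state, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the table representation (a list of row dicts), the two operation types, the majority threshold `min_support`, and the `loses_information` check are all assumptions made for illustration, and the flat vote count stands in for the operation tree the paper describes.

```python
from collections import Counter

# Assumed representations for this sketch only: a table is a list of row
# dicts, and an operation is a hashable tuple such as
# ("select_columns", ("city", "medals")) or ("filter_rows", "city", "Oslo").

def merge_operations(candidates, min_support=0.5):
    """Keep operations that appear in more than `min_support` of the candidates."""
    votes = Counter()
    for pipeline in candidates:
        votes.update(set(pipeline))  # one vote per candidate per operation
    merged = []
    for pipeline in candidates:  # preserve first-appearance order
        for op in pipeline:
            if votes[op] / len(candidates) > min_support and op not in merged:
                merged.append(op)
    return merged

def apply_operation(table, op):
    """Apply one of the two hypothetical operation types to a table."""
    kind = op[0]
    if kind == "select_columns":
        cols = op[1]
        return [{c: row[c] for c in cols if c in row} for row in table]
    if kind == "filter_rows":
        _, col, value = op
        return [row for row in table if row.get(col) == value]
    raise ValueError(f"unknown operation: {kind}")

def loses_information(table):
    """Hypothetical loss check: no rows left, or rows with no columns left."""
    return not table or all(not row for row in table)

def run_with_rollback(table, pipeline):
    """Execute operations in order, keeping the earlier state when one fails the check."""
    state, applied = table, []
    for op in pipeline:
        new_state = apply_operation(state, op)
        if loses_information(new_state):
            continue  # roll back: discard new_state and keep the earlier table
        state = new_state
        applied.append(op)
    return state, applied

table = [
    {"city": "Oslo", "year": 2020, "medals": 3},
    {"city": "Rome", "year": 2020, "medals": 5},
    {"city": "Oslo", "year": 2021, "medals": 4},
]
candidates = [
    [("select_columns", ("city", "medals")), ("filter_rows", "city", "Oslo")],
    [("select_columns", ("city", "medals")), ("filter_rows", "city", "Oslo"),
     ("filter_rows", "medals", 99)],  # spurious operation, proposed only once
    [("filter_rows", "city", "Oslo"), ("select_columns", ("city", "medals"))],
]

merged = merge_operations(candidates)
prepared, applied = run_with_rollback(table, merged)
```

In this toy run the filter on `medals == 99` is proposed by only one of three candidates and is dropped by the vote, so `prepared` keeps the two Oslo rows with the `city` and `medals` columns. A pipeline whose first step empties the table, for example filtering on a city that does not occur, would have that step skipped by `run_with_rollback` and continue from the previous state.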

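Experimental Results

Research Questions

RQ1: Can a lightweight LLM be trained to generate high-quality TQA data-preparation pipelines in a single inference step?
RQ2: How can reliable, fine-grained, low-cost supervision signals for pipeline generation be obtained without ground-truth pipelines?
RQ3: Which mechanisms improve the robustness and stability of single-step pipeline generation across diverse QA scenarios?
RQ4: Compared with multi-step baselines, does the proposed pipeline generation improve QA accuracy while reducing table size and computational cost?

Key Findings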

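With the same LLM backbone, Operation-R1 achieves average absolute accuracy gains of 9.55 and 6.08 percentage points over multi-step preparation baselines on the two benchmarks, WikiTQ and TabFact.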
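On average, the method reduces table size by 79%.
Operation-R1 needs only a single inference pass of a lightweight model, reducing monetary cost by about 2.2×.
Across benchmarks, Operation Merge and Adaptive Rollback improve Operation-R1's TQA effectiveness while maintaining robustness.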

This review was generated by AI and reviewed by a human editor.