QUICK REVIEW

[Paper Review] TabFact: A Large-scale Dataset for Table-based Fact Verification

Wenhu Chen, Hongmin Wang | arXiv (Cornell University) | Sep 5, 2019

Advanced Text Analysis Techniques | References: 40 | Cited by: 181

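One-Sentence Summary

TabFact introduces a large-scale dataset for table-based fact verification (118k statements over 16k Wikipedia tables) and proposes two strong baselines, Table-BERT and the Latent Program Algorithm (LPA), to handle both linguistic and symbolic reasoning over semi-structured evidence.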

ABSTRACT

The problem of verifying whether a textual hypothesis holds based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies are mainly restricted to dealing with unstructured evidence (e.g., natural language sentences and documents, news, etc.), while verification under structured evidence, such as tables, graphs, and databases, remains under-explored. This paper specifically aims to study the fact verification given semi-structured data as evidence. To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. TabFact is challenging since it involves both soft linguistic reasoning and hard symbolic reasoning. To address these reasoning challenges, we design two different models: Table-BERT and Latent Program Algorithm (LPA). Table-BERT leverages the state-of-the-art pre-trained language model to encode the linearized tables and statements into continuous vectors for verification. LPA parses statements into programs and executes them against the tables to obtain the returned binary value for verification. Both methods achieve similar accuracy but still lag far behind human performance. We also perform a comprehensive analysis to demonstrate great future opportunities. The data and code of the dataset are provided at https://github.com/wenhuchen/Table-Fact-Checking.

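Motivation and Goals

Study fact verification that uses semi-structured evidence (tables) rather than unstructured text.
Build a large-scale, high-quality dataset of statements grounded in tables, each labeled ENTAILED or REFUTED.
Develop and compare models capable of both linguistic reasoning and symbolic reasoning over tables.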

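Proposed Approach

Build TabFact from WikiTables: 16k tables paired with 118k statements that human annotators labeled ENTAILED or REFUTED.
Collect annotations through a two-channel scheme (simple and complex statements) and a negative-rewriting strategy, in which refuted statements are written by rewriting entailed ones, to reduce annotation artifacts.
Propose Table-BERT: linearize the table and verify the statement NLI-style with a pre-trained language model.
Propose the Latent Program Algorithm (LPA): search over latent programs and rank the candidate programs with a discriminator.
Evaluate both methods on the simple and complex test splits and compare them against human performance.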
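The Table-BERT step in the list above hinges on turning a table into text. The following is a minimal sketch under a simplified template: the phrasing "row i is : col is val ;", the function names, and the toy table are hypothetical illustrations, not the paper's exact serialization. It shows a horizontal (row-by-row) scan and a vertical (column-by-column) scan, then pairs the result with a statement as premise and hypothesis for a BERT-style sentence-pair classifier, which is not implemented here.

```python
from typing import List, Tuple

# Hypothetical toy table: a header plus data rows (not taken from TabFact).
HEADER = ["player", "team", "points"]
ROWS = [
    ["john", "hawks", "21"],
    ["mike", "bulls", "17"],
]


def linearize_horizontal(header: List[str], rows: List[List[str]]) -> str:
    """Scan the table row by row, rendering each cell with an assumed template."""
    parts = []
    for i, row in enumerate(rows, start=1):
        cells = " ; ".join(f"{col} is {val}" for col, val in zip(header, row))
        parts.append(f"row {i} is : {cells} .")
    return " ".join(parts)


def linearize_vertical(header: List[str], rows: List[List[str]]) -> str:
    """Scan the table column by column instead (same assumed template style)."""
    parts = []
    for j, col in enumerate(header):
        cells = " ; ".join(f"row {i} is {row[j]}" for i, row in enumerate(rows, start=1))
        parts.append(f"{col} : {cells} .")
    return " ".join(parts)


def make_nli_pair(table_text: str, statement: str) -> Tuple[str, str]:
    """Return (premise, hypothesis); a sentence-pair classifier would then
    score the pair as ENTAILED or REFUTED (classifier not included)."""
    return (table_text, statement)


if __name__ == "__main__":
    premise, hypothesis = make_nli_pair(
        linearize_horizontal(HEADER, ROWS), "john scored more points than mike"
    )
    print(premise)
    print(hypothesis)
    print(linearize_vertical(HEADER, ROWS))
```

For real tables, the linearized premise would usually need truncation, since a standard BERT encoder accepts at most 512 tokens per input pair.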

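Experimental Results

Research Questions

RQ1: Can fact verification be carried out effectively with semi-structured table evidence?
RQ2: How do linguistic and symbolic reasoning interact in table-based verification?
RQ3: What are the strengths and limitations of neural and program-synthesis approaches on TabFact?
RQ4: How far are Table-BERT and LPA from human performance on TabFact?
RQ5: What do error analysis and human evaluation reveal about the linking, search, and reasoning steps?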

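Key Findings

TabFact contains 118,275 annotated statements over 16,573 tables, with high inter-annotator agreement (Fleiss κ = 0.75).
The two baselines reach similar accuracy, yet both fall behind human performance on the simple and the complex splits.
Table-BERT benefits from natural-language templates for serializing tables and is sensitive to the choice of horizontal vs. vertical linearization; its best variant clearly outperforms the naive baseline.
LPA achieves competitive results by translating statements into programs that can be executed against the table and using a discriminator to select consistent execution traces.
Human evaluation exposes weaknesses in entity linking and program search (roughly 58% of links are correct, and the true program is recalled only about 51% of the time), showing that incorrect reasoning remains a major challenge.
Overall, both methods demonstrate that table-based fact verification is feasible, while leaving substantial room for improvement.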
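The LPA finding above rests on executing programs against a table to obtain a binary verdict. The following is a minimal sketch under stated assumptions: the operations (`filter_eq`, `count`, `argmax_row`, `hop`), the hand-written candidate programs, and the label-based filter are hypothetical simplifications, not LPA's actual function set, search procedure, or discriminator. It shows how several candidate programs for one statement can be executed and filtered by agreement with a known label.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical toy table (not taken from TabFact).
TABLE: List[Row] = [
    {"player": "john", "team": "hawks", "points": "21"},
    {"player": "mike", "team": "bulls", "points": "17"},
    {"player": "sam", "team": "hawks", "points": "9"},
]


# A few illustrative operations; a real program grammar would be much larger.
def filter_eq(rows: List[Row], col: str, val: str) -> List[Row]:
    """Keep rows whose cell in `col` equals `val`."""
    return [r for r in rows if r[col] == val]


def count(rows: List[Row]) -> int:
    """Number of rows."""
    return len(rows)


def argmax_row(rows: List[Row], col: str) -> Row:
    """Row with the largest numeric value in `col`."""
    return max(rows, key=lambda r: float(r[col]))


def hop(row: Row, col: str) -> str:
    """Read one cell from a single row."""
    return row[col]


# Hypothetical candidates for the statement "the hawks have 2 players in the table".
# Each program maps the table to a boolean verdict.
CANDIDATES: Dict[str, Callable[[List[Row]], bool]] = {
    "eq(count(filter_eq(team, hawks)), 2)":
        lambda t: count(filter_eq(t, "team", "hawks")) == 2,
    "eq(hop(argmax(points), team), hawks)":
        lambda t: hop(argmax_row(t, "points"), "team") == "hawks",
    "eq(count(filter_eq(team, bulls)), 2)":
        lambda t: count(filter_eq(t, "team", "bulls")) == 2,
}


def execute_all(table: List[Row]) -> Dict[str, bool]:
    """Run every candidate program and collect its binary result."""
    return {name: prog(table) for name, prog in CANDIDATES.items()}


def consistent_programs(table: List[Row], label: bool) -> List[str]:
    """Keep programs whose result agrees with the label (ENTAILED = True).
    Survivors can still be spurious: right answer, wrong reasoning."""
    return [name for name, result in execute_all(table).items() if result == label]


if __name__ == "__main__":
    print(execute_all(TABLE))
    print(consistent_programs(TABLE, label=True))
```

Here two programs agree with the ENTAILED label, but only the first expresses the statement; the second is a spurious program that happens to return True. Ranking the faithful program above such look-alikes is the job LPA assigns to its learned discriminator, which this sketch does not implement.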


This review was generated by AI and checked by human editors.