QUICK REVIEW

[Paper Review] Abductive Commonsense Reasoning

Chandra Bhagavatula, Ronan Le Bras|arXiv (Cornell University)|Aug 15, 2019

Topic Modeling | References: 46 | Cited by: 34

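One-Sentence Summary

This paper introduces the ART dataset for abductive commonsense reasoning and defines two tasks: Abductive Natural Language Inference (αNLI) and Abductive Natural Language Generation (αNLG). It evaluates strong baselines, shows a large gap between model and human performance, and analyzes model limitations and the potential for transfer learning.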

ABSTRACT

Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks -- (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform--despite their strong performance on the related but more narrowly defined task of entailment NLI--pointing to interesting avenues for future research.

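Motivation and Goals

Motivate abductive reasoning as a core part of how humans use commonsense to explain what they observe.
Create a large-scale dataset (ART) of over 20k narrative contexts paired with 200k explanations.
Define two new tasks: Abductive Natural Language Inference (αNLI) and Abductive Natural Language Generation (αNLG).
Establish benchmarks by providing strong baselines built on state-of-the-art NLI models and language generators.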

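Proposed Method

Define αNLI as a two-way multiple-choice task: given observations O1 and O2, choose the more plausible of two candidate hypotheses.
Propose probabilistic models (fully connected, linear chain, and other variants that differ in their dependency assumptions) to characterize the relations among O1, O2, and H.
Model αNLG as generating h+ conditioned on O1 and O2, optionally with background knowledge drawn from COMeT / ATOMIC.
Build ART by pairing ROCStories stories with crowdsourced plausible and implausible hypotheses, then apply adversarial filtering to minimize annotation artifacts.
Evaluate αNLI baselines with BERT-based classifiers and αNLG baselines with GPT2-based generators, and analyze both against human baselines.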
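
As a rough illustration of how a dependency assumption turns into a decision rule, the sketch below scores two candidate hypotheses under a linear-chain simplification, where O2 is assumed to depend on O1 only through h, so each hypothesis is scored by P(h | O1) · P(O2 | h). The function name `pick_hypothesis_linear_chain` and all probability values are hypothetical and chosen only for illustration; the baselines summarized here are fine-tuned classifiers, not explicit probability tables.

```python
import math
from typing import Sequence


def pick_hypothesis_linear_chain(
    p_h_given_o1: Sequence[float],
    p_o2_given_h: Sequence[float],
) -> int:
    """Return the index of the hypothesis maximizing P(h | O1) * P(O2 | h).

    Assumes the linear-chain simplification: O2 depends on O1 only through h.
    Probabilities must be strictly positive; scores are summed in log space.
    """
    if len(p_h_given_o1) != len(p_o2_given_h):
        raise ValueError("need one probability per hypothesis in each list")
    scores = [
        math.log(p_h) + math.log(p_o2)
        for p_h, p_o2 in zip(p_h_given_o1, p_o2_given_h)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical numbers for two candidate hypotheses h1 and h2.
p_h_given_o1 = [0.6, 0.4]  # h1 looks more likely given O1 alone
p_o2_given_h = [0.1, 0.7]  # but h2 explains O2 far better

choice = pick_hypothesis_linear_chain(p_h_given_o1, p_o2_given_h)
print(f"chosen hypothesis: h{choice + 1}")
```

With these toy numbers, a scorer that looked only at O1 would prefer h1, while the linear-chain score prefers h2 because it also accounts for how well each hypothesis explains O2. That contrast is the kind of difference the O1-only, O2-only, and linear-chain baselines are meant to isolate.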

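Experimental Results

Research Questions

RQ1: Can language models perform abductive reasoning over narrative observations better than chance or simple entailment baselines?
RQ2: What limitations do current pre-trained language models show in abductive reasoning across different commonsense categories?
RQ3: Does incorporating structured commonsense knowledge (e.g., COMeT / ATOMIC) improve abductive generation and inference?
RQ4: Can training on ART improve performance on other commonsense tasks through transfer learning?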

Key Findings

Model	GPT AF Accuracy (%)	ART Accuracy (%)
Random	50.1	50.4
Majority	50.1	50.8
InferSent (Conneau et al., 2017)	50.9	50.8
ESIM+ELMo (Chen et al., 2017)	58.2	58.8
GPT-ft	52.6 (0.9)	63.1 (0.5)
BERT-ft [h^{i} Only]	55.9 (0.7)	59.5 (0.2)
BERT-ft [O1 Only]	63.9	63.5
BERT-ft [O2 Only]	68.1	66.6
BERT-ft [Linear Chain]	65.3	68.9
BERT-ft [Fully Connected]	72.0 (0.5)	68.6 (0.5)
Human Performance	-	91.4

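The best αNLI baseline on ART (BERT-ft, Linear Chain) reaches 68.9% accuracy, far below human performance of 91.4%.
Humans outperform models in every evaluated category, and a simple entailment baseline such as InferSent stays near chance on ART (50.8%).
αNLG is more challenging still: the best generator reaches roughly 45% on held-out hypotheses, compared with 96% for humans.
Both adversarial filtering and model structure affect performance: the Fully Connected model leads on the GPT AF set (72.0% vs. 65.3% for Linear Chain), while the two are nearly tied on ART (68.6% vs. 68.9%).
ART yields transfer-learning gains: training on ART first benefits smaller target datasets (e.g., WinoGrande, WSC, DPR, HellaSwag), especially when target data is limited.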
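
The headline gap between models and humans can be re-derived from the ART column of the table above. The short sketch below copies those accuracies into a dictionary (the variable names are illustrative, and the standard deviations shown in parentheses in the table are omitted) and computes the best model and its distance from human performance.

```python
# ART accuracies (%) copied from the results table in this review.
art_accuracy = {
    "Random": 50.4,
    "Majority": 50.8,
    "InferSent": 50.8,
    "ESIM+ELMo": 58.8,
    "GPT-ft": 63.1,
    "BERT-ft [h Only]": 59.5,
    "BERT-ft [O1 Only]": 63.5,
    "BERT-ft [O2 Only]": 66.6,
    "BERT-ft [Linear Chain]": 68.9,
    "BERT-ft [Fully Connected]": 68.6,
}
human_accuracy = 91.4

# Pick the model with the highest ART accuracy and measure the gap to humans.
best_model = max(art_accuracy, key=art_accuracy.get)
gap = round(human_accuracy - art_accuracy[best_model], 1)

print(f"best model: {best_model} ({art_accuracy[best_model]}%)")
print(f"gap to human performance: {gap} points")
```

On these numbers the best ART score belongs to the Linear Chain variant, leaving a gap of 22.5 accuracy points to human performance.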


This review was generated by AI and checked by a human editor.