QUICK REVIEW

[Paper Review] Multi-hop Question Answering via Reasoning Chains

Jifan Chen, Shih-Ting Lin|arXiv (Cornell University)|Oct 7, 2019

Topic Modeling | References: 43 | Cited by: 66

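One-Sentence Summary

This paper proposes a two-stage model that extracts discrete reasoning chains over text and feeds them to a BERT-based QA module for final answer prediction, achieving state-of-the-art results on WikiHop and strong performance on HotpotQA without relying on gold supporting facts at test time.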

ABSTRACT

Multi-hop question answering requires models to gather information from different parts of a text to answer a question. Most current approaches learn to address this task in an end-to-end way with neural networks, without maintaining an explicit representation of the reasoning process. We propose a method to extract a discrete reasoning chain over the text, which consists of a series of sentences leading to the answer. We then feed the extracted chains to a BERT-based QA model to do final answer prediction. Critically, we do not rely on gold annotated chains or "supporting facts": at training time, we derive pseudogold reasoning chains using heuristics based on named entity recognition and coreference resolution. Nor do we rely on these annotations at test time, as our model learns to extract chains from raw text alone. We test our approach on two recently proposed large multi-hop question answering datasets, WikiHop and HotpotQA, and achieve state-of-the-art performance on WikiHop and strong performance on HotpotQA. Our analysis shows the properties of chains that are crucial for high performance: in particular, modeling extraction sequentially is important, as is dealing with each candidate sentence in a context-aware way. Furthermore, human evaluation shows that our extracted chains allow humans to give answers with high confidence, indicating that these are a strong intermediate abstraction for this task.

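Research Motivation and Objectives

Advance multi-hop question answering, where the answer requires information from several parts of a text.
Introduce a discrete, trainable reasoning chain extractor that identifies the sequence of sentences leading to the answer.
Use a second-stage BERT-based QA module to predict the final answer from the extracted chains.
Require no gold supporting chains at training time and no such annotations at test time, relying on heuristically derived pseudo-gold chains instead.
Demonstrate the approach on WikiHop and HotpotQA and analyze which chain properties are critical to performance.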

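Proposed Method

Define a reasoning chain as a sequence of sentences that connects the question to the relevant facts.
At training time, use an auxiliary graph that combines NER-based edges with intra-paragraph links to generate pseudo-gold (oracle) chains.
Train a chain extractor that encodes sentences with BERT (BERT-Para or BERT-Sent variants) and uses a pointer network to output a sequence of sentence indices.
Train with the negative log-likelihood of the oracle chain labels; at test time, use beam search to produce multiple candidate chains.
Feed the top few chains into a BERT-based answer prediction model (RoBERTa for HotpotQA) with a dataset-specific output head (multiple choice or span extraction) to produce the final answer.
Compare ordered chain extraction against unordered sentence selection to show the benefit of ordering.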
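
The pseudo-gold chain derivation described above can be sketched as a shortest-path search over a sentence graph. This is a minimal sketch, not the authors' implementation. It assumes entity mentions are already extracted per sentence, that two sentences are linked when they share an entity or sit in the same paragraph, and that the oracle chain is a shortest path from a sentence sharing an entity with the question to a sentence containing the answer. The names `Sentence`, `build_graph`, and `oracle_chain` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    para: int            # index of the paragraph the sentence belongs to
    entities: frozenset  # entity strings, assumed to come from an NER step
    has_answer: bool = False


def build_graph(sents):
    """Link sentences that share an entity or a paragraph (assumed rule)."""
    graph = {i: set() for i in range(len(sents))}
    for i, a in enumerate(sents):
        for j in range(i + 1, len(sents)):
            b = sents[j]
            if a.para == b.para or a.entities & b.entities:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def oracle_chain(sents, question_entities):
    """Breadth-first search for a shortest chain of sentence indices."""
    graph = build_graph(sents)
    # Start from every sentence that mentions a question entity.
    starts = [i for i, s in enumerate(sents) if s.entities & question_entities]
    parent = {i: None for i in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if sents[node].has_answer:
            chain = []
            while node is not None:  # walk back to the start sentence
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no chain connects the question to an answer sentence


sents = [
    Sentence(0, frozenset({"Alice", "Acme"})),
    Sentence(0, frozenset({"Bob"})),
    Sentence(1, frozenset({"Acme", "Paris"}), has_answer=True),
    Sentence(2, frozenset({"Carol"})),
]
chain = oracle_chain(sents, frozenset({"Alice"}))
print(chain)  # [0, 2]
```

In this toy input, sentence 0 mentions the question entity and shares "Acme" with the answer sentence 2, so the two-sentence chain is found without passing through sentence 1.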

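Experimental Results

Research Questions

RQ1: Can pseudo-gold reasoning chains for multi-hop QA be derived automatically, without gold chains?
RQ2: Does a sequential chain extraction model improve final QA performance over unordered or non-chain approaches?
RQ3: How well do the extracted chains support final answer prediction, and how do they compare with human-annotated supporting facts?
RQ4: How do different chain supervision strategies and beam widths affect QA accuracy?
RQ5: Are the extracted chains a useful and reliable intermediate representation for human understanding?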

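Key Findings

Sequential decoding in the chain extractor yields better QA performance than unordered sentence selection on both WikiHop and HotpotQA.
Using more context when encoding sentences (BERT-Para vs. BERT-Sent) brings roughly a 5% QA performance gain in some settings, indicating that cross-sentence relations matter.
Combining the top-5 chains significantly improves recall and F1 for downstream QA while preserving uncertainty over chains.
The approach achieves state-of-the-art results on WikiHop and strong performance on HotpotQA without needing gold supporting facts at test time.
Human evaluation shows that extracted chains let people answer with confidence comparable to that obtained with annotated supporting facts, which supports chains as a robust intermediate representation.
Ordered chain extraction outperforms unordered extraction, especially on datasets that demand stronger multi-hop reasoning.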
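
The findings on sequential decoding and top-5 chains can be illustrated with a small beam search over sentence indices. This is a sketch under stated assumptions rather than the paper's extractor. `step_scores` is a hypothetical stand-in for the pointer network's log-probabilities (here a fixed toy transition table), index `-1` is an assumed start marker, and merging the top chains by taking the union of their sentences is only one plausible way to pass several chains to the QA model.

```python
import math


def beam_search(step_scores, num_sents, chain_len, beam_size):
    """Return the `beam_size` highest-scoring chains of sentence indices.

    step_scores(prev, nxt) is assumed to return log p(nxt | prev),
    where prev is -1 at the first step.
    """
    beams = [((), 0.0)]  # (chain so far, cumulative log-probability)
    for _ in range(chain_len):
        candidates = []
        for chain, score in beams:
            prev = chain[-1] if chain else -1
            for nxt in range(num_sents):
                if nxt in chain:  # a sentence is selected at most once
                    continue
                candidates.append((chain + (nxt,), score + step_scores(prev, nxt)))
        # Keep the best partial chains; ties are broken by chain order.
        candidates.sort(key=lambda c: (-c[1], c[0]))
        beams = candidates[:beam_size]
    return beams


def merge_chains(chains):
    """Union of sentences across chains, in order of first appearance."""
    seen = []
    for chain in chains:
        for idx in chain:
            if idx not in seen:
                seen.append(idx)
    return seen


# Toy transition table: keys are the previous index (-1 = start).
PROBS = {
    -1: [0.7, 0.2, 0.1],
    0: [0.0, 0.6, 0.4],
    1: [0.5, 0.0, 0.5],
    2: [0.5, 0.5, 0.0],
}


def toy_scores(prev, nxt):
    p = PROBS[prev][nxt]
    return math.log(p) if p > 0 else float("-inf")


top = beam_search(toy_scores, num_sents=3, chain_len=2, beam_size=2)
best_chains = [chain for chain, _ in top]
print(best_chains)                # [(0, 1), (0, 2)]
print(merge_chains(best_chains))  # [0, 1, 2]
```

Because each step conditions on the previously chosen sentence, the score of a chain depends on its order, which an unordered per-sentence selector cannot express. Under the same factorization, negating the cumulative log-probability of the oracle chain gives a negative log-likelihood training loss.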

This review was generated by AI and reviewed by human editors.