QUICK REVIEW

[Paper Review] R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering

Shuohang Wang, Mo Yu|arXiv (Cornell University)|Aug 31, 2017

Topic Modeling|References: 34|Cited by: 87

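One-Sentence Summary

R3 introduces a Reinforced Ranker-Reader system for open-domain question answering. It uses reinforcement learning to jointly train a passage Ranker and a Reader so as to maximize final QA performance, and it achieves state-of-the-art results on several datasets.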

ABSTRACT

In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selected passage, from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al., 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that "reads" the passages to generate an answer to the question. Performance in this setting lags considerably behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader (R$^3$), based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of generating the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.

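Research Motivation and Goals

Move open-domain QA beyond the closed, pre-selected-passage setting by effectively ranking the relevant retrieved passages.
Propose a two-component framework (Ranker and Reader) that separates passage selection from answer extraction.
Optimize passage ranking end-to-end with respect to final answer quality.
Demonstrate significant empirical gains on multiple open-domain QA datasets.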

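Proposed Method

Two-component architecture: a Ranker that selects the passage most likely to yield the answer, and a Reader that extracts the answer from that passage.
Both components use Match-LSTM-based representations, which compare the question with the passage through an attention mechanism.
The Ranker is trained with REINFORCE, using a reward based on how well the answer extracted by the Reader matches the ground-truth answer.
The Reader is trained with SGD/backpropagation to maximize the likelihood of the correct answer span in the selected passage.
Joint training combines reinforcement learning for the Ranker with supervised optimization for the Reader, and uses negative sampling to stabilize Reader training.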
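The attention step behind Match-LSTM can be illustrated with a minimal sketch. It assumes the passage and question are already encoded as vectors (plain Python lists here) and, as a simplification, scores them with a dot product rather than the additive attention with a recurrent state that Match-LSTM actually uses. `match_attention` is a hypothetical name; each output vector concatenates a passage vector with its attention-weighted summary of the question, the kind of matching input that Match-LSTM then feeds into an LSTM.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def match_attention(passage, question):
    """Hypothetical helper: for each passage vector, attend over the question vectors.

    Returns one vector per passage token: [h_p ; sum_k alpha_k * h_q_k].
    Uses dot-product scores (a simplification, not Match-LSTM's additive attention).
    """
    dim_q = len(question[0])
    matched = []
    for h_p in passage:
        # Similarity between this passage token and every question token.
        scores = [sum(a * b for a, b in zip(h_p, h_q)) for h_q in question]
        alpha = softmax(scores)
        # Attention-weighted summary of the question for this passage token.
        summary = [sum(alpha[k] * question[k][d] for k in range(len(question)))
                   for d in range(dim_q)]
        matched.append(list(h_p) + summary)
    return matched


passage = [[1.0, 0.0], [0.0, 1.0]]  # two toy passage-token vectors
question = [[2.0, 3.0]]             # one toy question-token vector
print(match_attention(passage, question))  # → [[1.0, 0.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
```

With a single question vector every attention weight is 1, so each passage token is simply paired with that vector; with several question vectors the summary becomes a weighted mix.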
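The REINFORCE update for the Ranker can be sketched for a softmax policy over candidate passages. This is a toy under stated assumptions: the Ranker's scores are treated as free parameters (in R$^3$ they come from a neural network, so the same gradient would be backpropagated further), and `reward_fn` is a hypothetical stand-in for "run the Reader on the sampled passage and compare its answer with the ground truth". `ranker_step` and `reinforce_grad` are also hypothetical names. The gradient uses the softmax identity `d log p_k / d s_j = 1[j == k] - p_j`.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: turns Ranker scores into passage probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(scores, sampled_idx, reward, baseline=0.0):
    """Gradient of (reward - baseline) * log p(sampled_idx) with respect to the scores.

    For a softmax policy, d log p_k / d s_j = 1[j == k] - p_j.
    """
    probs = softmax(scores)
    advantage = reward - baseline
    return [advantage * ((1.0 if j == sampled_idx else 0.0) - p)
            for j, p in enumerate(probs)]


def ranker_step(scores, reward_fn, lr=0.1, baseline=0.0, rng=random):
    """One REINFORCE step: sample a passage, get its reward, take a gradient-ascent step."""
    probs = softmax(scores)
    idx = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    reward = reward_fn(idx)  # hypothetical: Reader answer vs. ground truth
    grad = reinforce_grad(scores, idx, reward, baseline)
    new_scores = [s + lr * g for s, g in zip(scores, grad)]
    return new_scores, idx, reward


# Toy usage: only passage 2 lets the (imagined) Reader produce the right answer.
rng = random.Random(0)
scores = [0.0, 0.0, 0.0]
for _ in range(200):
    scores = ranker_step(scores, lambda i: 1.0 if i == 2 else 0.0, rng=rng)[0]
print(softmax(scores))  # probability mass shifts toward passage 2
```

The `baseline` argument is the usual variance-reduction device for REINFORCE: subtracting a constant (for example, a running average of past rewards) leaves the expected gradient unchanged but can make individual updates less noisy.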
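The Reader objective can be illustrated with a span-extraction sketch: the Reader produces a start distribution and an end distribution over passage tokens, training minimizes the negative log-likelihood of the gold span, and inference picks the highest-scoring valid span. As a simplification, this assumes the start and end logits are already computed and scored independently (a pointer-style boundary model may instead condition the end position on the start); `span_nll`, `best_span`, and the `max_len` cap are hypothetical.

```python
import math


def log_softmax(logits):
    """Log-probabilities from raw logits, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def span_nll(start_logits, end_logits, gold_start, gold_end):
    """Training loss: -log p_start(gold_start) - log p_end(gold_end)."""
    return -(log_softmax(start_logits)[gold_start] + log_softmax(end_logits)[gold_end])


def best_span(start_logits, end_logits, max_len=15):
    """Inference: argmax of log p_start(i) + log p_end(j) with i <= j < i + max_len."""
    ls, le = log_softmax(start_logits), log_softmax(end_logits)
    best, best_score = (0, 0), float("-inf")
    for i in range(len(ls)):
        for j in range(i, min(i + max_len, len(le))):
            if ls[i] + le[j] > best_score:
                best, best_score = (i, j), ls[i] + le[j]
    return best


start_logits = [0.0, 5.0, 0.0, 0.0]  # toy scores for 4 passage tokens
end_logits = [0.0, 0.0, 0.0, 5.0]
print(best_span(start_logits, end_logits))       # → (1, 3)
print(span_nll(start_logits, end_logits, 1, 3))  # small loss: the gold span is the likely one
```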

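Experimental Results

Research Questions

RQ1: Can a separate Ranker trained with reinforcement learning improve open-domain QA by selecting passages that are more likely to yield the answer?
RQ2: Does jointly training the Ranker and Reader outperform a single Reader or non-RL baselines in open-domain QA?
RQ3: How large is the remaining gap between the Ranker-Reader approach and an oracle ranking that selects a passage containing the correct answer?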

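Key Findings

R3 achieves state-of-the-art results on multiple open-domain QA datasets.
Combining the jointly RL-trained Ranker with the supervised Reader outperforms internal baselines (a single Reader and a simple Ranker-Reader) as well as several published baselines.
Compared with a non-RL ranker, the RL-trained Ranker improves top-1 and top-3 passage recall, which helps answer extraction.
A bounded, F1-based reward reduces gradient variance and stabilizes training.
The model benefits from pre-training as a simpler Ranker-Reader variant and performs better after subsequent joint training.
On several datasets, R3 significantly outperforms the baselines, demonstrating the value of optimizing passage ranking end-to-end for open-domain QA.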
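Top-k passage recall, the metric used to compare Rankers, can be computed as below. This sketch assumes recall means the fraction of questions for which at least one of the k highest-ranked passages contains the ground-truth answer; `top_k_recall` and its input layout are hypothetical.

```python
def top_k_recall(rankings, answer_passages, k):
    """Fraction of questions whose top-k ranked passages include an answer-bearing one.

    rankings: per question, passage ids sorted by Ranker score (best first).
    answer_passages: per question, the set of passage ids containing the gold answer.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(rankings, answer_passages)
        if any(pid in gold for pid in ranked[:k])
    )
    return hits / len(rankings)


rankings = [[2, 0, 1], [0, 1, 2]]  # toy Ranker output for two questions
answer_passages = [{1}, {0}]
print(top_k_recall(rankings, answer_passages, 1))  # → 0.5
print(top_k_recall(rankings, answer_passages, 3))  # → 1.0
```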
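A bounded, F1-based reward can be sketched with SQuAD-style token overlap: normalize both answers, count shared tokens, and combine precision and recall. The result always lies in [0, 1], which is what keeps the reward bounded. This assumes the common SQuAD normalization (lowercasing, stripping punctuation and the articles a/an/the); R$^3$'s exact reward may shape these values further, and `f1_reward` is a hypothetical name.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and articles, then split into tokens."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1_reward(pred, gold):
    """Token-level F1 between the Reader's answer and the ground truth, in [0, 1]."""
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


print(f1_reward("the Eiffel Tower", "Eiffel Tower"))  # → 1.0
print(f1_reward("Paris, France", "Paris"))             # partial credit, about 0.67
print(f1_reward("Paris", "London"))                    # → 0.0
```

Compared with an exact-match reward, F1 gives partial credit for near-miss answers, so fewer sampled passages receive a reward of exactly zero.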

This review was generated by AI and reviewed by human editors.