QUICK REVIEW

[Paper Review] R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering

Shuohang Wang, Mo Yu | arXiv (Cornell University) | Aug 31, 2017

Topic Modeling | References: 34 | Citations: 87

One-Line Summary

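tldr: R3 introduces a Reinforced Ranker-Reader system for open-domain QA that jointly trains a passage Ranker and a Reader with reinforcement learning to maximize end-to-end question-answering performance, achieving state-of-the-art results on several datasets.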

ABSTRACT

In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state of the art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selected passage, from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al., 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that "reads" the passages to generate an answer to the question. Performance in this setting lags considerably behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader $(R^3)$, based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of generating the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.
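The retrieve-rank-read pipeline described in the abstract can be sketched at inference time as follows. This is a minimal, hypothetical Python illustration, not the authors' code: the Ranker scores and `toy_reader` stand in for the Match-LSTM networks, and `answer_question`, `best_span`, and the `max_len=15` span limit are names and values chosen here for illustration only. The sketch greedily reads just the top-ranked passage; this review does not say whether R3 aggregates answers over several passages at test time.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical reader interface: tokens -> (start_logits, end_logits), one score per token.
Reader = Callable[[List[str]], Tuple[List[float], List[float]]]


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_len: int = 15) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start + end logit, with start <= end < start + max_len."""
    best = (0, 0, -math.inf)
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def answer_question(passages: List[str], ranker_scores: Sequence[float], reader: Reader) -> str:
    """Greedy inference: read only the passage with the highest Ranker score."""
    top = max(range(len(passages)), key=lambda k: ranker_scores[k])
    tokens = passages[top].split()
    start_logits, end_logits = reader(tokens)
    i, j, _ = best_span(start_logits, end_logits)
    return " ".join(tokens[i:j + 1])


def toy_reader(tokens: List[str]) -> Tuple[List[float], List[float]]:
    """Stand-in reader (hypothetical) that favors capitalized tokens as span boundaries."""
    scores = [1.0 if t[:1].isupper() else 0.0 for t in tokens]
    return scores, scores


if __name__ == "__main__":
    passages = ["the capital city is Paris", "cats sleep a lot"]
    print(answer_question(passages, ranker_scores=[2.0, 0.5], reader=toy_reader))  # prints "Paris"
```

Scoring a span by the sum of its start and end logits ranks spans within one passage the same way as the product of the start and end softmax probabilities, since the softmax normalizers are constant for a given passage.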

Motivation and Objectives

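Improve open-domain QA, where no pre-selected passage is given, by ranking the retrieved passages effectively.
Propose a two-component framework (Ranker and Reader) that separates passage selection from answer extraction.
Enable end-to-end optimization of passage ranking with respect to final answer quality.
Demonstrate strong empirical gains on multiple open-domain QA datasets.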

Proposed Method

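Two-component architecture: a Ranker that selects the passage most likely to yield the answer, and a Reader that extracts the answer from that passage.
Both components use Match-LSTM-based representations, which compare the question and the passage through an attention mechanism.
The Ranker is trained with REINFORCE, using a reward based on how well the answer extracted by the Reader matches the ground truth.
The Reader is trained with SGD/backpropagation to maximize the likelihood of the ground-truth answer span within the selected passage.
Training jointly combines reinforcement learning for ranking with supervised optimization for reading, and uses negative sampling to stabilize Reader training.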
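The joint training scheme described above can be illustrated with a small sketch of one training step. It is hypothetical and simplified, not the authors' implementation: it updates the Ranker's per-passage logits directly instead of backpropagating into Match-LSTM parameters, the `baseline` argument (a common variance-reduction term for REINFORCE) is an assumption, the reward value in the toy usage is made up, and the Reader loss treats start and end positions as independent softmax distributions. Negative sampling is not shown.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_passage(ranker_logits: Sequence[float], rng: random.Random) -> int:
    """Ranker as a stochastic policy: sample a passage index from softmax(logits)."""
    probs = softmax(ranker_logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_grad(ranker_logits: Sequence[float], chosen: int,
                   reward: float, baseline: float = 0.0) -> List[float]:
    """Gradient of the loss -(reward - baseline) * log pi(chosen) w.r.t. each Ranker logit."""
    probs = softmax(ranker_logits)
    advantage = reward - baseline
    # d log softmax(s)[chosen] / d s_j = 1[j == chosen] - p_j
    return [-advantage * ((1.0 if j == chosen else 0.0) - p) for j, p in enumerate(probs)]


def span_nll(start_logits: Sequence[float], end_logits: Sequence[float],
             gold_start: int, gold_end: int) -> float:
    """Reader loss: negative log-likelihood of the gold span's start and end positions."""
    return -math.log(softmax(start_logits)[gold_start]) - math.log(softmax(end_logits)[gold_end])


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]            # three retrieved passages, equally likely at first
    chosen = sample_passage(logits, rng)
    reward = 1.0                        # hypothetical: the Reader's answer on `chosen` was correct
    grad = reinforce_grad(logits, chosen, reward, baseline=0.5)
    lr = 0.1
    logits = [s - lr * g for s, g in zip(logits, grad)]  # SGD step on the Ranker logits
    print(chosen, softmax(logits)[chosen])  # rewarded passage's probability is now above 1/3
```

The Ranker needs a policy gradient because sampling a passage is not differentiable, whereas the Reader's span loss is an ordinary differentiable likelihood and can be minimized directly by backpropagation.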

Experimental Results

Research Questions

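RQ1: Can a separate Ranker trained with reinforcement learning improve open-domain QA by selecting passages that are more likely to yield the answer?
RQ2: Does jointly training the Ranker and Reader outperform single-reader and non-reinforcement baselines on open-domain QA?
RQ3: How close does the Ranker-Reader approach come to an oracle ranking of the passages that contain the correct answer?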

Key Findings

R3 achieves state-of-the-art results on multiple open-domain QA datasets.
The jointly trained RL-based Ranker and supervised Reader outperform internal baselines (Single Reader and Simple Ranker-Reader) as well as several public baselines.
The Ranker trained with RL improves top-1/top-3 passage recall compared with non-RL rankers, aiding answer extraction.
Using a bounded reward with F1-based guidance reduces gradient variance and stabilizes training.
The model benefits from pre-training with a simpler Ranker-Reader variant and improves further when the two components are then trained jointly.
On several datasets, R3 significantly improves over baselines, demonstrating the value of end-to-end optimization for passage ranking in open-domain QA.
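The findings above mention a bounded reward with F1-based guidance, top-1/top-3 passage recall, and (through RQ3) a comparison with an oracle ranking. The sketch below shows one way to compute each of these. It is illustrative only: the reward constants (2 for an exact match, the F1 score for a partial match, -1 for no overlap) are assumptions chosen to keep the reward within [-1, 2], `token_f1` is a simplified SQuAD-style token-overlap F1 without punctuation or article normalization, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 on lowercased whitespace tokens (no punctuation/article stripping)."""
    pred = prediction.lower().split()
    gold = ground_truth.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def bounded_reward(prediction: str, ground_truth: str,
                   exact_bonus: float = 2.0, miss_penalty: float = -1.0) -> float:
    """Hypothetical bounded reward: exact match, else partial F1 credit, else a fixed penalty."""
    if prediction.lower().split() == ground_truth.lower().split():
        return exact_bonus
    f1 = token_f1(prediction, ground_truth)
    return f1 if f1 > 0.0 else miss_penalty


def top_k_recall(rankings: Sequence[Sequence[int]],
                 contains_answer: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of questions with at least one answer-bearing passage among the top k.

    rankings[q] lists passage indices for question q, best first;
    contains_answer[q][i] is True if passage i of question q contains the gold answer.
    """
    hits = sum(any(contains_answer[q][i] for i in ranking[:k])
               for q, ranking in enumerate(rankings))
    return hits / len(rankings)


def oracle_ranking(contains: Sequence[bool]) -> List[int]:
    """Upper bound for any ranker: answer-bearing passages first, original order otherwise."""
    return sorted(range(len(contains)), key=lambda i: not contains[i])


if __name__ == "__main__":
    contains = [[False, True, False], [False, False, True]]
    model = [[0, 1, 2], [0, 1, 2]]   # hypothetical Ranker output for two questions
    oracle = [oracle_ranking(c) for c in contains]
    for k in (1, 3):
        print(k, top_k_recall(model, contains, k), top_k_recall(oracle, contains, k))
```

Because every reward in this sketch lies in [-1, 2], no single sampled passage can produce an arbitrarily large policy-gradient step, and the F1 term gives partial credit instead of an all-or-nothing signal; this is one intuition for the variance and stability claim above.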

This review was generated by AI and checked by a human editor.