QUICK REVIEW

[Paper Review] Teaching language models to support answers with verified quotes

Jacob Menick, Maja Trębacz | arXiv (Cornell University) | Mar 21, 2022

Topic Modeling | Citations: 53

One-Sentence Summary

The paper trains a 280B parameter language model, GopherCite, to answer questions with inline, verbatim quotes from retrieved sources, using supervised fine-tuning and reinforcement learning from human preferences to improve plausibility and support.

ABSTRACT

Recent large language models often answer factual questions correctly. But users can't trust any given claim a model makes without fact-checking, because language models can hallucinate convincing nonsense. In this work we use reinforcement learning from human preferences (RLHP) to train "open-book" QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness. Supporting evidence is drawn from multiple documents found via a search engine, or from a single user-provided document. Our 280 billion parameter model, GopherCite, is able to produce answers with high quality supporting evidence and abstain from answering when unsure. We measure the performance of GopherCite by conducting human evaluation of answers to questions in a subset of the NaturalQuestions and ELI5 datasets. The model's response is found to be high-quality 80% of the time on this Natural Questions subset, and 67% of the time on the ELI5 subset. Abstaining from the third of questions for which it is most unsure improves performance to 90% and 80% respectively, approaching human baselines. However, analysis on the adversarial TruthfulQA dataset shows why citation is only one part of an overall strategy for safety and trustworthiness: not all claims supported by evidence are true.

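Motivation and Objectives

Develop the self-supported question answering (SQA) task, in which each answer is paired with a verbatim quote of supporting evidence.
Make the evidence easy to verify, so that users can place more trust in model outputs.
Allow the model to abstain from answering when it is unsure, improving answer quality on benchmark datasets.
Evaluate plausibility and support using human judgments on questions drawn from the NaturalQuestions and ELI5 datasets.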

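Proposed Method

Introduce an Inline Evidence syntax that embeds quotes from retrieved documents within the answer text.
Fine-tune the 280B Gopher model with supervised learning on samples that human raters judged plausible and supported.
Train a reward model to predict human preferences over answer-evidence pairs, and use RL (A2C) to optimize the policy.
Supply large context documents retrieved via Google Search; conditioning on this non-parametric context at sampling time lets answers draw on up-to-date evidence.
Implement abstention by thresholding the reward model score, so that the model declines to answer when confidence is low.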
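
The reranking and abstention steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Candidate` type, the function names, the threshold value, and the reward scores are all hypothetical, and the `reward` field stands in for the output of a trained reward model. The substring check is just one simple way to confirm that a quote is verbatim; this review does not say how the paper enforces it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One sampled answer with its supporting quote (hypothetical structure)."""
    claim: str
    quote: str
    reward: float  # stand-in for a reward model score


def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not break the verbatim check."""
    return " ".join(text.split())


def is_verbatim(quote: str, document: str) -> bool:
    """Return True if the quote appears word for word in the source document."""
    q = normalize(quote)
    return bool(q) and q in normalize(document)


def rerank_or_abstain(
    candidates: Sequence[Candidate], document: str, threshold: float
) -> Optional[Candidate]:
    """Pick the highest-scoring candidate whose quote is verbatim.

    Returns None (abstain) when no candidate has a verbatim quote or when
    the best score falls below the threshold.
    """
    valid = [c for c in candidates if is_verbatim(c.quote, document)]
    if not valid:
        return None
    best = max(valid, key=lambda c: c.reward)
    return best if best.reward >= threshold else None


if __name__ == "__main__":
    doc = "The Eiffel Tower is 330 metres tall.\nIt was completed in 1889."
    samples = [
        Candidate("It is 330 m tall.", "The Eiffel Tower is 330 metres tall.", 0.9),
        Candidate("It was built in 1890.", "It was completed in 1890.", 0.99),
    ]
    # The second sample is discarded because its quote is not in the document.
    print(rerank_or_abstain(samples, doc, threshold=0.5))
```

Raising `threshold` makes the sketch abstain more often, which is the lever behind the coverage versus quality trade-off reported in the findings.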

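Experimental Results

Research Questions

RQ1: Can a language model generate answers that are plausible and supported by inline quotes from retrieved sources?
RQ2: Does reinforcement learning from human preferences improve SQA performance beyond supervised fine-tuning?
RQ3: Does an abstention mechanism improve answer quality, and how does it trade off against coverage?
RQ4: What are the limits on truthfulness when the model relies on external sources in adversarial settings?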

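Key Findings

GopherCite produces plausible and supported answers about 80% of the time on NaturalQuestionsFiltered and about 67% of the time on ELI5Filtered.
With an abstention threshold, where the model declines the third of questions on which it is least confident, performance rises to 90% on NaturalQuestions and 80% on ELI5.
Reward-model reranking and RL fine-tuning substantially improve scores over the purely supervised baseline.
On TruthfulQA, citation alone does not guarantee truthfulness or mitigate misinformation: not all claims supported by evidence are true.
The system benefits from drawing on large, up-to-date search sources and from embedding verbatim quotes, which aids verification.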
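
The abstention result is a trade-off between coverage (the fraction of questions attempted) and quality on the attempted questions. Below is a minimal sketch of how such a figure could be computed, assuming each question has a reward-model score and a binary human rating. The function name and the toy data are hypothetical and are not results from the paper.

```python
from typing import Sequence, Tuple


def quality_at_coverage(
    scores: Sequence[float], ratings: Sequence[bool], coverage: float
) -> Tuple[float, int]:
    """Rate of good answers when only the top-scoring fraction is answered.

    scores:   one confidence score per question (assumed: reward model output)
    ratings:  True if human raters judged that answer plausible and supported
    coverage: fraction of questions to attempt, in (0, 1]
    Returns (fraction of attempted answers rated good, number attempted).
    """
    if not scores or len(scores) != len(ratings):
        raise ValueError("scores and ratings must be non-empty and equal length")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    n_answered = max(1, round(coverage * len(scores)))
    # Keep the highest-scoring questions; abstain on the rest.
    ranked = sorted(zip(scores, ratings), key=lambda pair: pair[0], reverse=True)
    kept = ranked[:n_answered]
    return sum(1 for _, good in kept if good) / n_answered, n_answered


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    toy_ratings = [True, True, True, False, True, False]
    for cov in (1.0, 0.5):
        rate, n = quality_at_coverage(toy_scores, toy_ratings, cov)
        print(f"coverage={cov:.2f} answered={n} good_rate={rate:.2f}")
```

Sweeping `coverage` from 1.0 downward traces the curve along which a result such as "abstain from a third of questions" is a single point.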


This review was generated by AI and checked by a human editor.