QUICK REVIEW

[Paper Review] Teaching language models to support answers with verified quotes

Jacob Menick, Maja Trębacz|arXiv (Cornell University)|Mar 21, 2022

Topic Modeling | Cited by 53

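One-sentence summary

This paper trains GopherCite, a 280B-parameter language model that answers questions while quoting passages verbatim from retrieved sources. Training combines supervised fine-tuning with reinforcement learning from human preferences to make answers more plausible and better supported by evidence.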

ABSTRACT

Recent large language models often answer factual questions correctly. But users can't trust any given claim a model makes without fact-checking, because language models can hallucinate convincing nonsense. In this work we use reinforcement learning from human preferences (RLHP) to train "open-book" QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness. Supporting evidence is drawn from multiple documents found via a search engine, or from a single user-provided document. Our 280 billion parameter model, GopherCite, is able to produce answers with high quality supporting evidence and abstain from answering when unsure. We measure the performance of GopherCite by conducting human evaluation of answers to questions in a subset of the NaturalQuestions and ELI5 datasets. The model's response is found to be high-quality 80% of the time on this Natural Questions subset, and 67% of the time on the ELI5 subset. Abstaining from the third of questions for which it is most unsure improves performance to 90% and 80% respectively, approaching human baselines. However, analysis on the adversarial TruthfulQA dataset shows why citation is only one part of an overall strategy for safety and trustworthiness: not all claims supported by evidence are true.

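Motivation and goals

Define a Self-Supported Question Answering (SQA) task in which each answer comes with verbatim quoted evidence.
Increase trust in model outputs by making the supporting evidence easy to check.
Let the model abstain when it is unsure, which improves answer quality on the benchmark datasets.
Run human evaluation on NaturalQuestions and on explanation-style ELI5 questions to measure plausibility and evidence support.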

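Proposed method

Introduce an Inline Evidence syntax that embeds quotes from retrieved documents in the answer text.
Fine-tune the 280B Gopher model with supervised learning on samples that human raters judged plausible and supported by evidence.
Train a reward model to predict human preferences over answer-evidence pairs, then optimize the policy with RL (A2C).
Supply large amounts of context by retrieving documents through Google Search; because the model conditions on this non-parametric context at sampling time, its evidence stays current.
Implement abstention by thresholding the reward model score, so that the model declines to answer when confidence is low.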
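
The quoting, reranking, and abstention steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. This review does not spell out the Inline Evidence template, so the pattern `%<claim>%(title)%[quote]%` is assumed here for illustration. The names `parse_cited_answer`, `quote_is_verbatim`, and `select_answer` are hypothetical, and `score_fn` is a stand-in for a trained reward model.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional


class CitedAnswer(NamedTuple):
    claim: str   # the answer text
    title: str   # title of the document the quote is taken from
    quote: str   # evidence that should appear verbatim in that document


# ASSUMPTION: illustrative template "%<claim>%(title)%[quote]%".
INLINE_EVIDENCE = re.compile(
    r"%<(?P<claim>.+?)>%\((?P<title>.+?)\)%\[(?P<quote>.+?)\]%", re.DOTALL
)


def parse_cited_answer(text: str) -> Optional[CitedAnswer]:
    """Parse one sample; return None if it does not follow the template."""
    match = INLINE_EVIDENCE.fullmatch(text.strip())
    if match is None:
        return None
    return CitedAnswer(match["claim"], match["title"], match["quote"])


def quote_is_verbatim(answer: CitedAnswer, documents: Dict[str, str]) -> bool:
    """Check that the quote is an exact substring of the cited document."""
    source = documents.get(answer.title)
    return source is not None and answer.quote in source


def select_answer(
    samples: List[str],
    documents: Dict[str, str],
    score_fn: Callable[[CitedAnswer], float],  # hypothetical reward model
    threshold: float,
) -> Optional[CitedAnswer]:
    """Rerank well-formed, verifiable samples by score; abstain (None) if
    the best score falls below the threshold."""
    best: Optional[CitedAnswer] = None
    best_score = float("-inf")
    for sample in samples:
        parsed = parse_cited_answer(sample)
        if parsed is None or not quote_is_verbatim(parsed, documents):
            continue  # malformed sample or unverifiable quote
        score = score_fn(parsed)
        if score > best_score:
            best, best_score = parsed, score
    if best is None or best_score < threshold:
        return None  # decline to answer
    return best
```

Checking quotes after generation is a simplification chosen for this sketch. A caller would pass several sampled responses plus the retrieved documents, and treat a `None` result as "decline to answer".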

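Experimental results

Research questions

RQ1: Can a language model give answers that are both plausible and supported by inline quotes from retrieved documents?
RQ2: Does reinforcement learning from human preferences improve SQA performance beyond supervised fine-tuning?
RQ3: Can an abstention mechanism raise overall answer quality in exchange for reduced coverage?
RQ4: In adversarial settings, what are the limits of relying on external sources to ensure truthfulness?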

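Key findings

GopherCite gives answers that are both plausible and supported by evidence about 80% of the time on NaturalQuestionsFiltered and about 67% of the time on ELI5Filtered.
When the model abstains from the third of questions it is least sure about, the abstention threshold raises performance to about 90% on NaturalQuestions and 80% on ELI5.
Reranking with the reward model and RL fine-tuning both improve scores substantially over the purely supervised baseline.
On TruthfulQA, citation alone does not guarantee truthfulness or protect against misleading evidence: not every claim supported by evidence is true.
The system benefits from a large, up-to-date retrieval source and from embedding quotes verbatim to aid verification.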
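
The trade-off between coverage and answer quality behind the abstention result can be made concrete with a short sketch. It assumes each question has a scalar confidence score (for example from a reward model) and a human judgment of whether the answer was high quality. The function name `quality_at_coverage` is hypothetical, and the numbers in the usage lines are made-up toy values, not results from the paper.

```python
from typing import List, Tuple


def quality_at_coverage(
    scores: List[float], is_good: List[bool], coverage: float
) -> Tuple[float, float]:
    """Answer only the top `coverage` fraction of questions by score.

    Returns (threshold, quality): the lowest score still answered, and the
    fraction of answered questions whose answer was judged good.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if not scores or len(scores) != len(is_good):
        raise ValueError("scores and is_good must be non-empty and aligned")
    # Most confident answers first.
    ranked = sorted(zip(scores, is_good), key=lambda pair: pair[0], reverse=True)
    n_answered = max(1, round(coverage * len(ranked)))
    kept = ranked[:n_answered]
    threshold = kept[-1][0]
    quality = sum(1 for _, good in kept if good) / n_answered
    return threshold, quality


# Toy values for illustration only (NOT data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_labels = [True, True, True, False, True, False]
_, quality_all = quality_at_coverage(toy_scores, toy_labels, 1.0)
threshold, quality_two_thirds = quality_at_coverage(toy_scores, toy_labels, 2 / 3)
```

Quality at reduced coverage exceeds quality at full coverage only when the score ranks good answers above bad ones, so the usefulness of abstention depends on how well the score tracks human judgments.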

This review was generated by AI and checked by a human editor.