QUICK REVIEW

[Paper Review] How Context Affects Language Models' Factual Predictions

Fabio Petroni, Patrick Lewis|arXiv (Cornell University)|May 10, 2020

Topic Modeling|References: 42|Cited by: 80

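One-Sentence Summary

This paper shows that augmenting pre-trained language models (BERT/RoBERTa) with retrieved context at test time, in a fully unsupervised setting, substantially improves fact-based cloze-style question answering, reaching the level of a supervised baseline, and that BERT's Next Sentence Prediction (NSP) helps filter noise in the context.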

ABSTRACT

When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information outside the model weights using supervised architectures that combine an information retrieval system with a machine reading component. In this paper, we go a step further and integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way. We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline. Furthermore, processing query and context with different segment tokens allows BERT to utilize its Next Sentence Prediction pre-trained classifier to determine whether the context is relevant or not, substantially improving BERT's zero-shot cloze-style question-answering performance and making its predictions robust to noisy contexts.

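Motivation and Objectives

Demonstrate that retrieving context at test time can unlock the factual knowledge in pre-trained language models without supervision.
Quantify how context type (oracle, retrieved, generated, adversarial) affects LAMA-based cloze question-answering performance.
Evaluate the robustness of BERT/RoBERTa to noisy contexts and the role of NSP in filtering for context relevance.
Compare unsupervised retrieval-augmented language-model performance with a supervised open-domain question-answering baseline (DrQA).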

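Proposed Method

Evaluate BERT-large and RoBERTa-large with cloze-style questions on the LAMA relation probes.
Add different context types to the cloze prompt: oracle (snippets from Wikipedia), retrieved (TF-IDF passages, as in DrQA's retriever), generated (text from an autoregressive language model), and adversarial (irrelevant context).
Separate the question from the context using model-specific segment tokens (BERT) or, where applicable, an EOS/separator token.
Measure precision at one (P@1) for single-token answers on the Google-RE, T-REx, and SQuAD-based subsets.
Analyze the NSP classifier's activations and the effect of input segmentation on how useful the context is.
Compare against DrQA as a supervised open-domain question-answering baseline, and discuss the implications for unsupervised question answering.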
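The input construction and the P@1 metric described in the steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: whitespace splitting stands in for the model's subword tokenizer, the query-then-context ordering is an assumption, and the names `build_input` and `precision_at_1` are hypothetical.

```python
from typing import List, Tuple

# BERT-style special tokens, used here only as plain strings.
CLS, SEP = "[CLS]", "[SEP]"


def build_input(query: str, context: str = "",
                separate_segments: bool = True) -> Tuple[List[str], List[int]]:
    """Return (tokens, segment_ids) for a cloze query with optional context.

    Assumption: whitespace tokenization replaces the real subword tokenizer.
    """
    q_tokens = query.split()
    c_tokens = context.split()
    if not c_tokens:
        # No context: a single segment holding only the query.
        tokens = [CLS] + q_tokens + [SEP]
        return tokens, [0] * len(tokens)
    if separate_segments:
        # Query in segment 0, context in segment 1, each closed by [SEP].
        first = [CLS] + q_tokens + [SEP]
        second = c_tokens + [SEP]
        return first + second, [0] * len(first) + [1] * len(second)
    # Plain concatenation: query and context share one segment.
    tokens = [CLS] + q_tokens + c_tokens + [SEP]
    return tokens, [0] * len(tokens)


def precision_at_1(ranked_predictions: List[List[str]], gold: List[str]) -> float:
    """Fraction of queries whose top-ranked prediction equals the gold answer."""
    if not gold:
        return 0.0
    hits = sum(1 for preds, answer in zip(ranked_predictions, gold)
               if preds and preds[0] == answer)
    return hits / len(gold)
```

For example, `build_input("Dante was born in [MASK] .", "Dante was born in Florence .")` places the query in segment 0 and the context in segment 1, while passing `separate_segments=False` yields the single-segment concatenation that the summary contrasts it with.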

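Experimental Results

Research Questions

RQ1: Can unsupervised retrieval-augmented language models match supervised question-answering performance on factual-knowledge tasks?
RQ2: How does context type (oracle, retrieved, generated, adversarial) affect the accuracy of LM-based cloze question answering?
RQ3: What role do BERT's NSP objective and input segmentation play in making use of context?
RQ4: Are the gains from retrieved context robust across relations and datasets?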

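Key Findings

Prompts with context substantially improve LM factual question answering: B-ora (oracle) yields large gains over context-free input, and B-ret (retrieved) typically matches or even exceeds the supervised baseline.
BERT with retrieved context is competitive with DrQA on Google-RE and SQuAD, and improves markedly over the context-free baseline across relations.
The adversarial-context experiments show that BERT remains robust when given two-segment input, suggesting that NSP helps filter out irrelevant context; plain concatenation into a single segment markedly degrades performance.
Generated context helps on some relations but is generally less effective than retrieved or oracle context, and can mislead when it is noisy.
The NSP-based relevance signal appears to make conditioning on context more robust, improving accuracy without supervised fine-tuning.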
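To make the idea of a relevance signal concrete, the sketch below gates a context on a pluggable relevance score. This is an illustration only and differs from the paper in two ways: the summary describes BERT using NSP implicitly, whereas this code applies an explicit threshold, and `toy_relevance` is a simple word-overlap stand-in, not BERT's NSP head. The names `toy_relevance` and `gate_context` and the threshold value are all hypothetical.

```python
from typing import Callable


def toy_relevance(query: str, context: str) -> float:
    """Stand-in scorer (NOT BERT's NSP head): Jaccard overlap of lowercased words."""
    q_words = set(query.lower().split()) - {"[mask]"}
    c_words = set(context.lower().split())
    if not q_words or not c_words:
        return 0.0
    return len(q_words & c_words) / len(q_words | c_words)


def gate_context(query: str, context: str,
                 scorer: Callable[[str, str], float],
                 threshold: float = 0.5) -> str:
    """Keep the context only if the scorer rates it at or above the threshold.

    Assumption: an explicit cut-off is used here purely for illustration;
    a real NSP probability could be passed in through `scorer` instead.
    """
    return context if scorer(query, context) >= threshold else ""
```

With the toy scorer, a context that restates the query (for instance "Dante was born in Florence ." for the query "Dante was born in [MASK] .") is kept, while an unrelated sentence is dropped, which mirrors the filtering behavior attributed to NSP above in a deliberately simplified form.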


This review was generated by AI and reviewed by human editors.