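[Paper Digest] Language Models as Knowledge Bases?
This paper analyzes how much factual and commonsense knowledge pretrained language models (BERT, ELMo, and others) store without fine-tuning, using the LAMA probe to compare them against symbolic knowledge bases and open-domain question answering baselines across several knowledge sources.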
Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as "fill-in-the-blank" cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at https://github.com/facebookresearch/LAMA.
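Motivation and Goals
- Assess the extent of the relational knowledge that large pretrained language models store without fine-tuning.
- Compare BERT, ELMo, and other models against symbolic knowledge bases and question answering baselines across several knowledge sources.
- Identify which kinds of knowledge (entity relations, commonsense, question answering) pretraining learns most readily.
- Evaluate the open-domain question answering ability of language models relative to a supervised baseline.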
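Proposed Method
- Introduce the LAMA (LAnguage Model Analysis) probe to test factual and commonsense knowledge.
- Build the probe from four knowledge sources (Google-RE, T-REx, ConceptNet, SQuAD), converting each fact into a cloze template used to query the models.
- Evaluate several pretrained models (fairseq-fconv, Transformer-XL, ELMo variants, BERT-base, BERT-large) over a unified 21K-token vocabulary.
- Use the rank-based P@k metric; to handle one-to-many relations, remove the other valid objects from the candidate set at test time.
- Compare against baselines: a frequency baseline, relation extraction (RE) systems with and without oracle linking, and the DrQA open-domain question answering system.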
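The probing steps listed above (cloze conversion, a shared vocabulary, and P@k with other valid objects removed) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `[X]`/`[Y]` placeholder syntax, the function names, and the toy scores are all assumptions made for the example, and a real run would take the scores from a language model's predictions for the masked token.

```python
from typing import Dict, Iterable, List, Set

MASK = "[MASK]"  # assumed mask token; the actual token depends on the model


def to_cloze(template: str, subject: str) -> str:
    """Fill the subject slot and mask the object slot of a relation template.

    The "[X]" / "[Y]" placeholders are a hypothetical convention for this sketch.
    """
    return template.replace("[X]", subject).replace("[Y]", MASK)


def rank_candidates(
    scores: Dict[str, float],
    vocab: Set[str],
    gold: str,
    other_valid: Iterable[str],
) -> List[str]:
    """Rank candidate tokens by score, highest first.

    Tokens outside the shared vocabulary are dropped, and so is every valid
    object other than the one being tested, so a one-to-many relation is not
    penalized for ranking a different correct answer first.
    """
    excluded = set(other_valid) - {gold}
    candidates = [t for t in scores if t in vocab and t not in excluded]
    return sorted(candidates, key=lambda t: scores[t], reverse=True)


def precision_at_k(ranked: List[str], gold: str, k: int) -> float:
    """Return 1.0 if the gold object is among the top k candidates, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


# Toy example with made-up scores standing in for model probabilities.
query = to_cloze("[X] was born in [Y].", "Dante")
toy_scores = {"English": 0.5, "French": 0.3, "Spanish": 0.1, "Inuktitut": 0.05}
shared_vocab = {"English", "French", "Spanish"}
valid_objects = {"English", "French"}  # two correct answers for one subject

ranked = rank_candidates(toy_scores, shared_vocab, "French", valid_objects)
print(query)
print(ranked, precision_at_k(ranked, "French", k=1))
```

In the toy case, "English" outranks "French" in the raw scores, so without the filtering step the query for "French" would score 0 at P@1 even though the top prediction is also a correct answer.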
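Experimental Results
Research Questions
- RQ1: How much relational and commonsense knowledge do pretrained language models store without fine-tuning?
- RQ2: How do model size and architecture (BERT-large vs. BERT-base vs. ELMo variants) affect knowledge recall across knowledge sources?
- RQ3: How does the knowledge retrieved from language models compare with symbolic knowledge bases and open-domain question answering baselines?
- RQ4: Are some relation types (1-to-1 vs. N-to-M) captured more easily by pretrained models?
- RQ5: Can the open-domain question answering performance of language models approach that of supervised systems without fine-tuning?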
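Key Findings
- BERT-large and BERT-base outperform the other models on Google-RE and T-REx, at times even approaching oracle-based knowledge extraction.
- Factual recall is strong for some relation types, especially 1-to-1, and weaker for N-to-M relations.
- BERT-large performs well on open-domain cloze-style question answering: it reaches 57.1% P@10, against 63.5% for the supervised DrQA system, so the gap at P@10 is small.
- ELMo-5.5B and the BERT variants are robust to query phrasing, but performance correlates with exposure in the training data (for example, mentions of the object in the training data).
- Overall, pretrained language models store a substantial amount of relational and commonsense knowledge, which lets them approach knowledge base performance without explicit fine-tuning or a retrieval pipeline.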
This digest was generated by AI and reviewed by a human editor.