
[Paper Review] Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du | arXiv (Cornell University) | Dec 19, 2019
Topic Modeling | References: 34 | Cited by: 156
One-Sentence Summary

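This paper proposes WKLM, a weakly supervised pretraining objective that forces entity-centric knowledge learning from unstructured text and improves entity-related question answering and fine-grained entity typing over BERT baselines. It trains with entity replacement on Wikipedia to inject real-world entity knowledge, and needs no additional memory or architectural changes downstream.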

ABSTRACT

Recent breakthroughs of pretrained language models have shown the effectiveness of self-supervised learning for a wide range of natural language processing (NLP) tasks. In addition to standard syntactic and semantic NLP tasks, pretrained models achieve strong improvements on tasks that involve real-world knowledge, suggesting that large-scale language modeling could be an implicit method to capture knowledge. In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge using a zero-shot fact completion task. Moreover, we propose a simple yet effective weakly supervised pretraining objective, which explicitly forces the model to incorporate knowledge about real-world entities. Models trained with our new objective yield significant improvements on the fact completion task. When applied to downstream tasks, our model consistently outperforms BERT on four entity-related question answering datasets (i.e., WebQuestions, TriviaQA, SearchQA and Quasar-T) with an average 2.7 F1 improvements and a standard fine-grained entity typing dataset (i.e., FIGER) with 5.7 accuracy gains.

Motivation and Objectives

  • Investigate whether pretrained models implicitly capture real-world entity knowledge, and quantify the extent via a zero-shot fact completion task.
  • Introduce a weakly supervised knowledge learning objective that explicitly teaches models about real-world entities from unstructured text.
  • Show that knowledge-enriched pretraining improves entity-related QA datasets and fine-grained entity typing beyond standard BERT baselines.

Proposed Method

  • Entity-centric pretraining with weak supervision via entity replacement: replace mentions with same-type entities and train the model to detect replacement.
  • Use boundary-word representations of entities to predict P(e|C) and distinguish true vs false knowledge statements.
  • Combine the knowledge-learning objective with masked language model (MLM) loss in a multi-task pretraining setup on Wikipedia and BooksCorpus.
  • Keep the standard BERT architecture, with no extra memory or architectural changes required for downstream tasks.
  • Perform ablations to compare WKLM against MLM-only and extended MLM baselines to isolate the knowledge-learning contribution.
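
The entity-replacement step described in the list above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `ENTITIES_BY_TYPE` inventory, the `replace_mentions` helper, the mention tuple layout, and the `replace_prob` parameter are all hypothetical names chosen for illustration. The sketch assumes mentions are already linked and typed, swaps each mention for a different entity of the same type with some probability, and emits a binary label per mention (1 = original, 0 = replaced) that a detection objective could be trained on.

```python
import random

# Hypothetical toy inventory mapping entity types to entity names.
# A real setup would draw same-type entities from a large linked corpus.
ENTITIES_BY_TYPE = {
    "person": ["Marie Curie", "Alan Turing", "Ada Lovelace"],
    "city": ["Paris", "Warsaw", "London"],
}


def replace_mentions(tokens, mentions, replace_prob=0.5, rng=None):
    """Build one weakly supervised example via same-type entity replacement.

    tokens:   list of word strings for the passage.
    mentions: non-overlapping (start, end, entity_name, entity_type) tuples,
              with `end` exclusive.
    Returns (new_tokens, labels); labels follow the mentions in order of
    start position, 1 for a kept mention and 0 for a replaced one.
    """
    rng = rng or random.Random()
    new_tokens, labels = [], []
    cursor = 0
    for start, end, name, etype in sorted(mentions):
        # Copy the context between the previous mention and this one.
        new_tokens.extend(tokens[cursor:start])
        # Candidate replacements: same type, different entity.
        candidates = [e for e in ENTITIES_BY_TYPE.get(etype, []) if e != name]
        if candidates and rng.random() < replace_prob:
            new_tokens.extend(rng.choice(candidates).split())
            labels.append(0)  # negative: statement is now (likely) false
        else:
            new_tokens.extend(tokens[start:end])
            labels.append(1)  # positive: original mention kept
        cursor = end
    new_tokens.extend(tokens[cursor:])
    return new_tokens, labels


if __name__ == "__main__":
    words = "Marie Curie was born in Warsaw".split()
    spans = [(0, 2, "Marie Curie", "person"), (5, 6, "Warsaw", "city")]
    print(replace_mentions(words, spans, rng=random.Random(0)))
```

In a multi-task setup like the one summarized above, the per-mention labels would feed a binary classification loss that is added to the MLM loss; how the two are weighted is not specified in this review.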

Experimental Results

Research Questions

  • RQ1: Can large-scale pretraining encode explicit entity-level knowledge beyond standard MLM objectives?
  • RQ2: Does a weakly supervised knowledge-learning objective improve entity-related tasks without external knowledge bases?
  • RQ3: How does WKLM perform on zero-shot fact completion and downstream entity-centric QA and typing tasks compared to BERT and GPT-2?
  • RQ4: What is the impact of the MLM ratio and, separately, of the entity-replacement objective on downstream performance?

Key Findings

Zero-shot fact completion results (Hits@10 per model):

| Relation Name       | # of Candidates | # of Answers | BERT-base | BERT-large | GPT-2 | Ours |
|---------------------|-----------------|--------------|-----------|------------|-------|------|
| HasChild (P40)      | 906             | 3.8          | 9.00      | 6.00       | 20.5  | 63.5 |
| NotableWork (P800)  | 901             | 5.2          | 1.88      | 2.56       | 2.39  | 4.10 |
| CapitalOf (P36)     | 820             | 2.2          | 1.87      | 1.55       | 15.8  | 49.1 |
| FoundedBy (P112)    | 798             | 3.7          | 2.44      | 1.93       | 8.65  | 24.2 |
| Creator (P170)      | 536             | 3.6          | 4.57      | 4.57       | 7.27  | 9.84 |
| PlaceOfBirth (P19)  | 497             | 1.8          | 19.2      | 30.9       | 8.95  | 23.2 |
| LocatedIn (P131)    | 382             | 1.9          | 13.2      | 52.5       | 21.0  | 61.1 |
| EducatedAt (P69)    | 374             | 4.1          | 9.10      | 7.93       | 11.0  | 16.9 |
| PlaceOfDeath (P20)  | 313             | 1.7          | 43.0      | 42.6       | 8.83  | 26.5 |
| Occupation (P106)   | 190             | 1.4          | 8.58      | 10.7       | 9.17  | 10.7 |
| Average Hits@10     | -               | -            | 11.3      | 16.1       | 16.3  | 28.9 |

  • WKLM achieves the best results on 8 of 10 fact-completion relations in zero-shot evaluation (counting a tie with BERT-large on Occupation).
  • On open-domain QA, WKLM outperforms BERT on entity-related datasets by an average of 2.7 F1 points when ranking scores are not used; with ranking, it attains near state-of-the-art results on three datasets.
  • On fine-grained entity typing (FIGER), WKLM sets a new state of the art with accuracy 60.21, macro-F1 81.99, and micro-F1 77.00.
  • Ablation shows that combining the WKLM objective with MLM yields the best downstream performance; using too high an MLM masking ratio (15%) can hurt knowledge learning.
  • WKLM requires no additional data processing or memory during fine-tuning and works with the original BERT architecture.
  • Compared to ERNIE, WKLM provides larger absolute gains on FIGER, suggesting text-based knowledge extraction is effective without external KBs.
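
The Hits@10 scores reported for zero-shot fact completion can be read through the following minimal sketch. It assumes the common definition in which a query counts as a hit when any gold answer appears among the model's top-k ranked predictions, with case-insensitive exact matching; the paper's exact matching rules may differ, and the function names `hits_at_k` and `mean_hits_at_k` are hypothetical.

```python
def hits_at_k(ranked_predictions, gold_answers, k=10):
    """Return 1.0 if any gold answer is among the top-k predictions, else 0.0.

    Matching is case-insensitive exact string match (an assumption made
    for this sketch).
    """
    gold = {answer.lower() for answer in gold_answers}
    return float(any(pred.lower() in gold for pred in ranked_predictions[:k]))


def mean_hits_at_k(examples, k=10):
    """Average Hits@k over (ranked_predictions, gold_answers) pairs, in percent."""
    if not examples:
        return 0.0
    total = sum(hits_at_k(preds, gold, k) for preds, gold in examples)
    return 100.0 * total / len(examples)


if __name__ == "__main__":
    # Toy queries: the first has a gold answer in its predictions, the second does not.
    toy = [
        (["Paris", "London"], ["paris"]),
        (["Rome"], ["Madrid"]),
    ]
    print(mean_hits_at_k(toy))
```

Because each relation in the table has more than one valid answer on average, a hit-based metric of this kind credits a model for recovering any one of them.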

This review was generated by AI and reviewed by human editors.