QUICK REVIEW

[Paper Review] Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du|arXiv (Cornell University)|Dec 19, 2019

Topic Modeling | References: 34 | Cited by: 156

One-Sentence Summary

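This paper proposes WKLM, a weakly supervised pretraining objective that forces entity-centric knowledge learning from unstructured text and improves entity-related question answering and fine-grained entity typing over BERT baselines. It trains with entity replacement on Wikipedia to inject real-world entity knowledge, and requires no additional memory or architectural changes for downstream tasks.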

ABSTRACT

Recent breakthroughs of pretrained language models have shown the effectiveness of self-supervised learning for a wide range of natural language processing (NLP) tasks. In addition to standard syntactic and semantic NLP tasks, pretrained models achieve strong improvements on tasks that involve real-world knowledge, suggesting that large-scale language modeling could be an implicit method to capture knowledge. In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge using a zero-shot fact completion task. Moreover, we propose a simple yet effective weakly supervised pretraining objective, which explicitly forces the model to incorporate knowledge about real-world entities. Models trained with our new objective yield significant improvements on the fact completion task. When applied to downstream tasks, our model consistently outperforms BERT on four entity-related question answering datasets (i.e., WebQuestions, TriviaQA, SearchQA and Quasar-T) with an average 2.7 F1 improvements and a standard fine-grained entity typing dataset (i.e., FIGER) with 5.7 accuracy gains.

Motivation and Objectives

Investigate whether pretrained models implicitly capture real-world entity knowledge, and quantify the extent via a zero-shot fact completion task.
Introduce a weakly supervised knowledge learning objective that explicitly teaches models about real-world entities from unstructured text.
Show that knowledge-enriched pretraining improves entity-related QA datasets and fine-grained entity typing beyond standard BERT baselines.

Proposed Method

Entity-centric pretraining with weak supervision via entity replacement: replace entity mentions with other entities of the same type and train the model to detect which mentions were replaced.
Use the representations of each entity mention's boundary words to predict P(e|C), the probability that entity e is correct in context C, and thereby distinguish true from false knowledge statements.
Combine the knowledge-learning objective with masked language model (MLM) loss in a multi-task pretraining setup on Wikipedia and BooksCorpus.
Keep the standard BERT architecture, with no extra memory or architectural changes for downstream tasks.
Perform ablations to compare WKLM against MLM-only and extended MLM baselines to isolate the knowledge-learning contribution.
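
The entity-replacement step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Mention` record, the `corrupt` function, the `entities_by_type` lookup, and the `replace_prob` parameter are all hypothetical names introduced here, and the sketch assumes that entity mentions have already been linked and typed.

```python
import random
from dataclasses import dataclass

@dataclass
class Mention:
    start: int   # token index where the mention begins
    end: int     # token index one past the mention's last token
    entity: str  # surface form of the linked entity
    etype: str   # entity type used to pick same-type replacements

def corrupt(tokens, mentions, entities_by_type, replace_prob=0.5, rng=None):
    """Replace some mentions with a different entity of the same type.

    Returns (new_tokens, labels). labels follow the mentions in left-to-right
    order: 1 means the original entity was kept, 0 means it was replaced.
    replace_prob is a hypothetical knob, not a value taken from the paper.
    """
    rng = rng or random.Random()
    out, labels, cursor = [], [], 0
    for m in sorted(mentions, key=lambda m: m.start):
        out.extend(tokens[cursor:m.start])  # copy text before the mention
        candidates = [e for e in entities_by_type.get(m.etype, []) if e != m.entity]
        if candidates and rng.random() < replace_prob:
            out.extend(rng.choice(candidates).split())  # negative example
            labels.append(0)
        else:
            out.extend(tokens[m.start:m.end])  # positive example
            labels.append(1)
        cursor = m.end
    out.extend(tokens[cursor:])  # copy text after the last mention
    return out, labels

tokens = "Paris is the capital of France".split()
mentions = [Mention(0, 1, "Paris", "city"), Mention(5, 6, "France", "country")]
pool = {"city": ["Paris", "Berlin"], "country": ["France"]}
corrupted, labels = corrupt(tokens, mentions, pool, replace_prob=1.0)
```

In this toy call, "Paris" is swapped for the only other city in the pool, while "France" is kept because no alternative country is available. A binary classifier over the mention representations would then be trained on `labels` alongside the MLM loss.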

Experimental Results

Research Questions

RQ1: Can large-scale pretraining encode explicit entity-level knowledge beyond standard MLM objectives?
RQ2: Does a weakly supervised knowledge-learning objective improve entity-related tasks without external knowledge bases?
RQ3: How does WKLM perform on zero-shot fact completion and downstream entity-centric QA and typing tasks compared to BERT and GPT-2?
RQ4: What is the impact of the MLM masking ratio and, separately, of the entity-replacement objective on downstream performance?

Key Findings

Relation Name	# of Candidates	# of Answers	BERT-base	BERT-large	GPT-2	Ours
HasChild (P40)	906	3.8	9.00	6.00	20.5	63.5
NotableWork (P800)	901	5.2	1.88	2.56	2.39	4.10
CapitalOf (P36)	820	2.2	1.87	1.55	15.8	49.1
FoundedBy (P112)	798	3.7	2.44	1.93	8.65	24.2
Creator (P170)	536	3.6	4.57	4.57	7.27	9.84
PlaceOfBirth (P19)	497	1.8	19.2	30.9	8.95	23.2
LocatedIn (P131)	382	1.9	13.2	52.5	21.0	61.1
EducatedAt (P69)	374	4.1	9.10	7.93	11.0	16.9
PlaceOfDeath (P20)	313	1.7	43.0	42.6	8.83	26.5
Occupation (P106)	190	1.4	8.58	10.7	9.17	10.7
Average Hits@10	-	-	11.3	16.1	16.3	28.9
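
For readers unfamiliar with the metric in the table, the sketch below shows one common way to compute Hits@10 for a ranked list of candidate entities. It assumes that a query counts as a hit when any gold answer appears among the top k candidates; the paper's exact scoring procedure may differ, and the function names are hypothetical.

```python
def hits_at_k(ranked_candidates, gold_answers, k=10):
    """Return 1.0 if any gold answer is among the top-k candidates, else 0.0."""
    gold = set(gold_answers)
    return 1.0 if any(c in gold for c in ranked_candidates[:k]) else 0.0

def mean_hits_at_k(queries, k=10):
    """Average hits@k, as a percentage, over (ranked_candidates, gold_answers) pairs."""
    if not queries:
        return 0.0
    return 100.0 * sum(hits_at_k(r, g, k) for r, g in queries) / len(queries)

# Toy example with made-up rankings (not data from the paper).
queries = [
    (["a", "b", "c"], ["c"]),  # gold answer ranked third
    (["x", "y"], ["z"]),       # gold answer never retrieved
]
score_top3 = mean_hits_at_k(queries, k=3)
score_top2 = mean_hits_at_k(queries, k=2)
```

With these toy rankings, the first query is a hit only when k is at least 3, so widening the cutoff raises the score.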

WKLM achieves best results on 8 of 10 fact-completion relations in zero-shot evaluation.
On open-domain QA, WKLM outperforms BERT on entity-related datasets by an average of 2.7 F1 points when ranking scores are not used; with ranking, it attains near state-of-the-art results on three datasets.
On fine-grained entity typing (FIGER), WKLM sets a new state-of-the-art with accuracy 60.21, Ma-F1 81.99, Mi-F1 77.00.
Ablation shows that combining the WKLM objective with MLM yields the best downstream performance; using too high an MLM masking ratio (15%) can hurt knowledge learning.
WKLM requires no additional data processing or memory during fine-tuning and works with the original BERT architecture.
Compared to ERNIE, WKLM provides larger absolute gains on FIGER, suggesting text-based knowledge extraction is effective without external KBs.


This review was generated by AI and reviewed by human editors.