QUICK REVIEW

[Paper Review] Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du | arXiv (Cornell University) | Dec 19, 2019

Topic Modeling | 34 references | 156 citations

TL;DR

The paper introduces WKLM, a weakly supervised pretraining objective that enforces entity-centric knowledge learning from unstructured text, improving entity-related QA and fine-grained entity typing over BERT baselines. It uses entity replacement training on Wikipedia to inject real-world entity knowledge without extra downstream memory or architecture changes.

ABSTRACT

Recent breakthroughs of pretrained language models have shown the effectiveness of self-supervised learning for a wide range of natural language processing (NLP) tasks. In addition to standard syntactic and semantic NLP tasks, pretrained models achieve strong improvements on tasks that involve real-world knowledge, suggesting that large-scale language modeling could be an implicit method to capture knowledge. In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge using a zero-shot fact completion task. Moreover, we propose a simple yet effective weakly supervised pretraining objective, which explicitly forces the model to incorporate knowledge about real-world entities. Models trained with our new objective yield significant improvements on the fact completion task. When applied to downstream tasks, our model consistently outperforms BERT on four entity-related question answering datasets (i.e., WebQuestions, TriviaQA, SearchQA and Quasar-T) with an average 2.7 F1 improvements and a standard fine-grained entity typing dataset (i.e., FIGER) with 5.7 accuracy gains.

Motivation & Objective

Investigate whether pretrained models implicitly capture real-world entity knowledge, and quantify how much they capture via a zero-shot fact completion task.
Introduce a weakly supervised knowledge learning objective that explicitly teaches models about real-world entities from unstructured text.
Show that knowledge-enriched pretraining improves performance on entity-related QA datasets and fine-grained entity typing beyond standard BERT baselines.

Proposed method

Entity-centric pretraining with weak supervision via entity replacement: replace entity mentions with other entities of the same type and train the model to detect which mentions were replaced.
Use boundary-word representations of entities to predict P(e|C) and distinguish true vs false knowledge statements.
Combine the knowledge-learning objective with masked language model (MLM) loss in a multi-task pretraining setup on Wikipedia and BooksCorpus.
Keep the standard BERT architecture, so downstream tasks need no extra memory or structural changes.
Perform ablations to compare WKLM against MLM-only and extended MLM baselines to isolate the knowledge-learning contribution.
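
The entity-replacement step described above can be illustrated with a small sketch. This is not the authors' code: the entity inventory `ENTITIES_BY_TYPE`, the function names, the mention format, and the `replace_prob` value are all hypothetical choices made for illustration, and "boundary words" is read here as the tokens immediately before and after a mention.

```python
import random

# Hypothetical toy inventory (assumption): entity type -> entities of that type.
ENTITIES_BY_TYPE = {
    "person": ["Marie Curie", "Alan Turing", "Ada Lovelace"],
    "city": ["Paris", "Warsaw", "London"],
}


def replace_entities(tokens, mentions, replace_prob=0.5, rng=None):
    """Swap some entity mentions for different entities of the same type.

    tokens:   list of word strings.
    mentions: sorted, non-overlapping (start, end, name, type) tuples,
              with `end` exclusive.
    Returns (new_tokens, examples); each example is (start, end, label)
    in new_tokens coordinates, label 1 = original, 0 = replaced.
    replace_prob is an illustrative value, not one taken from the paper.
    """
    rng = rng or random.Random()
    out, examples = [], []
    cursor = 0
    for start, end, name, etype in mentions:
        # Copy the untouched context before this mention.
        out.extend(tokens[cursor:start])
        candidates = [e for e in ENTITIES_BY_TYPE.get(etype, []) if e != name]
        if candidates and rng.random() < replace_prob:
            surface, label = rng.choice(candidates).split(), 0
        else:
            surface, label = tokens[start:end], 1
        new_start = len(out)
        out.extend(surface)
        examples.append((new_start, len(out), label))
        cursor = end
    out.extend(tokens[cursor:])
    return out, examples


def boundary_indices(start, end, length):
    """Indices of the tokens just before and just after a mention span.

    Clamped to the sentence; a real tokenizer's special tokens would
    normally fill these positions at the sentence edges.
    """
    return max(start - 1, 0), min(end, length - 1)


if __name__ == "__main__":
    toks = "Marie Curie was born in Warsaw".split()
    ments = [(0, 2, "Marie Curie", "person"), (5, 6, "Warsaw", "city")]
    corrupted, labels = replace_entities(toks, ments, rng=random.Random(0))
    print(corrupted, labels)
```

In a full pipeline, the encoder states at the positions returned by `boundary_indices` would feed a binary classifier over each `(start, end, label)` example, and that classification loss would be added to the MLM loss during pretraining.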

Experimental results

Research questions

RQ1: Can large-scale pretraining encode explicit entity-level knowledge beyond standard MLM objectives?
RQ2: Does a weakly supervised knowledge-learning objective improve entity-related tasks without external knowledge bases?
RQ3: How does WKLM perform on zero-shot fact completion and downstream entity-centric QA and typing tasks compared to BERT and GPT-2?
RQ4: What is the impact of the MLM masking ratio and, separately, of the entity-replacement objective on downstream performance?

Key findings

WKLM achieves the best results on 8 of 10 fact-completion relations in zero-shot evaluation.
On open-domain QA, WKLM outperforms BERT on entity-related datasets by an average of 2.7 F1 points when ranking scores are not used; with ranking, it attains near state-of-the-art results on three datasets.
On fine-grained entity typing (FIGER), WKLM sets a new state of the art with accuracy 60.21, macro-F1 (Ma-F1) 81.99, and micro-F1 (Mi-F1) 77.00.
Ablation shows that combining the WKLM objective with MLM yields the best downstream performance; using too high an MLM masking ratio (15%) can hurt knowledge learning.
WKLM requires no additional data processing or memory during fine-tuning and works with the original BERT architecture.
Compared to ERNIE, WKLM provides larger absolute gains on FIGER, suggesting text-based knowledge extraction is effective without external KBs.


This review was created by AI and reviewed by human editors.