QUICK REVIEW

[Paper Review] Language Models as Knowledge Bases?

Fabio Petroni, Tim Rocktäschel|arXiv (Cornell University)|Sep 3, 2019

Topic Modeling | References: 27 | Citations: 113

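One-Sentence Summary

This paper analyzes how much factual and commonsense knowledge pretrained language models (BERT, ELMo, and others) store without fine-tuning, using the LAMA probe to compare them across several knowledge sources against symbolic knowledge base (KB) and open-domain QA baselines.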

ABSTRACT

Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as "fill-in-the-blank" cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at https://github.com/facebookresearch/LAMA.

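Motivation and Objectives

Assess how much relational knowledge large pretrained language models hold without fine-tuning.
Compare BERT, ELMo, and other models against symbolic KB and QA baselines across multiple knowledge sources.
Identify which kinds of knowledge (entity relations, commonsense, QA) pretraining picks up most readily.
Evaluate the open-domain QA ability of language models relative to a supervised baseline.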

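Proposed Method

Introduce the LAMA (LAnguage Model Analysis) probe to test factual and commonsense knowledge.
Draw facts from four knowledge sources (Google-RE, T-REx, ConceptNet, SQuAD) and convert them into cloze templates used to query the models.
Evaluate several pretrained models (fairseq-fconv, Transformer-XL, the ELMo variants, BERT-base, BERT-large) over a unified vocabulary of about 21K tokens.
Use a rank-based metric (P@k); to account for one-to-many relations, remove the other valid objects from the candidates at test time so that only the object under test is ranked.
Compare against baselines including a frequency baseline, relation extraction (RE) systems with and without oracle entity linking, and the DrQA open-domain QA system.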
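
The template and ranking steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LAMA implementation: the function names `fill_template` and `precision_at_k`, the `[X]`/`[Y]` placeholder convention as written here, and the toy scores are all hypothetical, and a real probe would take its scores from a language model's distribution over the shared vocabulary.

```python
from typing import Dict, Set


def fill_template(template: str, subject: str, mask_token: str = "[MASK]") -> str:
    """Turn a relation template into a cloze query.

    Assumption: [X] marks the subject slot and [Y] the object slot.
    """
    return template.replace("[X]", subject).replace("[Y]", mask_token)


def precision_at_k(scores: Dict[str, float], gold: str,
                   other_valid: Set[str], k: int) -> float:
    """Return 1.0 if `gold` ranks in the top k, else 0.0.

    Other valid objects for the same subject and relation are removed
    from the candidates first, so a one-to-many relation is not
    penalized for ranking a different correct answer higher.
    """
    candidates = [tok for tok in scores if tok == gold or tok not in other_valid]
    ranked = sorted(candidates, key=lambda tok: scores[tok], reverse=True)
    return 1.0 if gold in ranked[:k] else 0.0


# Hypothetical model scores for the masked position (illustrative only).
query = fill_template("[X] shares border with [Y].", "France")
toy_scores = {"Spain": 0.4, "Germany": 0.3, "Italy": 0.2, "Poland": 0.1}

unfiltered = precision_at_k(toy_scores, "Germany", set(), k=1)
filtered = precision_at_k(toy_scores, "Germany", {"Spain", "Italy"}, k=1)
print(query, unfiltered, filtered)
```

With the toy scores, testing "Germany" without filtering misses at k=1 because "Spain" outranks it, while removing the other valid objects lets the tested object take the top position. Averaging this 0/1 value over all facts of a relation gives that relation's P@k.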

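Experimental Results

Research Questions

RQ1: How much relational and commonsense knowledge is stored in pretrained language models without fine-tuning?
RQ2: How do model size and architecture (BERT-large vs. BERT-base vs. the ELMo variants) affect knowledge recall across the knowledge sources?
RQ3: How does the knowledge retrieved from LMs compare with symbolic KB and open-domain QA baselines?
RQ4: Are particular relation types (1-to-1 versus N-to-M) captured better by pretrained models?
RQ5: Without fine-tuning, does the open-domain QA performance of LMs approach that of supervised systems?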

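Key Findings

BERT-large and BERT-base outperform the other models and are at times competitive with oracle-based knowledge extraction (on Google-RE and T-REx).
Recall of factual knowledge is strong for certain relation types, especially 1-to-1, and weaker for N-to-M relations.
On open-domain cloze-style QA, BERT-large reaches 57.1% P@10 versus 63.5% for the supervised DrQA system, so the gap at that cutoff is small.
ELMo-5.5B and the BERT models are robust to how a query is phrased, but performance correlates with exposure in the training data (for example, mentions of the object in the training data).
Overall, pretrained language models store substantial relational and commonsense knowledge and come close to KB performance without explicit fine-tuning or a retrieval pipeline.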


This review was generated by AI and checked by a human editor.