QUICK REVIEW

[Paper Review] Commonsense Knowledge Mining from Pretrained Models

Joshua Feldman, Joe Davison | arXiv (Cornell University) | Sep 2, 2019

Topic Modeling | References: 17 | Cited by: 44

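One-Sentence Summary

The paper proposes an unsupervised method that converts relational triples into sentences and scores them with a frozen pretrained language model. The method is competitive when generalizing to new data and when mining Wikipedia, although it scores lower on the in-knowledge-base benchmark.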

ABSTRACT

Inferring commonsense knowledge is a key challenge in natural language processing, but due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data. In this work, we develop a method for generating commonsense knowledge using a large, pre-trained bidirectional language model. By transforming relational triples into masked sentences, we can use this model to rank a triple's validity by the estimated pointwise mutual information between the two entities. Since we do not update the weights of the bidirectional model, our approach is not biased by the coverage of any one commonsense knowledge base. Though this method performs worse on a test set than models explicitly trained on a corresponding training set, it outperforms these methods when mining commonsense knowledge from new sources, suggesting that unsupervised techniques may generalize better than current supervised approaches.

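Research Motivation and Goals

Motivation: graph-structured knowledge bases cover commonsense knowledge only sparsely, so methods are needed to extend that coverage.
Develop an unsupervised method that uses a pretrained language model without fine-tuning on any specific knowledge base.
Test whether sentence-derived representations and PMI-based scoring can separate valid triples from invalid ones.
Evaluate generalization to new data, including mining from Wikipedia, and compare against supervised CKBC methods.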

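Proposed Method

Convert head-relation-tail triples into candidate sentences using templates and grammatical transformations.
Use a pretrained language model as a coherency-ranking step that selects the most coherent candidate sentence.
Score each triple with a masked bidirectional language model by estimating a weighted PMI between head and tail, conditioned on the relation.
For multi-word tails, mask all tail tokens and unmask them greedily to compute p(t|h,r) and p(t|r).
Weight the PMI with a hyperparameter λ, and average PMI(h,t|r) and PMI(t,h|r) to reduce variance.
Compare against unsupervised baselines (Concatenation, Template, Template+Grammar) and supervised CKBC models on ConceptNet-derived tasks.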
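
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the templates, the function names, the `token_prob` interface, and every probability in the demo are hypothetical stand-ins for what real language models would supply. The weighted PMI is assumed to take the form λ·log p(t|h,r) − log p(t|r), which is one natural reading of "weighting the PMI with λ".

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical templates for illustration only; not the paper's template set.
TEMPLATES: Dict[str, List[str]] = {
    "UsedFor": ["{h} is used for {t}", "{h} is for {t}"],
}


def candidate_sentences(head: str, relation: str, tail: str) -> List[str]:
    """Expand a (head, relation, tail) triple into candidate sentences."""
    return [tpl.format(h=head, t=tail) for tpl in TEMPLATES[relation]]


def most_coherent(sentences: List[str],
                  log_likelihood: Callable[[str], float]) -> str:
    """Keep the candidate that the supplied scorer rates highest."""
    return max(sentences, key=log_likelihood)


def greedy_log_prob(target: List[str],
                    token_prob: Callable[[List[Optional[str]], int], float]) -> float:
    """Greedy unmasking of a multi-token span.

    token_prob(revealed, i) is an assumed interface: the probability of the
    true token at position i given the tokens revealed so far (None = masked).
    """
    revealed: List[Optional[str]] = [None] * len(target)
    total = 0.0
    while any(tok is None for tok in revealed):
        masked = [i for i, tok in enumerate(revealed) if tok is None]
        # Unmask the position the model is currently most confident about.
        best = max(masked, key=lambda i: token_prob(revealed, i))
        total += math.log(token_prob(revealed, best))
        revealed[best] = target[best]
    return total


def weighted_pmi(log_p_cond: float, log_p_marg: float, lam: float) -> float:
    """Assumed form: lam * log p(x | y, r) - log p(x | r)."""
    return lam * log_p_cond - log_p_marg


def score_triple(log_p_t_hr: float, log_p_t_r: float,
                 log_p_h_tr: float, log_p_h_r: float, lam: float = 1.0) -> float:
    """Average the weighted PMI over both directions."""
    forward = weighted_pmi(log_p_t_hr, log_p_t_r, lam)
    backward = weighted_pmi(log_p_h_tr, log_p_h_r, lam)
    return 0.5 * (forward + backward)


# --- Toy demo with made-up numbers -------------------------------------
sentences = candidate_sentences("a hammer", "UsedFor", "hitting nails")
# Stand-in scorer that prefers shorter sentences; a real system would use
# a language model's log-likelihood here.
best = most_coherent(sentences, lambda s: -len(s.split()))


def toy_token_prob(revealed: List[Optional[str]], i: int) -> float:
    """Made-up probabilities: 'nails' is easy, 'hitting' is easy once 'nails' is known."""
    if i == 1:
        return 0.5
    return 0.8 if revealed[1] is not None else 0.2


log_p_t_hr = greedy_log_prob(["hitting", "nails"], toy_token_prob)
score = score_triple(log_p_t_hr, math.log(0.05), math.log(0.3), math.log(0.1))
```

In a real pipeline, `log_likelihood` and `token_prob` would wrap calls to pretrained models, and the marginal terms p(t|r) and p(h|r) would be obtained by masking the other entity as well. With λ above 1, the conditional likelihood counts for more relative to the marginal term.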

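Experimental Results

Research Questions

RQ1: Can a frozen pretrained language model score the validity of head-relation-tail triples without fine-tuning on a commonsense knowledge base?
RQ2: Does sentence generation combined with PMI scoring generalize to mining new commonsense knowledge from sources such as Wikipedia?
RQ3: How does coherency ranking compare with template-based sentence construction on CKBC?
RQ4: How do grammaticality and semantic fidelity affect performance on the CKBC and Wikipedia mining tasks?
RQ5: How closely can unsupervised methods approach supervised CKBC methods on the standard benchmark?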

Key Findings

Model	Task 1: CKBC F1	Task 2: Quality (out of 4)
Unsupervised	-	-
Concatenation	68.8	2.95±0.11
Template	72.2	2.98±0.11
Template+Grammar	74.4	2.56±0.13
Coherency Ranking	78.8	3.00±0.12
Supervised	-	-
DNN	89.2	2.50
Factorized	89.0	2.61
Prototypical	79.4	2.55

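On Task 1 (ConceptNet-style CKBC), unsupervised coherency ranking reaches an F1 of 78.8, close to the supervised Prototypical model at 79.4.
On Task 2 (Wikipedia mining), coherency ranking reaches a mean quality score of 3.00 (λ=4), higher than the supervised methods (2.50 to 2.61) in that setting.
On Task 1, the template-based baselines (72.2 and 74.4) trail coherency ranking, and plain concatenation scores lowest (68.8).
The supervised DNN and Factorized models still achieve the highest CKBC F1 on the ConceptNet test set (89.2 and 89.0), well above every unsupervised method.
The method generalizes well to unseen data (Wikipedia) despite never being trained on ConceptNet, which suggests it could mine knowledge beyond what existing knowledge bases cover.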


This review was generated by AI and reviewed by human editors.