QUICK REVIEW

[Paper Review] Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Zhi-Xiu Ye, Chen Qian|arXiv (Cornell University)|Aug 19, 2019

Topic Modeling|References 38|Cited by 70

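One-sentence summary

This paper proposes pre-training based on "align, mask, and select" (AMS): it injects commonsense knowledge from ConceptNet into BERT by automatically building multi-choice question answering (MCQA) pre-training data, and it improves results on commonsense benchmarks without hurting general NLP performance.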

ABSTRACT

The state-of-the-art pre-trained language representation models, such as Bidirectional Encoder Representations from Transformers (BERT), rarely incorporate commonsense knowledge or other knowledge explicitly. We propose a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed "align, mask, and select" (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieve significant improvements over previous state-of-the-art models on two commonsense-related benchmarks, including CommonsenseQA and Winograd Schema Challenge. We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models. These results verify that the proposed approach, while significantly improving commonsense-related NLP tasks, does not degrade the general language representation capabilities.

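Motivation and objectives

Incorporate commonsense knowledge into pre-trained language models without sacrificing general language understanding.
Propose AMS to automatically construct a large-scale natural-language question answering dataset aligned with a commonsense knowledge graph.
Pre-train BERT variants on the AMS data and evaluate them on commonsense benchmarks and GLUE tasks.
Run ablation studies to understand how data creation and the choice of pre-training task affect performance.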

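Proposed method

Filter ConceptNet triples and align them with English Wikipedia sentences.
Mask one concept in the sentence to form an MCQA question, with the masked concept as the correct answer.
Select four distractors by looking up related triples that share the same relation or concept, so that the answer choices are confusable.
Pre-train the BERT_CS model on the MCQA task using the AMS dataset, with a softmax over the candidate answers.
Fine-tune on downstream tasks and compare against baseline BERT and state-of-the-art results.
Run ablations that compare MCQA with MLM pre-training and compare different data creation strategies.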
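
The align, mask, and select steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the single sentence, the `[MASK]` placeholder token, the choice to always mask the tail concept, and all function names are assumptions made for illustration. The real pipeline works on filtered ConceptNet triples and English Wikipedia sentences.

```python
import random

# Hypothetical toy data standing in for ConceptNet triples and Wikipedia text.
TRIPLES = [
    ("population", "AtLocation", "city"),
    ("museum", "AtLocation", "city"),
    ("people", "AtLocation", "town"),
    ("bus", "AtLocation", "street"),
    ("shop", "AtLocation", "village"),
    ("boat", "AtLocation", "harbor"),
]
SENTENCES = ["The largest city by population is Birmingham ."]
MASK = "[MASK]"  # placeholder token chosen for this sketch


def align(triples, sentences):
    """Align: pair each triple with sentences that contain both concepts."""
    pairs = []
    for head, rel, tail in triples:
        for sent in sentences:
            tokens = sent.lower().split()
            if head in tokens and tail in tokens:
                pairs.append(((head, rel, tail), sent))
    return pairs


def mask(sentence, concept):
    """Mask: replace the first occurrence of `concept` with the mask token."""
    tokens = sentence.split()
    idx = [t.lower() for t in tokens].index(concept)
    tokens[idx] = MASK
    return " ".join(tokens)


def select(triple, triples, sentence, k, rng):
    """Select: draw k distractors from triples sharing the relation or head."""
    head, rel, answer = triple
    present = set(sentence.lower().split())
    pool = sorted({t for h, r, t in triples
                   if (r == rel or h == head)
                   and t != answer and t not in present})
    if len(pool) < k:
        return None  # not enough distractors: skip this example
    return rng.sample(pool, k)


def build_examples(triples, sentences, k=4, seed=0):
    """Run align -> mask -> select and emit MCQA examples."""
    rng = random.Random(seed)
    examples = []
    for triple, sent in align(triples, sentences):
        distractors = select(triple, triples, sent, k, rng)
        if distractors is None:
            continue
        answer = triple[2]  # assumption: always mask the tail concept
        choices = distractors + [answer]
        rng.shuffle(choices)
        examples.append({
            "question": mask(sent, answer),
            "choices": choices,
            "label": choices.index(answer),
        })
    return examples


examples = build_examples(TRIPLES, SENTENCES)
```

Each resulting example has one masked question, five answer choices (the masked concept plus four distractors), and the index of the correct choice, which matches the MCQA format described in the list.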

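Experimental results

Research questions

RQ1: Does AMS-based pre-training improve commonsense reasoning on CommonsenseQA (CSQA) and the Winograd Schema Challenge (WSC)?
RQ2: Does adding AMS data degrade or preserve performance on GLUE-style general NLP tasks?
RQ3: Which pre-training task and data creation strategy benefit commonsense reasoning in language models the most?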

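Key findings

Model	CSQA test accuracy (%)
BERT base	53.0
BERT large	56.7
CoS-E (Rajani et al., 2019)	58.2
BERT_CS base	56.2
BERT_CS large	62.2

BERT_CS large reaches 62.2% on the CSQA test set, ahead of the BERT large baseline (56.7%, a gain of 5.5 points) and the previous state of the art, CoS-E (58.2%, a gain of 4.0 points).
BERT_CS models perform comparably to the original BERT models on GLUE tasks, indicating that general language representation ability is not degraded.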
MCQA-based pre-training with AMS outperforms MLM-based or random distractor approaches in CSQA ablations.
Ablation shows that natural-language sentence inputs for pre-training are preferable to purely triple-based inputs for CSQA.
On WSC, BERT_CS large + MCQA achieves superior results across multiple evaluation facets, suggesting MCQA formatting benefits for commonsense tasks.
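
The MCQA objective referred to in these findings applies a softmax over candidate answers. A minimal sketch of that loss follows, assuming each candidate already has a scalar score. In the actual model the scores would come from the encoder; here they are hand-written numbers, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over per-candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mcqa_loss(scores, label):
    """Cross-entropy of the softmax distribution against the correct index."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


# Five candidates (one correct answer plus four distractors); index 2 is correct.
uniform_loss = mcqa_loss([0.0, 0.0, 0.0, 0.0, 0.0], label=2)
confident_loss = mcqa_loss([0.1, -0.3, 2.5, 0.0, 0.4], label=2)
probs = softmax([0.1, -0.3, 2.5, 0.0, 0.4])
```

With uniform scores the loss equals log 5, the chance level for five choices, and it falls as the score of the correct candidate rises relative to the distractors.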


This review was generated by AI and reviewed by human editors.