QUICK REVIEW

[Paper Review] How Additional Knowledge can Improve Natural Language Commonsense Question Answering?

Arindam Mitra, Pratyay Banerjee|arXiv (Cornell University)|Sep 19, 2019

Topic Modeling | References: 29 | Cited by: 32

One-Sentence Summary

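This paper studies how to incorporate external commonsense knowledge into BERT/RoBERTa to improve multiple-choice question answering. It proposes four knowledge-infusion models and three knowledge-incorporation strategies, and reports improved results on aNLI, PIQA, SocialIQA, and a synthetic PFQA dataset.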

ABSTRACT

Recently several datasets have been proposed to encourage research in Question Answering domains where commonsense knowledge is expected to play an important role. Recent language models such as ROBERTA, BERT and GPT that have been pre-trained on Wikipedia articles and books have shown reasonable performance with little fine-tuning on several such Multiple Choice Question-Answering (MCQ) datasets. Our goal in this work is to develop methods to incorporate additional (commonsense) knowledge into language model-based approaches for better question-answering in such domains. In this work, we first categorize external knowledge sources, and show performance does improve on using such sources. We then explore three different strategies for knowledge incorporation and four different models for question-answering using external commonsense knowledge. We analyze our predictions to explore the scope of further improvements.

Research Motivation and Objectives

Motivate the use of external commonsense knowledge to boost QA beyond pretraining alone.
Categorize external knowledge sources by derivation and relevance for Commonsense QA tasks.
Propose and compare four knowledge-infusion models within a BERT/RoBERTa framework.
Evaluate knowledge infusion on multiple datasets (aNLI, PIQA, SocialIQA) and a synthetic PFQA dataset.

Proposed Method

Categorize knowledge sources as Directly Derived, Partially Derived, and Relevant for evaluation.
Index and retrieve knowledge using Elasticsearch, then re-rank the retrieved sentences by Information Gain and spaCy similarity.
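Fine-tune BERT/RoBERTa under three strategies: Revision (fine-tuning on the knowledge base alone before the QA task), Open-Book (a per-example subset of the knowledge base supplied alongside the question), and the two combined (Revision + Open-Book).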
Introduce four models to fuse knowledge: Concat, Parallel-Max, Simple Sum, and Weighted Sum.
Implement four knowledge-fusion variants under Open-Book (and two variants for Weighted Sum) to produce answer scores.
Create and evaluate a synthetic PFQA dataset to test memorization and multi-hop reasoning across knowledge sentences.
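
The four fusion models listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for the BERT/RoBERTa encodings of each (knowledge passage, question, answer) input, and the function names, the `[SEP]` input layout, the linear scoring vector `w`, and the softmax-normalised relevance vector `attn` are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concat_input(passages: List[str], question: str, answer: str) -> str:
    """Concat: join all passages into one premise and encode a single input.
    The "[SEP]" layout here is a hypothetical format, not taken from the paper."""
    return " ".join(passages) + " [SEP] " + question + " " + answer


def parallel_max(encodings: List[Vector], w: Vector) -> float:
    """Parallel-Max: score each passage encoding separately, keep the best score."""
    return max(dot(e, w) for e in encodings)


def simple_sum(encodings: List[Vector], w: Vector) -> float:
    """Simple Sum: add the passage encodings element-wise, then score the sum once."""
    summed = [sum(column) for column in zip(*encodings)]
    return dot(summed, w)


def weighted_sum(encodings: List[Vector], w: Vector, attn: Vector) -> float:
    """Weighted Sum: weight each passage encoding by a relevance score before scoring.
    Deriving the weights from a softmax over dot(e, attn) is an assumption."""
    weights = softmax([dot(e, attn) for e in encodings])
    pooled = [sum(wt * x for wt, x in zip(weights, column))
              for column in zip(*encodings)]
    return dot(pooled, w)


def pick_answer(option_scores: List[float]) -> int:
    """Return the index of the highest-scoring answer option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])
```

In a multiple-choice setting, one such score would be computed per answer option and the option with the highest score selected, for example `pick_answer([simple_sum(enc, w) for enc in encodings_per_option])`, where `encodings_per_option` is a hypothetical list holding the passage encodings for each option.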

Experimental Results

Research Questions

RQ1: Does incorporating external knowledge improve MCQ QA performance on commonsense datasets?
RQ2: Which knowledge source categories (Directly Derived, Partially Derived, Relevant) are most beneficial for QA performance?
RQ3: Which of the four knowledge-fusion models best leverages retrieved knowledge across datasets?
RQ4: How do the Revision, Open-Book, and combined strategies compare in effectiveness across tasks?

Key Findings

Knowledge infusion improves performance across datasets; Open-Book and Revision strategies are both beneficial, with combined strategies often yielding the best results.
Weighted Sum is the strongest knowledge-fusion model overall, enabling flexible weighting of multiple knowledge passages.
PIQA and aNLI benefit from larger knowledge sets, whereas too much knowledge can hurt aNLI due to noise or misalignment.
RoBERTa gains more from knowledge in some cases, while BERT shows improvements but can be distracted by retrieved knowledge in some settings.
SocialIQA and PFQA show benefits but remain short of human accuracy, highlighting limits of current external-knowledge approaches.

This review was generated by AI and reviewed by a human editor.