QUICK REVIEW

[Paper Review] Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning

Pengda Qin, Weiran Xu | arXiv (Cornell University) | May 24, 2018

Topic Modeling | References: 20 | Cited by: 45

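One-Sentence Summary

This paper proposes a policy-based deep reinforcement learning framework that identifies false-positive samples in distantly supervised relation extraction and redistributes them into the negative set, improving robustness across different relation extractors without changing the underlying relation classifier.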

ABSTRACT

Distant supervision has become the standard method for relation extraction. However, even though it is an efficient method, it does not come at no cost---The resulted distantly-supervised training samples are often very noisy. To combat the noise, most of the recent state-of-the-art approaches focus on selecting one-best sentence or calculating soft attention weights over the set of the sentences of one specific entity pair. However, these methods are suboptimal, and the false positive problem is still a key stumbling bottleneck for the performance. We argue that those incorrectly-labeled candidate sentences must be treated with a hard decision, rather than being dealt with soft attention weights. To do this, our paper describes a radical solution---We explore a deep reinforcement learning strategy to generate the false-positive indicator, where we automatically recognize false positives for each relation type without any supervised information. Unlike the removal operation in the previous studies, we redistribute them into the negative examples. The experimental results show that the proposed strategy significantly improves the performance of distant supervision comparing to state-of-the-art systems.

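Motivation and Objectives

Motivate and address the noise problem in distantly supervised relation extraction.
Develop a model-agnostic, RL-based method that identifies false positives without manual annotation.
Demonstrate that redistributing false positives improves the performance of existing neural relation extractors.
Show robustness across multiple baselines on the NYT-Freebase dataset.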

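Proposed Method

Formulate the denoising of distantly supervised data as a Markov decision process (MDP) whose state combines information from the current sentence and previous sentences.
Use a CNN-based policy network that, for each relation type, decides whether to remove or retain each distantly supervised sentence.
Pre-train the policy network with a supervised-style step on the highly imbalanced distribution of distantly supervised (DS) positive and negative samples.
Train the RL agent to remove a fixed number of sentences in each epoch and redistribute them into the negative set, with the reward measured by the F1 improvement on the validation set.
Define the reward R_i as alpha times the difference in F1 between adjacent epochs, averaged over the last five epochs to stabilize training.
Redistribute the removed samples into the negative set and retrain the relation classifier, whose performance provides the reward.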
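The state, reward, and redistribution steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`build_state`, `epoch_reward`, `redistribute`) are hypothetical, as is the default value of `alpha`. Three further choices are assumptions made only for this example: representing the state as the current sentence vector concatenated with the mean of earlier sentence vectors, smoothing each epoch's F1 by averaging its last five scores, and selecting the sentences to remove by top-k removal probability.

```python
def build_state(current_vec, earlier_vecs):
    """Assumed state layout: current sentence vector + mean of earlier vectors."""
    dim = len(current_vec)
    if earlier_vecs:
        avg = [sum(v[d] for v in earlier_vecs) / len(earlier_vecs)
               for d in range(dim)]
    else:
        # No earlier sentences yet: pad with zeros so the state size is fixed.
        avg = [0.0] * dim
    return list(current_vec) + avg


def smoothed_f1(scores, window=5):
    """Mean of the last `window` F1 scores (assumed smoothing scheme)."""
    tail = scores[-window:]
    return sum(tail) / len(tail)


def epoch_reward(prev_scores, curr_scores, alpha=100.0, window=5):
    """R_i = alpha * (F1_i - F1_{i-1}), with each F1 smoothed over `window`."""
    return alpha * (smoothed_f1(curr_scores, window)
                    - smoothed_f1(prev_scores, window))


def redistribute(positives, negatives, remove_probs, n_remove):
    """Move the n_remove positives with the highest removal probability
    into the negative set instead of discarding them."""
    order = sorted(range(len(positives)),
                   key=lambda i: remove_probs[i], reverse=True)
    chosen = order[:n_remove]
    chosen_set = set(chosen)
    kept = [s for i, s in enumerate(positives) if i not in chosen_set]
    moved = [positives[i] for i in chosen]
    return kept, negatives + moved, moved


# Usage: one agent epoch on toy data (all values are made up).
kept, new_negatives, moved = redistribute(
    positives=["s1", "s2", "s3"],
    negatives=["n1"],
    remove_probs=[0.1, 0.9, 0.5],
    n_remove=1,
)
reward = epoch_reward(prev_scores=[0.50] * 5, curr_scores=[0.60] * 5, alpha=10.0)
```

In a full training loop, the relation classifier would be retrained on `kept` and `new_negatives`, and its validation F1 history would be passed to `epoch_reward` to produce the signal that updates the policy network.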

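Experimental Results

Research Questions

RQ1: Can a policy-based RL agent reliably identify false-positive sentences in distantly supervised data without manual annotation?
RQ2: Does redistributing false positives into the negative set improve the performance of existing relation extraction models on NYT-Freebase?
RQ3: Is the proposed RL framework model-agnostic and compatible with different neural relation extractors?
RQ4: How do pre-training and reward-based retraining affect classifier performance?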

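Key Findings

The RL-based false-positive indicator improves the relation classifier's F1 score compared with the original setting or the pre-trained policy alone.
Pre-training the policy network yields a clear improvement, and RL retraining brings further gains on several relation types.
RL training increases the area under the precision-recall curve (AUC) for both CNN-based and PCNN-based models, and the improvements are statistically significant (p-values are reported).
The method improves performance when combined with existing models, indicating that it is model-agnostic and usable as a plug-and-play component.
Examples illustrate the detected false positives and the distribution of removed samples across relations, which is consistent with the observed noise characteristics of the dataset.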
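For readers unfamiliar with the AUC metric mentioned above, the following sketch shows one common way to compute the area under a precision-recall curve from binary labels and confidence scores. It is an illustration only: the summary does not state how the paper computes its curves, the function names `pr_curve` and `pr_auc` are hypothetical, and the trapezoidal integration and the handling of tied scores (ignored here) are assumptions.

```python
def pr_curve(labels, scores):
    """Return (recall, precision) points, one per prediction,
    after ranking predictions by descending score."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    true_pos = 0
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        points.append((true_pos / total_pos, true_pos / rank))
    return points


def pr_auc(labels, scores):
    """Trapezoidal area under the precision-recall points."""
    points = pr_curve(labels, scores)
    area = 0.0
    # Start at recall 0 with the precision of the top-ranked prediction.
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area


# Usage: a perfectly ranked toy example (labels: 1 = true relation instance).
toy_auc = pr_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.8, 0.3, 0.1])
```

Comparing this value for a classifier trained on the original data against one trained on the cleaned data is the kind of comparison the AUC finding refers to.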


This review was generated by AI and reviewed by a human editor.