QUICK REVIEW

[Paper Review] Robust Lexical Features for Improved Neural Network Named-Entity Recognition

Abbas Ghaddar, Philippe Langlais|arXiv (Cornell University)|Jun 9, 2018

Topic Modeling · Cited by 46

One-Sentence Summary

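This paper proposes LS (Lexical Similarity) vectors, learned offline from WiFiNE, a Wikipedia corpus annotated with entity types. It shows that lexical features substantially improve a Bi-LSTM-CRF NER model, reaching state of the art on OntoNotes 5.0 and competitive performance on CoNLL-2003.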

ABSTRACT

Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers. In this work, we show that this is unfair: lexical features are actually quite useful. We propose to embed words and entity types into a low-dimensional vector space we train from annotated data produced by distant supervision thanks to Wikipedia. From this, we compute - offline - a feature vector representing each word. When used with a vanilla recurrent neural network model, this representation yields substantial improvements. We establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while matching state-of-the-art performance with an F1 score of 91.73 on the over-studied CONLL-2003 dataset.

Motivation and Goals

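Exploit lexical information in neural NER systems, going beyond traditional gazetteer features.
Propose a 120-dimensional Lexical Similarity (LS) feature vector per word, learned offline by embedding words and 120 entity types into a shared space trained on WiFiNE, an automatically annotated Wikipedia corpus.
Integrate LS features into a Bi-LSTM-CRF NER model and evaluate on the standard benchmarks CoNLL-2003 and OntoNotes 5.0.
Assess the robustness of LS features and their complementary value relative to pretrained word embeddings and character/capitalization features.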

Proposed Method

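Build a joint embedding space of words and entity types from WiFiNE-annotated Wikipedia data (120 entity types).
Compute a 120-dimensional LS vector for each word, where each dimension is the cosine similarity between the word embedding and one entity-type embedding.
Scale the LS vectors to the [-1, 1] range with MinMax normalization before feeding them to the model.
Feed the LS features to the Bi-LSTM-CRF NER model alongside the standard features (word embeddings, character-level encoding, capitalization features).
Train the word and character components with SGD (momentum 0.9), apply dropout, and use early stopping on the development set.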
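The LS computation in the method steps can be sketched in a few lines. This is a minimal, hypothetical sketch, not the authors' code. It assumes the word and entity-type embeddings are already trained (the paper learns them on WiFiNE) and uses toy 3-dimensional vectors in their place. It also assumes MinMax scaling is applied independently per dimension across the vocabulary (as scikit-learn's `MinMaxScaler` does; the paper's exact scaling axis is an assumption here) and uses a hypothetical 4-way casing one-hot. The names `ls_vector`, `minmax_scale`, `casing_feature`, and `token_input`, and the placeholder `char_repr` standing in for a character-level encoder output, are illustrative.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0.0 and nv > 0.0 else 0.0


def ls_vector(word_vec: Vector, type_vecs: List[Vector]) -> Vector:
    """One cosine similarity per entity type (120 types in the paper)."""
    return [cosine(word_vec, t) for t in type_vecs]


def minmax_scale(rows: List[Vector], lo: float = -1.0, hi: float = 1.0) -> List[Vector]:
    """Scale each dimension to [lo, hi] across all rows (assumed per-dimension)."""
    n_dims = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_dims)]
    maxs = [max(r[j] for r in rows) for j in range(n_dims)]
    scaled = []
    for r in rows:
        out = []
        for j, x in enumerate(r):
            span = maxs[j] - mins[j]
            # A constant column carries no information; map it to the lower bound.
            out.append(lo if span == 0.0 else lo + (x - mins[j]) * (hi - lo) / span)
        scaled.append(out)
    return scaled


def casing_feature(token: str) -> Vector:
    """Hypothetical 4-way casing one-hot: lower, ALL-UPPER, Initial-cap, other."""
    if token.isupper():
        idx = 1
    elif token[:1].isupper():
        idx = 2
    elif token.islower():
        idx = 0
    else:
        idx = 3
    return [1.0 if i == idx else 0.0 for i in range(4)]


def token_input(word_emb: Vector, char_repr: Vector, token: str, ls: Vector) -> Vector:
    """Concatenate the per-token features fed to the Bi-LSTM-CRF."""
    return word_emb + char_repr + casing_feature(token) + ls


# Toy example: 3-d embeddings and two entity types (location-like, person-like).
type_vecs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
vocab = {"paris": [1.0, 0.0, 0.0], "ate": [0.0, 1.0, 0.0], "obama": [0.6, 0.0, 0.8]}
words = list(vocab)
raw = [ls_vector(vocab[w], type_vecs) for w in words]
ls_table = dict(zip(words, minmax_scale(raw)))
# For brevity the toy reuses the LS-space vector as the word embedding;
# the NER model itself takes a separate pretrained embedding (e.g. SSKIP).
features = token_input(vocab["paris"], [0.0, 0.0], "Paris", ls_table["paris"])
```

In a real pipeline `raw` would hold one row per vocabulary word with 120 columns, so `ls_table` can be computed once and simply looked up during NER training. That lookup is what makes the feature "offline".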

Experimental Results

Research Questions

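RQ1: Do offline-learned LS lexical representations give NER information complementary to standard embeddings?
RQ2: How does LS compare with traditional gazetteer features and with context-aware embeddings in a Bi-LSTM-CRF NER model?
RQ3: How do LS features affect performance on CoNLL-2003 and OntoNotes 5.0, especially for low-frequency words?
RQ4: Are LS features robust when words are sparsely or noisily annotated in the Wikipedia-derived data?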

Key Findings

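Adding LS vectors to a simple Bi-LSTM-CRF NER model yields substantial gains.
On OntoNotes 5.0, the proposed system sets a new state-of-the-art F1 score of 87.95.
On CoNLL-2003, the system reaches an F1 score of 91.73, matching the state of the art.
The LS representation outperforms binary gazetteer features and provides information complementary to standard embeddings.
Ablation analysis shows that LS is competitive with, and complementary to, SSKIP embeddings; combining LS and SSKIP gives the best results.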
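NER F1 scores such as these are typically entity-level: a predicted entity counts as correct only if both its span and its type exactly match the gold annotation, as in the standard CoNLL evaluation. The sketch below illustrates that computation from BIO tags. It is a simplified, hypothetical re-implementation (the names `bio_spans` and `micro_f1` are illustrative), not the official `conlleval` script, and it does not reproduce the paper's numbers.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final span
        prefix, _, label = tag.partition("-")
        # An open span ends at "O", at any "B-", or at an "I-" of another type.
        if start is not None and (prefix != "I" or label != etype):
            spans.add((etype, start, i))
            start, etype = None, None
        # "B-" starts a span; a stray "I-" also starts one (conlleval-style).
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over a corpus of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(micro_f1(gold, pred))  # 0.5: one of two entities matches exactly
```

Because matching is exact, a correct span with the wrong type (LOC vs ORG above) counts as both a false positive and a false negative.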

This review was generated by AI and reviewed by human editors.