QUICK REVIEW

[Paper Review] Terminologies augmented recurrent neural network model for clinical named entity recognition

Ivan Lerner, Nicolás Paris|arXiv (Cornell University)|Apr 25, 2019
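Biomedical Text Mining and Ontologies|References: 29|Cited by: 37
One-sentence summary

This paper evaluates a terminology-based baseline, a biGRU-CRF model, and a hybrid system that feeds the terminology predictions to the biGRU-CRF as a feature, for clinical NER in English (i2b2-2009) and French (APcNER). The hybrid system achieves the best exact-match F-measure for drug name recognition on both corpora.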

ABSTRACT

We aimed to enhance the performance of a supervised model for clinical named-entity recognition (NER) using medical terminologies. In order to evaluate our system in French, we built a corpus for 5 types of clinical entities. We used a terminology-based system as baseline, built upon UMLS and SNOMED. Then, we evaluated a biGRU-CRF, and a hybrid system using the prediction of the terminology-based system as a feature for the biGRU-CRF. In English, we evaluated the NER systems on the i2b2-2009 Medication Challenge for Drug name recognition, which contained 8,573 entities for 268 documents. In French, we built APcNER, a corpus of 147 documents annotated for 5 entities (drug name, sign or symptom, disease or disorder, diagnostic procedure or lab test and therapeutic procedure). We evaluated each NER system using exact and partial match definitions of F-measure for NER. APcNER contains 4,837 entities, which took 28 hours to annotate; the inter-annotator agreement was acceptable for Drug name in exact match (85%) and acceptable for other entity types in non-exact match (>70%). For drug name recognition on both i2b2-2009 and APcNER, the biGRU-CRF performed better than the terminology-based system, with an exact-match F-measure of 91.1% versus 73% and 81.9% versus 75% respectively. Moreover, the hybrid system outperformed the biGRU-CRF, with an exact-match F-measure of 92.2% versus 91.1% (i2b2-2009) and 88.4% versus 81.9% (APcNER). On the APcNER corpus, the micro-average F-measure of the hybrid system on the 5 entities was 69.5% in exact match, and 84.1% in non-exact match. APcNER is a French corpus for clinical NER of five types of entities which covers a large variety of document types. Extending a supervised model with terminology allowed for an easy performance gain, especially in low regimes of entities, and established near state-of-the-art results on the i2b2-2009 corpus.

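Motivation and objectives

  • Improve supervised clinical NER performance using medical terminologies from UMLS and SNOMED.
  • Evaluate the NER systems on an English corpus (i2b2-2009 Medication Challenge) and a French corpus (APcNER).
  • Compare a terminology-based baseline, a biGRU-CRF, and a hybrid approach that uses terminology-derived features.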

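Proposed method

  • Build a terminology-based baseline from UMLS and SNOMED.
  • Train a biGRU-CRF for NER.
  • Develop a hybrid system that feeds the terminology baseline's predictions to the biGRU-CRF as a feature.
  • Evaluate with both exact-match and partial-match (non-exact) F-measure.
  • Annotate the French APcNER corpus with five entity types (drug name, sign or symptom, disease or disorder, diagnostic procedure or lab test, and therapeutic procedure).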
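To make the hybrid idea concrete, here is a minimal sketch of how a terminology lookup could be turned into a per-token feature for a sequence tagger. It is an illustration under stated assumptions, not the authors' implementation: the greedy longest-match strategy, the BIO encoding, the toy lexicon, and the names `terminology_bio_tags` and `token_features` are all hypothetical, and a real system would draw its entries from UMLS and SNOMED.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for UMLS/SNOMED entries.
TOY_LEXICON: Dict[Tuple[str, ...], str] = {
    ("aspirin",): "DRUG",
    ("insulin", "glargine"): "DRUG",
}


def terminology_bio_tags(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 4,
) -> List[str]:
    """Greedy longest-match lookup that returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            label = lexicon.get(span)
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def token_features(
    tokens: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> List[Dict[str, str]]:
    """Pair each token with its terminology tag as an extra input feature."""
    tags = terminology_bio_tags(tokens, lexicon)
    return [{"word": tok.lower(), "term_tag": tag} for tok, tag in zip(tokens, tags)]


tokens = ["Started", "insulin", "glargine", "and", "aspirin", "daily"]
print(terminology_bio_tags(tokens, TOY_LEXICON))
# ['O', 'B-DRUG', 'I-DRUG', 'O', 'B-DRUG', 'O']
```

In a neural tagger, the `term_tag` value would typically be embedded and concatenated with the word representation before the recurrent layer. How the paper encodes this feature is not stated in this summary.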

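Experimental results

Research questions

  • RQ1: Does the terminology-based baseline outperform the neural model on clinical NER?
  • RQ2: Does using the terminology baseline's predictions as a feature improve the neural NER model?
  • RQ3: What level of NER performance is reached under exact and non-exact matching on the English i2b2-2009 and French APcNER datasets?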
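The difference between exact and non-exact matching in RQ3 can be illustrated with a short sketch. It assumes entities are `(start, end, type)` token spans with an exclusive end, and that a non-exact match means any overlap between spans of the same type. That overlap rule is an assumption made for illustration; the paper's precise partial-match definition is not given in this summary, and `f_measure` is a hypothetical helper.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def f_measure(gold: List[Span], pred: List[Span], exact: bool = True) -> float:
    """Entity-level F1 under exact or overlap-based (non-exact) matching."""

    def match(p: Span, g: Span) -> bool:
        if p[2] != g[2]:
            return False
        if exact:
            return (p[0], p[1]) == (g[0], g[1])
        # Non-exact: the two spans share at least one token.
        return p[0] < g[1] and g[0] < p[1]

    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: the second prediction starts one token too early.
gold = [(0, 2, "DRUG"), (5, 6, "DRUG")]
pred = [(0, 2, "DRUG"), (4, 6, "DRUG")]

print(f_measure(gold, pred, exact=True))   # 0.5
print(f_measure(gold, pred, exact=False))  # 1.0
```

A boundary error costs a full match under the exact criterion but not under the overlap criterion, which is consistent with the non-exact scores in the summary being higher than the exact ones.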

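Key findings

  • The biGRU-CRF outperforms the terminology baseline for drug name recognition on both corpora in exact-match F-measure (i2b2-2009: 91.1% vs 73%; APcNER: 81.9% vs 75%).
  • The hybrid system outperforms the biGRU-CRF alone for drug name recognition (i2b2-2009: 92.2% vs 91.1%; APcNER: 88.4% vs 81.9%).
  • On APcNER, the hybrid system's micro-average F-measure over the five entity types is 69.5% in exact match and 84.1% in non-exact match.
  • APcNER is a French clinical NER corpus of 4,837 entities across 147 documents, with acceptable inter-annotator agreement (85% in exact match for drug name; >70% in non-exact match for the other entity types).
  • Extending the supervised model with terminology yields an easy performance gain, especially in low regimes of entities, and near state-of-the-art results on i2b2-2009.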
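The micro-average reported for APcNER pools true positives, false positives, and false negatives across entity types before computing the F-measure, so frequent types weigh more than rare ones. The sketch below shows the computation next to a macro-average for contrast. The counts are invented purely for illustration and are not taken from the paper; `micro_f1` and `macro_f1` are hypothetical helper names.

```python
from typing import Dict


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of match counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """Pool the counts over all entity types, then compute a single F1."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """Compute F1 per entity type, then take the unweighted mean."""
    scores = [f1_from_counts(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Invented counts for two entity types (not results from the paper).
toy_counts = {
    "drug_name": {"tp": 8, "fp": 1, "fn": 1},
    "sign_or_symptom": {"tp": 2, "fp": 3, "fn": 3},
}

print(round(micro_f1(toy_counts), 3))  # 0.714
print(round(macro_f1(toy_counts), 3))  # 0.644
```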


This review was generated by AI and reviewed by a human editor.