QUICK REVIEW

[Paper Review] Terminologies augmented recurrent neural network model for clinical named entity recognition

Ivan Lerner, Nicolás Paris|arXiv (Cornell University)|Apr 25, 2019

Biomedical Text Mining and Ontologies|References: 29|Cited by: 37

One-Line Summary

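This paper evaluates three systems for clinical NER in English (i2b2-2009) and French (APcNER): a terminology-based baseline, a biGRU-CRF model, and a hybrid system that uses the terminology-based predictions as features for the biGRU-CRF. The hybrid system achieves the best exact-match F1 for drug name recognition on both corpora.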

ABSTRACT

We aimed to enhance the performance of a supervised model for clinical named-entity recognition (NER) using medical terminologies. To evaluate our system in French, we built a corpus annotated for 5 types of clinical entities. As a baseline, we used a terminology-based system built upon UMLS and SNOMED. We then evaluated a biGRU-CRF and a hybrid system that uses the predictions of the terminology-based system as features for the biGRU-CRF. In English, we evaluated the NER systems on the i2b2-2009 Medication Challenge for drug name recognition, which contains 8,573 entities in 268 documents. In French, we built APcNER, a corpus of 147 documents annotated for 5 entity types (drug name, sign or symptom, disease or disorder, diagnostic procedure or lab test, and therapeutic procedure). We evaluated each NER system using exact-match and partial-match definitions of the F-measure. APcNER contains 4,837 entities, which took 28 hours to annotate; inter-annotator agreement was acceptable for drug name in exact match (85%) and for the other entity types in non-exact match (>70%). For drug name recognition on both i2b2-2009 and APcNER, the biGRU-CRF performed better than the terminology-based system, with exact-match F-measures of 91.1% versus 73% and 81.9% versus 75%, respectively. Moreover, the hybrid system outperformed the biGRU-CRF, with exact-match F-measures of 92.2% versus 91.1% (i2b2-2009) and 88.4% versus 81.9% (APcNER). On the APcNER corpus, the micro-averaged F-measure of the hybrid system over the 5 entity types was 69.5% in exact match and 84.1% in non-exact match. APcNER is a French corpus for clinical NER of five entity types that covers a wide variety of document types. Extending a supervised model with terminologies yielded an easy performance gain, especially for entity types with few annotated examples, and achieved near state-of-the-art results on the i2b2-2009 corpus.
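The terminology-based baseline is, at its core, dictionary lookup over clinical text. The sketch below is a minimal illustration, assuming greedy left-to-right longest-match over pre-tokenized text with BIO output. The tiny `LEXICON` and the function `tag_with_lexicon` are hypothetical stand-ins; the authors' UMLS/SNOMED-based system is not described in enough detail here to reproduce.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon standing in for UMLS/SNOMED-derived terms.
# Keys are lowercased token tuples; values are entity types.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("aspirin",): "DRUG",
    ("acetylsalicylic", "acid"): "DRUG",
    ("chest", "pain"): "SIGN_SYMPTOM",
    ("pain",): "SIGN_SYMPTOM",
}


def tag_with_lexicon(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Return one BIO tag per token using greedy longest-match lookup (assumed strategy)."""
    max_len = max((len(term) for term in lexicon), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len: int = 0
        match_type: Optional[str] = None
        # Try the longest candidate first so "chest pain" wins over "pain".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = lexicon.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                match_len, match_type = n, entity_type
                break
        if match_type is None:
            i += 1  # no term starts here; the tag stays "O"
            continue
        tags[i] = f"B-{match_type}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{match_type}"
        i += match_len  # jump past the matched span (no overlapping matches)
    return tags


if __name__ == "__main__":
    sentence = ["Patient", "reports", "chest", "pain", ",", "given", "aspirin"]
    print(tag_with_lexicon(sentence, LEXICON))
    # ['O', 'O', 'B-SIGN_SYMPTOM', 'I-SIGN_SYMPTOM', 'O', 'O', 'B-DRUG']
```

Longest-match is one reasonable choice for resolving nested terms; a real system would also need normalization (accents, abbreviations, inflection), which this sketch ignores.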

Motivation and Objectives

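Improve the performance of supervised clinical NER using medical terminologies from UMLS and SNOMED.
Evaluate NER systems on the English i2b2-2009 Medication Challenge corpus and the French APcNER corpus.
Compare a terminology-based baseline, a biGRU-CRF, and a hybrid approach that uses terminology-derived features.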

Proposed Method

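Build a terminology-based baseline using UMLS and SNOMED.
Train a biGRU-CRF for NER on clinical data.
Develop a hybrid system that uses the predictions of the terminology-based system as features for the biGRU-CRF.
Evaluate with exact-match and partial-match (non-exact) F-measure.
Annotate the French APcNER corpus for five entity types (drug name, sign or symptom, disease or disorder, diagnostic procedure or lab test, and therapeutic procedure).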
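The hybrid system passes the terminology system's predictions to the biGRU-CRF as features. One common way to do this, assumed here rather than taken from the paper, is to append a one-hot encoding of each token's terminology tag to its word embedding before the recurrent layer. `TERM_TAGS`, `one_hot`, and `hybrid_inputs` are hypothetical names, and plain lists keep the sketch dependency-free.

```python
from typing import List

# Hypothetical inventory of tags the terminology system can emit.
TERM_TAGS: List[str] = ["O", "B-DRUG", "I-DRUG"]


def one_hot(tag: str, tag_set: List[str]) -> List[float]:
    """Encode a terminology tag as a one-hot vector over tag_set."""
    vec = [0.0] * len(tag_set)
    vec[tag_set.index(tag)] = 1.0  # raises ValueError for an unknown tag
    return vec


def hybrid_inputs(
    word_vectors: List[List[float]], term_tags: List[str], tag_set: List[str]
) -> List[List[float]]:
    """Concatenate each token's word vector with the one-hot of its terminology tag.

    The result would serve as the per-token input to a biGRU, whose input
    size grows by len(tag_set) compared with word vectors alone.
    """
    if len(word_vectors) != len(term_tags):
        raise ValueError("expected exactly one terminology tag per token")
    return [wv + one_hot(tag, tag_set) for wv, tag in zip(word_vectors, term_tags)]


if __name__ == "__main__":
    vectors = [[0.1, 0.2], [0.3, 0.4]]  # toy 2-d embeddings for "given", "aspirin"
    print(hybrid_inputs(vectors, ["O", "B-DRUG"], TERM_TAGS))
    # [[0.1, 0.2, 1.0, 0.0, 0.0], [0.3, 0.4, 0.0, 1.0, 0.0]]
```

A learned embedding for each terminology tag would be an equally plausible design; the review does not say which encoding the authors used.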

Experimental Results

Research Questions

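RQ1: How does a terminology-based baseline perform on clinical NER compared with a neural model?
RQ2: Does incorporating terminology-based predictions as features improve neural NER performance?
RQ3: How well do the NER systems perform under exact and non-exact matching on the English i2b2-2009 and French APcNER datasets?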
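RQ3 distinguishes exact from non-exact matching. The review does not define non-exact match precisely, so this sketch assumes a common lenient definition: a predicted span counts if it overlaps a gold span of the same entity type, whereas exact match requires identical boundaries and type. Spans are hypothetical `(start, end, type)` token offsets with exclusive ends.

```python
from typing import Callable, List, Tuple

# Hypothetical span format: (start token, end token exclusive, entity type)
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Lenient match: same entity type and at least one shared token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_measure(gold: List[Span], pred: List[Span], exact: bool = True) -> float:
    """Span-level F-measure under exact or (assumed overlap-based) non-exact matching."""
    match: Callable[[Span, Span], bool] = (lambda a, b: a == b) if exact else overlaps
    # Precision: predictions matching some gold span; recall: gold spans matched by some prediction.
    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(2, 4, "SIGN_SYMPTOM"), (6, 7, "DRUG")]
    pred = [(3, 4, "SIGN_SYMPTOM"), (6, 7, "DRUG")]  # first span misses one token
    print(f_measure(gold, pred, exact=True))   # 0.5
    print(f_measure(gold, pred, exact=False))  # 1.0
```

Under this lenient rule one long prediction can cover several gold spans; stricter non-exact schemes pair predicted and gold spans one-to-one.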

Key Findings

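The biGRU-CRF outperforms the terminology-based baseline for drug name recognition on both corpora (exact-match F-measure; i2b2-2009: 91.1% vs 73%; APcNER: 81.9% vs 75%).
The hybrid system outperforms the biGRU-CRF for drug name recognition (exact-match F-measure; i2b2-2009: 92.2% vs 91.1%; APcNER: 88.4% vs 81.9%).
On the APcNER corpus, the hybrid system achieves a micro-averaged F-measure over the five entity types of 69.5% in exact match and 84.1% in non-exact match.
APcNER is a French clinical NER dataset of 147 documents annotated with 4,837 entities; inter-annotator agreement is acceptable (85% for drug name in exact match, >70% for the other entity types in non-exact match).
Incorporating terminologies into the supervised model improves performance and yields near state-of-the-art results on i2b2-2009.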
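The 69.5% and 84.1% figures are micro-averaged: true positives, false positives, and false negatives are pooled over all five entity types before a single F-measure is computed, so frequent types weigh more. A minimal sketch using F = 2TP / (2TP + FP + FN), with a macro average for contrast; the per-type counts are invented for illustration and are not from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F-measure as 2TP / (2TP + FP + FN), i.e. the harmonic mean of P and R."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_type: Dict[str, Counts]) -> float:
    """Pool counts over all entity types, then compute a single F-measure."""
    tp = sum(c[0] for c in per_type.values())
    fp = sum(c[1] for c in per_type.values())
    fn = sum(c[2] for c in per_type.values())
    return f1(tp, fp, fn)


def macro_f1(per_type: Dict[str, Counts]) -> float:
    """Average the per-type F-measures, weighting every type equally."""
    if not per_type:
        return 0.0
    return sum(f1(*c) for c in per_type.values()) / len(per_type)


if __name__ == "__main__":
    # Invented counts: one frequent, well-recognized type and one rare, harder type.
    counts = {"DRUG": (90, 10, 10), "THERAPEUTIC_PROCEDURE": (10, 10, 10)}
    print(round(micro_f1(counts), 3))  # 0.833
    print(round(macro_f1(counts), 3))  # 0.7
```

With these toy counts the micro average sits well above the macro average because the frequent type dominates the pooled counts.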

This review was generated by AI and checked by a human editor.