QUICK REVIEW

[Paper Review] Publicly Available Clinical BERT Embeddings

Emily Alsentzer, John R. Murphy | arXiv (Cornell University) | Apr 6, 2019

Topic Modeling | References: 20 | Cited by: 720

One-Sentence Summary

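In short: the paper pretrains BERT models on clinical text (MIMIC notes) and releases them publicly (Clinical BERT and Discharge Summary BERT). These models improve over general-domain BERT, and in most cases BioBERT, on MedNLI and the i2b2 NER tasks, but they do not improve on the de-identification tasks.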

ABSTRACT

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly-available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on three common clinical NLP tasks as compared to nonspecific embeddings. These domain-specific models are not as performant on two clinical de-identification tasks, and we argue that this is a natural consequence of the differences between de-identified source text and synthetically non de-identified task text.

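Motivation and Goals

Clinical text differs linguistically from both general-domain and biomedical text, which motivates domain-specific contextual embeddings.
Pretrain Clinical BERT models on MIMIC notes and release them publicly, including a discharge-summary variant.
Evaluate the clinical BERT models on standard clinical NLP tasks to measure their gains over general-domain BERT and BioBERT.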

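Proposed Method

Train two BERT variants on MIMIC clinical text: Clinical BERT (all note types) and Discharge Summary BERT (discharge summaries only). Each is initialized either from BERT-Base or from BioBERT; the BioBERT-initialized versions are the "Bio+" models in the results table.
Fine-tune the pretrained models on each downstream task with a single linear classification layer on top of the BERT output.
Evaluate on MedNLI and four i2b2 NER tasks (2006, 2010, 2012, 2014), of which 2006 and 2014 are de-identification tasks, comparing against baseline BERT and BioBERT.
Use the standard BERT training setup; pretraining details (e.g., sequence length, number of steps) are given in the appendix.
Report quantitative metrics (accuracy and exact F1) alongside a qualitative embedding analysis (nearest neighbors).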
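As a rough illustration of the fine-tuning head described above (not the authors' code), the pure-Python sketch below applies one linear layer, logits = W·h + b, to each token's hidden vector and picks the highest-scoring label, as in token-level NER. The 3-dimensional vectors, the weights, and the labels O / B-problem / I-problem are made-up stand-ins: real BERT-Base hidden vectors have 768 dimensions, and the weights would be learned during fine-tuning. For a sentence-pair task such as MedNLI, a head of the same kind would typically be applied once to a single pooled vector with three output classes.

```python
from typing import List


def linear_head(
    hidden: List[List[float]], weight: List[List[float]], bias: List[float]
) -> List[List[float]]:
    """Compute logits = W h + b for every token vector h (one row of W per label)."""
    logits = []
    for h in hidden:
        row = [sum(w * x for w, x in zip(w_row, h)) + b for w_row, b in zip(weight, bias)]
        logits.append(row)
    return logits


def predict_labels(logits: List[List[float]], labels: List[str]) -> List[str]:
    """Pick the highest-scoring label for each token (argmax over its logits)."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in logits]


# Hypothetical toy inputs: 3 tokens, hidden size 3 (BERT-Base would use 768).
labels = ["O", "B-problem", "I-problem"]
hidden = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.2, 0.1, 1.0],
]
weight = [
    [1.0, -1.0, 0.5],   # row for "O"
    [-0.5, 2.0, 0.0],   # row for "B-problem"
    [0.0, 0.5, 1.0],    # row for "I-problem"
]
bias = [0.1, 0.0, -0.1]

logits = linear_head(hidden, weight, bias)
print(predict_labels(logits, labels))  # ['O', 'B-problem', 'I-problem']
```

In practice this layer would be built with a deep-learning framework and trained jointly with the BERT encoder; the loop form here only makes the arithmetic visible.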

Experimental Results

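Research Questions

RQ1: Do clinically trained BERT models improve performance on clinical NLP tasks compared with general-domain BERT and BioBERT?
RQ2: Does note-type-specific training (all notes vs. discharge summaries) bring task-specific gains?
RQ3: Because of differences in data distribution, are clinical BERT embeddings effective on non-de-identification tasks but weaker on de-identification tasks?
RQ4: What qualitative differences between Clinical BERT and BioBERT emerge in clinical contexts?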

Key Findings

Model	MedNLI (accuracy)	i2b2 2006, de-id (exact F1)	i2b2 2010 (exact F1)	i2b2 2012 (exact F1)	i2b2 2014, de-id (exact F1)
BERT	77.6%	93.9	83.5	75.9	92.8
BioBERT	80.8%	94.8	86.5	78.9	93.0
Clinical BERT	80.8%	91.5	86.4	78.5	92.6
Discharge Summary BERT	80.6%	91.9	86.4	78.4	92.8
Bio+Clinical BERT	82.7%	94.7	87.2	78.9	92.5
Bio+Discharge Summary BERT	82.7%	94.8	87.8	78.9	92.7
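The i2b2 columns report exact F1. Assuming the usual exact-match convention for NER (a predicted entity counts only if its start, end, and type all match a gold entity), a minimal sketch of the metric could look like the following. The BIO tag scheme, the entity type names, and the helper names bio_to_spans and exact_f1 are illustrative assumptions, not taken from the paper or from the official i2b2 scoring scripts.

```python
from typing import List, Set, Tuple

# (start, end, entity_type), with `end` exclusive.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Turn a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        # A "B-" tag, or an "I-" tag whose type differs from the open span, starts a new span.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if start is not None and (tag == "O" or starts_new):
            spans.add((start, i, etype))
            start, etype = None, None
        if starts_new:
            start, etype = i, tag[2:]
    return spans


def exact_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """F1 where a prediction counts only if start, end, and type all match exactly."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds the "test" entity but cuts the "problem" span short.
gold_tags = ["B-problem", "I-problem", "O", "B-test", "O"]
pred_tags = ["B-problem", "O", "O", "B-test", "O"]
gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
print(exact_f1(gold, pred))  # 0.5
```

The truncated "problem" span earns no partial credit, which is what makes the metric "exact". For a whole corpus, one would typically pool true positives, predictions, and gold spans across documents before computing precision and recall.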

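Initializing from BioBERT and continuing pretraining on clinical notes works best on MedNLI: Bio+Clinical BERT and Bio+Discharge Summary BERT both reach 82.7% accuracy, a new state of the art at the time, versus 80.8% for BioBERT and 77.6% for BERT.
On i2b2 2010 and 2012, Clinical BERT (initialized from BERT-Base) is close to but slightly below BioBERT (86.4 vs. 86.5 and 78.5 vs. 78.9), while the Bio+ variants match or exceed BioBERT (87.2 and 87.8 on 2010; 78.9 on 2012).
Note-type-specific training gives small additional gains on some tasks: Bio+Discharge Summary BERT achieves the best i2b2 2010 score (87.8 vs. 87.2 for Bio+Clinical BERT).
The clinical models do not improve on either de-identification task (e.g., i2b2 2006: 91.5 for Clinical BERT vs. 93.9 for BERT; i2b2 2014: 92.6 vs. 92.8), likely because of domain shift: the MIMIC training notes are themselves de-identified, whereas the de-identification task text contains synthetic protected health information.
Qualitative analysis (nearest neighbors of clinical terms in embedding space) suggests that Clinical BERT forms more coherent associations between clinical terms than BioBERT.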
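The comparisons in these findings are simple arithmetic on the results table, so they can be checked mechanically. The short script below re-enters the published numbers from the table (nothing is re-run or reproduced) and computes per-task differences between two models and the top-scoring model(s) per task; the helper names deltas and best_models are hypothetical.

```python
from typing import Dict, List

# Scores re-entered from the results table: MedNLI accuracy (%), then exact F1 for i2b2 2006-2014.
TASKS = ["MedNLI", "i2b2 2006", "i2b2 2010", "i2b2 2012", "i2b2 2014"]
SCORES: Dict[str, List[float]] = {
    "BERT": [77.6, 93.9, 83.5, 75.9, 92.8],
    "BioBERT": [80.8, 94.8, 86.5, 78.9, 93.0],
    "Clinical BERT": [80.8, 91.5, 86.4, 78.5, 92.6],
    "Discharge Summary BERT": [80.6, 91.9, 86.4, 78.4, 92.8],
    "Bio+Clinical BERT": [82.7, 94.7, 87.2, 78.9, 92.5],
    "Bio+Discharge Summary BERT": [82.7, 94.8, 87.8, 78.9, 92.7],
}


def deltas(model: str, baseline: str) -> Dict[str, float]:
    """Per-task score of `model` minus `baseline`, rounded to one decimal place."""
    return {t: round(m - b, 1) for t, m, b in zip(TASKS, SCORES[model], SCORES[baseline])}


def best_models(task: str) -> List[str]:
    """All models tied for the top score on `task`, in table order."""
    i = TASKS.index(task)
    top = max(s[i] for s in SCORES.values())
    return [name for name, s in SCORES.items() if s[i] == top]


print(deltas("Bio+Clinical BERT", "BioBERT"))
print(best_models("i2b2 2010"))  # ['Bio+Discharge Summary BERT']
```

Rounding to one decimal matches the table's precision and hides floating-point noise such as 0.7000000000000028.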

This review was generated by AI and checked by human editors.