QUICK REVIEW

[Paper Digest] CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

Akshay Smit, Saahil Jain | arXiv (Cornell University) | Apr 20, 2020

Radiomics and Machine Learning in Medical Imaging | Cited by 74

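One-sentence summary

CheXbert takes a biomedically pretrained BERT model, trains it on the outputs of a rule-based labeler, and then fine-tunes it on a small set of expert annotations augmented with backtranslation, achieving state-of-the-art radiology report labeling on MIMIC-CXR test data and approaching radiologist performance.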

ABSTRACT

The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or manual annotations by experts. In this work, we introduce a BERT-based approach to medical image report labeling that exploits both the scale of available rule-based systems and the quality of expert annotations. We demonstrate superior performance of a biomedically pretrained BERT model first trained on annotations of a rule-based labeler and then finetuned on a small set of expert annotations augmented with automated backtranslation. We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance, setting a new SOTA for report labeling on one of the largest datasets of chest x-rays.

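Motivation and goals

Enable scalable radiology report labeling so that medical imaging models can be trained at large scale.
Use an existing rule-based labeler to bootstrap the training of a biomedical language model.
Build on a limited set of expert annotations, augmented with backtranslation, to improve labeling accuracy.
Show that combining rule-based outputs with expert labels outperforms previous labelers on most observations.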

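Proposed method

Use a BERT-base architecture with 14 task-specific linear heads, one for each of the 14 observations.
Start from a biomedically pretrained BERT model, train it on the outputs of a rule-based labeler, and then fine-tune it on expert annotations.
Augment the small expert-annotated corpus with automated backtranslation to diversify the phrasing.
Evaluate with a weighted F1 metric over the 14 observations and three retrieval tasks (positive, negative, uncertain).
Compare against the CheXpert labeler baseline and a radiologist benchmark on the CheXpert and MIMIC-CXR datasets.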
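The weighted F1 metric mentioned above can be illustrated with a small sketch. It assumes that, for a single observation, an F1 score is computed separately for retrieving each of the three classes (positive, negative, uncertain) and that the three scores are averaged with weights proportional to each class's support in the reference labels. The function names, the label strings, and the extra "blank" label for unmentioned observations are hypothetical choices made for this illustration, not taken from the paper's code.

```python
from collections import Counter

# Assumed label vocabulary for one observation; "blank" (not mentioned)
# is treated here as a non-scored class.
CLASSES = ("positive", "negative", "uncertain")


def f1_for_class(y_true, y_pred, cls):
    """Binary F1 for retrieving one class among all reports."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no correct retrievals: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred, classes=CLASSES):
    """Support-weighted mean of the per-class F1 scores for one observation."""
    support = Counter(t for t in y_true if t in classes)
    total = sum(support.values())
    if total == 0:
        return 0.0
    return sum(support[c] * f1_for_class(y_true, y_pred, c) for c in classes) / total


# Toy labels for six reports (made up for illustration only).
y_true = ["positive", "positive", "negative", "uncertain", "blank", "negative"]
y_pred = ["positive", "negative", "negative", "uncertain", "blank", "negative"]
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the per-class F1 scores are 2/3 (positive), 0.8 (negative), and 1.0 (uncertain), with supports 2, 2, and 1. A score covering all 14 observations would be obtained by averaging the per-observation values, which is again an assumption about the aggregation.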

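Experimental results

Research questions

RQ1: Can a biomedically pretrained BERT model that is first trained on rule-based labeler outputs outperform both training on expert labels alone and purely automatic labeling?
RQ2: Does backtranslation augmentation improve radiology report labeling performance?
RQ3: How close can CheXbert come to radiologist-level labeling performance on a large chest X-ray dataset?
RQ4: How large is the performance gain from biomedically pretrained representations relative to general-domain pretrained representations on this task?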

Key findings

CheXbert (T-blue-hybrid-bt) achieves F1 = 0.798 (95% CI 0.775, 0.816), outperforming CheXpert (0.743) by a statistically significant margin (p < 0.001).
CheXbert is 0.007 F1 points below the radiologist benchmark (0.805; 0.784 to 0.823).
Backtranslation augmentation improves performance over non-augmented variants (e.g., T-blue-hybrid-bt vs T-blue-rad).
In the per-condition analysis, CheXbert yields the largest F1 gains for Pneumonia (0.151), Fracture (0.120), Consolidation (0.105), Enlarged Cardiomediastinum (0.100), and No Finding (0.097).
CheXbert outperforms models trained only on radiologist labels or only on automatic labeler outputs across most observations.
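As a quick arithmetic check on the headline numbers, the sketch below recomputes the gaps from the F1 values listed above. The variable names are hypothetical, and the relative gain is a derived figure, not one reported in the findings.

```python
# F1 values as listed in the findings above.
chexbert_f1 = 0.798
chexpert_f1 = 0.743
radiologist_f1 = 0.805

# Absolute differences, rounded to the three decimals used in the source.
gain_over_chexpert = round(chexbert_f1 - chexpert_f1, 3)
gap_to_radiologist = round(radiologist_f1 - chexbert_f1, 3)

# Derived quantity: gain relative to the CheXpert labeler's F1.
relative_gain = gain_over_chexpert / chexpert_f1

# Does CheXbert's point estimate fall inside the radiologist interval?
radiologist_interval = (0.784, 0.823)
within_interval = radiologist_interval[0] <= chexbert_f1 <= radiologist_interval[1]

print(gain_over_chexpert, gap_to_radiologist, round(relative_gain, 3), within_interval)
```

The point estimate falling inside the radiologist interval is consistent with "approaching radiologist performance", but overlapping intervals alone do not establish statistical equivalence.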


This digest was generated by AI and reviewed by a human editor.