QUICK REVIEW

[Paper Review] Disentangling Human Error from the Ground Truth in Segmentation of Medical Images

Le Zhang, Ryutaro Tanno|arXiv (Cornell University)|Jul 31, 2020
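Advanced Neural Network Applications | References: 45 | Cited by: 76
One-Sentence Summary

This paper proposes an end-to-end convolutional neural network framework that jointly learns the true segmentation labels and each annotator's pixel-wise confusion matrix from noisy multi-class medical image labels, substantially improving segmentation accuracy when annotations are scarce or highly inconsistent.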

ABSTRACT

Recent years have seen increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels. This problem is particularly pertinent in the medical image domain, where both the annotation cost and inter-observer variability are high. In a typical label acquisition process, different human experts provide their estimates of the "true" segmentation labels under the influence of their own biases and competence levels. Treating these noisy labels blindly as the ground truth limits the performance that automatic segmentation algorithms can achieve. In this work, we present a method for jointly learning, from purely noisy observations alone, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs. The separation of the two is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity with the noisy training data. We first define a toy segmentation dataset based on MNIST and study the properties of the proposed algorithm. We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities). In all cases, our method outperforms competing methods and relevant baselines particularly in cases where the number of annotations is small and the amount of disagreement is large. The experiments also show strong ability to capture the complex spatial characteristics of annotators' mistakes.

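Motivation and Objectives

  • Enable robust segmentation in medical imaging despite high inter-observer variability.
  • Propose a two-network architecture that disentangles the true labels from annotator behavior.
  • Allow the model to learn from noisy annotations alone, without ground-truth labels.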

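Proposed Method

  • Two coupled CNNs: a segmentation network estimates p(y|x), and an annotator network estimates each annotator's pixel-wise confusion matrix A^{(r)}(x).
  • Predicted annotator label distribution: p̂^{(r)}(x) = Â^{(r)}(x) · p̂θ(x).
  • Training minimizes the sum of cross-entropy losses between the observed noisy labels and the annotator predictions, plus a trace regularizer on Â^{(r)}(x) that encourages separating the noise from the true labels.
  • Loss: L_total = Σ over all images and annotators of [ CE(Â^{(r)}(x)·p̂θ(x), ỹ^{(r)}) + λ·tr(Â^{(r)}(x)) ].
  • Includes a warm-start phase in which the annotator confusion matrices are initialized to be diagonally dominant (close to the identity matrix), which promotes a sensible separation.
  • An optional low-rank (rank-1) approximation of the confusion matrices reduces computation when there are many classes.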
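The loss described in the list above can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names `annotator_probs` and `total_loss`, the flat list-of-pixels layout, the value of `lam`, and the convention that entry `A[i][j]` is the probability that the annotator reports class `i` when the true class is `j` are all assumptions made for this example. A real training setup would compute the same quantities on tensors inside a deep learning framework so that gradients reach both networks.

```python
import math

def annotator_probs(A_pix, p_pix):
    """Matrix-vector product A·p for one pixel.

    A_pix: C x C confusion matrix (assumed convention: A_pix[i][j] is
           P(annotator says i | true class j)).
    p_pix: length-C estimated true-label distribution for the pixel.
    Returns the predicted distribution over the annotator's labels.
    """
    return [sum(a * p for a, p in zip(row, p_pix)) for row in A_pix]

def total_loss(p, A, y_noisy, lam=0.01, eps=1e-12):
    """Cross-entropy against noisy labels plus lam * trace, summed over
    annotators and pixels.

    p:       list of per-pixel class distributions (segmentation output).
    A:       A[r][n] is the confusion matrix of annotator r at pixel n.
    y_noisy: y_noisy[r][n] is the integer label given by annotator r.
    """
    loss = 0.0
    for A_r, y_r in zip(A, y_noisy):                  # loop over annotators
        for A_pix, p_pix, label in zip(A_r, p, y_r):  # loop over pixels
            q = annotator_probs(A_pix, p_pix)
            loss += -math.log(q[label] + eps)         # cross-entropy term
            loss += lam * sum(A_pix[c][c] for c in range(len(A_pix)))  # trace term
    return loss

# Hypothetical toy input: two pixels, two classes, one annotator.
identity = [[1.0, 0.0], [0.0, 1.0]]
p_hat = [[0.9, 0.1], [0.2, 0.8]]
print(total_loss(p_hat, [[identity, identity]], [[0, 1]]))
```

With a fixed fit to the noisy labels, the trace term is smaller for confusion matrices that put less mass on the diagonal, which is the sense in which the abstract describes the estimated annotators as being pushed to be "maximally unreliable".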

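Experimental Results

Research Questions

  • RQ1: Can the model learn the true segmentation distribution from noisy, multi-annotator labels alone?
  • RQ2: Does jointly learning annotator behavior and the true labels improve segmentation performance, especially when each image has few annotations?
  • RQ3: How well do image-dependent, pixel-wise confusion matrices capture annotators' error patterns across different medical imaging datasets?
  • RQ4: Does trace regularization enable unique recovery of the true classes in challenging, sample-specific scenarios?
  • RQ5: How does the method compare with label-fusion baselines (STAPLE, Spatial STAPLE) and the Probabilistic U-Net on synthetic and real datasets?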

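Key Findings

  • The proposed method achieves a Dice score of 82.92% on MNIST-based segmentation with dense labels and 67.55% on MSLesion with dense labels, outperforming the STAPLE and Spatial STAPLE baselines.
  • The confusion-matrix estimation error (mean squared error) is markedly lower for the proposed method than for the baselines (e.g., 0.0893 on MNIST and 0.0811 on MSLesion).
  • In the one-label-per-image setting, the proposed method still outperforms the baselines, reaching a Dice score of 56.43% (the MNIST/MS case shown in the table), which indicates robustness when annotations are scarce.
  • On BraTS and LIDC-IDRI, the proposed method outperforms the STAPLE variants in both the dense-label and single-label settings, with a notable improvement in confusion-matrix estimation (e.g., a 14.4% improvement on BraTS).
  • Generalized energy distance (GED) comparisons on the MNIST, MS, BraTS, and LIDC-IDRI datasets favor the proposed method over the Probabilistic U-Net (e.g., MNIST: 1.24 vs 1.46; MS: 1.67 vs 1.91; BraTS: 3.14 vs 3.23; LIDC-IDRI: 1.87 vs 1.97).
  • Across datasets, image-dependent pixel-wise confusion matrices capture inter-annotator variability better than global confusion matrices or per-image baselines, with consistent gains in both segmentation accuracy and confusion-matrix fidelity.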
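For readers unfamiliar with the two headline metrics in these findings, the following is a minimal sketch of how a Dice score and a confusion-matrix mean squared error are commonly computed. The function names `dice` and `cm_mse`, the flat 0/1 mask layout, and the choice to return 1.0 when both masks are empty are assumptions for illustration; the summary does not specify the exact evaluation code used in the paper.

```python
def dice(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for flat binary masks (0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    size = sum(pred) + sum(target)
    if size == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size

def cm_mse(estimated, true):
    """Mean squared difference between two C x C confusion matrices."""
    diffs = [(e - t) ** 2
             for row_e, row_t in zip(estimated, true)
             for e, t in zip(row_e, row_t)]
    return sum(diffs) / len(diffs)

# Hypothetical toy values, not results from the paper.
print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
print(cm_mse([[1.0, 0.0], [0.0, 1.0]], [[0.8, 0.2], [0.2, 0.8]]))
```

Higher Dice indicates better overlap with the reference segmentation, while lower confusion-matrix MSE indicates a closer estimate of the annotator's true error pattern.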


This review was generated by AI and checked by a human editor.