QUICK REVIEW

[Paper Review] Direct Uncertainty Prediction for Medical Second Opinions

Maithra Raghu, Katy Blumer|arXiv (Cornell University)|Jul 4, 2018

Machine Learning in Healthcare | Cited by 49

One-Sentence Summary

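This paper shows that Direct Uncertainty Prediction (DUP), which learns an uncertainty score directly from patient features, outperforms the conventional two-step approach, Uncertainty Via Classification (UVC), at identifying cases likely to cause disagreement among doctors; the conclusion is supported by a theoretical result and by experiments on large-scale medical imaging data.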

ABSTRACT

The issue of disagreements amongst human experts is a ubiquitous one in both machine learning and medicine. In medicine, this often corresponds to doctor disagreements on a patient diagnosis. In this work, we show that machine learning models can be trained to give uncertainty scores to data instances that might result in high expert disagreements. In particular, they can identify patient cases that would benefit most from a medical second opinion. Our central methodological finding is that Direct Uncertainty Prediction (DUP), training a model to predict an uncertainty score directly from the raw patient features, works better than Uncertainty Via Classification, the two-step process of training a classifier and postprocessing the output distribution to give an uncertainty score. We show this both with a theoretical result, and on extensive evaluations on a large scale medical imaging application.
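As a concrete illustration of an uncertainty score computed from the grades several doctors assign to one patient, here are two example choices of U. The function names are hypothetical and these are not necessarily the paper's exact definitions; `counts[i]` is assumed to be the number of doctors who gave grade i.

```python
import numpy as np


def pairwise_disagreement(counts):
    """Fraction of pairs of distinct doctors whose grades differ.

    counts[i] is how many doctors assigned grade i to one patient.
    With k doctors this is 1 - sum_i c_i (c_i - 1) / (k (k - 1)), an unbiased
    estimate of 1 - sum_i p_i**2 if the grades are i.i.d. draws from the
    patient's grade distribution p.
    """
    c = np.asarray(counts, dtype=float)
    k = c.sum()
    if k < 2:
        raise ValueError("need at least two doctor grades")
    return 1.0 - float(np.sum(c * (c - 1.0)) / (k * (k - 1.0)))


def grade_variance(counts):
    """Variance of the ordinal grade (0, 1, 2, ...) under the empirical histogram."""
    c = np.asarray(counts, dtype=float)
    p = c / c.sum()
    grades = np.arange(len(c))
    mean = np.sum(p * grades)
    return float(np.sum(p * (grades - mean) ** 2))


print(pairwise_disagreement([5, 0, 0, 0, 0]))  # unanimous: 0.0
print(pairwise_disagreement([2, 1]))            # 2 of 3 doctor pairs disagree
print(grade_variance([0, 2, 0, 0, 2]))          # grades 1, 1, 4, 4: 2.25
```

The pairwise form is used here instead of 1 - Σ p̂_i² on the raw proportions because, with only k doctors per case, the expectation of the latter is (k - 1)/k times the target 1 - Σ p_i², so it systematically understates disagreement.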

Motivation and Goals

Motivate and formalize the medical second opinion problem: flagging patient cases on which doctors are likely to disagree.
Define and compare Direct Uncertainty Prediction (DUP) vs Uncertainty Via Classification (UVC).
Provide theoretical guarantees showing unbiasedness of DUP and bias in UVC under a natural model.
Empirically compare DUP and UVC on large-scale retinal fundus imaging data and on adjudicated gold-standard sets.

Proposed Method

Formalize uncertainty scoring functions U on empirical doctor grade histograms.
Develop DUP, which learns h_dup(x) directly from the raw patient features x to estimate U(E[Y|O]).
Contrast it with UVC, which first learns a classifier that predicts E[Y|g(O)=x] and then applies U to the predicted distribution.
Prove that, under the model, h_dup is an unbiased estimator of U(E[Y|O]) while h_uvc incurs a bias term.
Demonstrate the difference with toy Gaussian-mixture experiments, with large-scale diabetic retinopathy (DR) grading data from retinal fundus images, and with adjudicated test sets.
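One way to see where the UVC bias comes from is a short derivation under assumptions that are ours, not necessarily the paper's: read O as the full patient state, x = g(O) as the features the model sees, f(O) = E[Y|O] as that patient's distribution over K grades, and take U(p) = 1 - Σ_i p_i² (the probability that two independent doctors disagree). With unlimited data, DUP recovers the conditional mean of the target, while UVC applies U to the conditional mean of the grades:

```latex
\begin{aligned}
h_{\mathrm{dup}}(x) &= \mathbb{E}\!\left[\,U\big(\mathbb{E}[Y \mid O]\big) \,\middle|\, g(O) = x\right], \\
h_{\mathrm{uvc}}(x) &= U\big(\mathbb{E}[Y \mid g(O) = x]\big) = U\big(\mathbb{E}[f(O) \mid g(O) = x]\big), \\
h_{\mathrm{uvc}}(x) - h_{\mathrm{dup}}(x)
  &= \sum_{i=1}^{K} \Big( \mathbb{E}[f_i(O)^2 \mid g(O) = x] - \mathbb{E}[f_i(O) \mid g(O) = x]^2 \Big) \\
  &= \sum_{i=1}^{K} \operatorname{Var}\big(f_i(O) \mid g(O) = x\big) \;\ge\; 0 .
\end{aligned}
```

The second line uses the tower property, valid because x = g(O) is a function of O. Under this choice of U, UVC overestimates disagreement by the summed conditional variance of the per-patient grade probabilities, so the gap vanishes only when all patients sharing the same x have the same grade distribution. For any concave U the gap is still non-negative by Jensen's inequality, though it no longer has this closed form.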

Experimental Results

Research Questions

RQ1: Can direct learning of uncertainty from patient features yield unbiased estimates of doctor disagreement compared to two-step methods?
RQ2: Under what conditions does Direct Uncertainty Prediction (DUP) outperform Uncertainty Via Classification (UVC)?
RQ3: Do DUP models better identify cases requiring medical second opinions in retinal imaging data?
RQ4: How do DUP and UVC perform on adjudicated gold-standard disagreement tasks?

Key Findings

DUP provides unbiased estimates of the target uncertainty, while UVC incurs a bias term under the proposed model.
In toy experiments (mixtures of Gaussians) and image blur experiments (SVHN/CIFAR-10), DUP better identifies data points with high disagreement.
On retinal fundus DR grading data, DUP consistently outperforms UVC across multiple uncertainty definitions and evaluation tasks.
In adjudicated evaluations, DUPs outperform baselines and show stronger alignment with consensus/disagreement signals.
DUP-based rankings correlate more strongly with adjudicated disagreement than UVC-based rankings across several distance metrics.
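The toy setup below is our own illustration of the first two findings, not one of the paper's experiments. A model sees only a binary feature x: patients with x = 0 are a hidden 50/50 mix of two subtypes on which doctors mostly agree (but with opposite grades), while patients with x = 1 are genuinely ambiguous. Because x is discrete, the best classifier (for UVC) and the best regressor (for DUP) reduce to per-group averages, so no model training is needed. All names and numbers are hypothetical.

```python
import numpy as np


def pairwise_disagreement(counts):
    """Row-wise fraction of distinct-doctor pairs that disagree.

    counts has shape (n_patients, n_grades); every row sums to k >= 2.
    """
    c = np.asarray(counts, dtype=float)
    k = c.sum(axis=1)
    return 1.0 - np.sum(c * (c - 1.0), axis=1) / (k * (k - 1.0))


def simulate(n=20000, k=5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)        # coarse feature the model sees
    subtype = rng.integers(0, 2, size=n)  # hidden; only matters when x == 0
    # Per-patient probability that a doctor assigns grade 1, i.e. f(O).
    # x == 0: two confident subtypes; x == 1: doctors split 50/50.
    p1 = np.where(x == 0, np.where(subtype == 0, 0.05, 0.95), 0.5)
    n_grade1 = rng.binomial(k, p1)        # k doctors grade each patient
    counts = np.stack([k - n_grade1, n_grade1], axis=1)

    uvc, dup = {}, {}
    for v in (0, 1):
        mask = x == v
        # UVC step 1: the best classifier for a discrete x is the per-group
        # frequency of each grade over all doctor labels in that group.
        q = counts[mask].sum(axis=0) / counts[mask].sum()
        # UVC step 2: post-process the predicted distribution with U(p) = 1 - sum p_i^2.
        uvc[v] = 1.0 - float(np.sum(q ** 2))
        # DUP: the best regressor for a discrete x is the per-group mean
        # of the per-patient disagreement target.
        dup[v] = float(pairwise_disagreement(counts[mask]).mean())
    return uvc, dup


uvc, dup = simulate()
for v in (0, 1):
    print(f"x={v}: UVC={uvc[v]:.3f}  DUP={dup[v]:.3f}")
```

In this setup UVC assigns about 0.5 to both groups and cannot tell them apart, while DUP assigns about 0.095 to x = 0 and about 0.5 to x = 1. The gap of roughly 0.405 at x = 0 equals 2 × 0.45², the variance of the per-patient grade probabilities within that group summed over both grades.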


This summary was generated by AI and reviewed by human editors.