[Paper Review] Direct Uncertainty Prediction for Medical Second Opinions
The paper shows that Direct Uncertainty Prediction (DUP), which learns uncertainty scores directly from patient features, identifies cases likely to cause doctor disagreement better than the traditional two-step Uncertainty Via Classification (UVC) approach. The claim is supported by a theoretical result and by experiments on large-scale medical imaging data.
The issue of disagreements amongst human experts is a ubiquitous one in both machine learning and medicine. In medicine, this often corresponds to doctor disagreements on a patient diagnosis. In this work, we show that machine learning models can be trained to give uncertainty scores to data instances that might result in high expert disagreements. In particular, they can identify patient cases that would benefit most from a medical second opinion. Our central methodological finding is that Direct Uncertainty Prediction (DUP), training a model to predict an uncertainty score directly from the raw patient features, works better than Uncertainty Via Classification (UVC), the two-step process of training a classifier and postprocessing the output distribution to give an uncertainty score. We show this both with a theoretical result and with extensive evaluations on a large-scale medical imaging application.
Motivation & Objective
- Motivate and formalize the medical second opinion problem where doctor disagreements occur.
- Define and compare Direct Uncertainty Prediction (DUP) vs Uncertainty Via Classification (UVC).
- Provide theoretical guarantees showing unbiasedness of DUP and bias in UVC under a natural model.
- Empirically validate DUP vs UVC on large-scale retinal fundus imaging data and adjudicated gold standard sets.
Proposed method
- Formalize uncertainty scoring functions U on empirical doctor grade histograms.
- Develop DUP to learn h_dup(x) directly from raw patient features to estimate U(E[Y|O]).
- Contrast with UVC, which first learns a classifier to produce E[Y|g(O)=x] and then applies U to that output.
- Prove that h_dup is an unbiased estimator of U(E[Y|O]), while h_uvc carries a bias term under the model.
- Demonstrate with toy Gaussian mixture experiments and with large-scale medical imaging data (DR) and adjudicated test sets.
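The gap between the two targets in the list above can be seen in a small numeric sketch. This is an illustration under stated assumptions, not the paper's code: the three scoring functions (`entropy`, `disagreement`, `grade_variance`) are common choices for U written here for illustration, and the two-state setup (`hist_given_o`, `prior_o`) is a made-up example in which two hidden patient states O map to the same observed features x. DUP targets the average of U over the hidden states, while UVC applies U to the averaged grade distribution.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a grade histogram (illustrative choice of U)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # skip zero bins to avoid log(0)
    return float(-(nz * np.log(nz)).sum())

def disagreement(p):
    """Probability that two independently drawn grades differ."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - (p ** 2).sum())

def grade_variance(p):
    """Variance of the grade when grades are the integers 0..k-1."""
    p = np.asarray(p, dtype=float)
    grades = np.arange(len(p))
    mean = (p * grades).sum()
    return float((p * (grades - mean) ** 2).sum())

# Hypothetical setup: two hidden states O, equally likely, that look
# identical in the observed features x. Doctors fully agree within each state.
hist_given_o = np.array([[1.0, 0.0],   # state 0: every doctor gives grade 0
                         [0.0, 1.0]])  # state 1: every doctor gives grade 1
prior_o = np.array([0.5, 0.5])

def dup_target(U):
    """Average of U over hidden states: E[ U(E[Y|O]) | g(O)=x ]."""
    return float(sum(w * U(h) for w, h in zip(prior_o, hist_given_o)))

def uvc_target(U):
    """U applied to the averaged histogram: U( E[Y | g(O)=x] )."""
    return float(U(prior_o @ hist_given_o))

for U in (entropy, disagreement, grade_variance):
    print(U.__name__, "DUP target:", dup_target(U), "UVC target:", uvc_target(U))
```

In this constructed case no individual patient produces any disagreement, so the DUP target is zero for all three scores, whereas the UVC target is strictly positive because it mixes the two states before scoring. For concave U this gap follows from Jensen's inequality.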
Experimental results
Research questions
- RQ1: Can direct learning of uncertainty from patient features yield unbiased estimates of doctor disagreement compared to two-step methods?
- RQ2: Under what conditions does Direct Uncertainty Prediction (DUP) outperform Uncertainty Via Classification (UVC)?
- RQ3: Do DUP models better identify cases requiring medical second opinions in retinal imaging data?
- RQ4: How do DUP and UVC perform on adjudicated gold-standard disagreement tasks?
Key findings
- DUP provides unbiased estimates of the target uncertainty, while UVC incurs a bias term under the proposed model.
- In toy experiments (mixtures of Gaussians) and image blur experiments (SVHN/CIFAR-10), DUP better identifies data points with high disagreement.
- On retinal fundus DR grading data, DUP consistently outperforms UVC across multiple uncertainty definitions and evaluation tasks.
- In adjudicated evaluations, DUP models outperform the baselines and align more closely with consensus/disagreement signals.
- DUP-based rankings correlate more strongly with adjudicated disagreement than UVC-based rankings across several distance metrics.
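One way to compare how well two sets of uncertainty scores rank cases against a reference disagreement value is Spearman rank correlation. The sketch below is a generic illustration, not the paper's evaluation code: `spearman` is a helper written for this example and assumes no tied values, and all numbers are invented placeholders rather than results from the paper.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation; this simple version assumes no ties."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rank_a = np.argsort(np.argsort(a))  # 0-based rank of each element
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Placeholder values for five cases (NOT data from the paper).
reference_disagreement = [0.0, 0.2, 0.5, 0.9, 1.4]
scores_a = [0.1, 0.3, 0.4, 0.8, 0.9]  # ranks the cases in the same order
scores_b = [0.3, 0.1, 0.4, 0.9, 0.8]  # swaps two adjacent pairs

print("scores_a:", spearman(scores_a, reference_disagreement))
print("scores_b:", spearman(scores_b, reference_disagreement))
```

A score set that orders the cases exactly as the reference does reaches a correlation of 1, and each rank swap lowers the value. With real data containing ties, a tie-aware ranking would be needed.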
This review was created by AI and reviewed by human editors.