QUICK REVIEW

[Paper Review] Disentangling Human Error from the Ground Truth in Segmentation of Medical Images

Le Zhang, Ryutaro Tanno|arXiv (Cornell University)|2020. 07. 31.

Advanced Neural Network Applications | References: 45 | Citations: 76

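One-line summary

This paper presents an end-to-end CNN framework that, given noisy multi-annotator medical images, jointly learns the true segmentation labels and a pixel-wise confusion matrix for each annotator. It improves segmentation accuracy in particular when annotations are sparse or disagree with one another.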

ABSTRACT

Recent years have seen increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels. This problem is particularly pertinent in the medical image domain, where both the annotation cost and inter-observer variability are high. In a typical label acquisition process, different human experts provide their estimates of the "true" segmentation labels under the influence of their own biases and competence levels. Treating these noisy labels blindly as the ground truth limits the performance that automatic segmentation algorithms can achieve. In this work, we present a method for jointly learning, from purely noisy observations alone, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs. The separation of the two is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity with the noisy training data. We first define a toy segmentation dataset based on MNIST and study the properties of the proposed algorithm. We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities). In all cases, our method outperforms competing methods and relevant baselines particularly in cases where the number of annotations is small and the amount of disagreement is large. The experiments also show strong ability to capture the complex spatial characteristics of annotators' mistakes.

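Motivation and objectives

Motivate segmentation that stays robust under the high inter-observer variability found in medical imaging.
Propose a two-network architecture that disentangles the true labels from annotator behaviour.
Enable training from purely noisy annotations, without any ground-truth labels.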

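Proposed method

Two coupled CNNs: the segmentation network estimates p(y|x), and the annotator network estimates a pixel-wise confusion matrix (CM) A^{(r)}(x) for each annotator r.
Predicted annotation distribution: p̂^{(r)}(x) = Â^{(r)}(x) · p̂θ(x).
Training minimises the sum of cross-entropy losses between the observed noisy labels and the per-annotator predictions, plus a trace regularisation term on Â^{(r)}(x). Penalising the trace pushes the estimated annotators towards maximal unreliability, which is what separates the annotation noise from the true labels.
Loss: L_total = Σ over images and annotators of [ CE(Â^{(r)}(x)·p̂θ(x), ỹ^{(r)}) + λ·tr(Â^{(r)}(x)) ].
To steer training towards a sensible disentanglement, a warm-start is used in which the annotator CMs are initialised to be diagonally dominant (close to the identity matrix).
Optional low-rank (rank-1) approximation of the CMs to reduce computational cost when the number of classes is large.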
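As a rough illustration of the loss described above, the following plain-Python sketch evaluates it on a toy two-class example. It is not the authors' implementation: the function names, the data layout, the convention that `cm[i][j]` approximates p(observed = i | true = j), the per-pixel accumulation of the trace term, and the value of `lam` are all assumptions made for this example. A real model would produce `probs` and `cms` from the two CNNs and use a tensor library.

```python
import math

def annotator_distribution(cm, p_true):
    """Apply one pixel's confusion matrix to the true-label distribution.

    Assumed convention: cm[i][j] ~ p(observed = i | true = j), so each
    column of cm sums to 1 and the result is again a distribution.
    """
    return [sum(cm[i][j] * p_true[j] for j in range(len(p_true)))
            for i in range(len(cm))]

def pixel_loss(cm, p_true, noisy_label, lam):
    """Cross-entropy against the noisy label plus lam * trace(cm)."""
    p_noisy = annotator_distribution(cm, p_true)
    ce = -math.log(max(p_noisy[noisy_label], 1e-12))  # clamp to avoid log(0)
    trace = sum(cm[k][k] for k in range(len(cm)))
    return ce + lam * trace

def total_loss(cms, probs, noisy_labels, lam=0.01):
    """Sum pixel_loss over annotators and pixels, skipping missing labels.

    cms[r][n]          : confusion matrix of annotator r at pixel n
    probs[n]           : estimated true-label distribution at pixel n
    noisy_labels[r][n] : label given by annotator r at pixel n, or None
    """
    total = 0.0
    for r, labels in enumerate(noisy_labels):
        for n, label in enumerate(labels):
            if label is not None:
                total += pixel_loss(cms[r][n], probs[n], label, lam)
    return total

# Toy data: two pixels, two classes, two hypothetical annotators.
identity = [[1.0, 0.0], [0.0, 1.0]]   # perfectly reliable annotator
flipper = [[0.8, 0.3], [0.2, 0.7]]    # annotator who sometimes swaps classes
probs = [[0.9, 0.1], [0.2, 0.8]]
cms = [[identity, identity], [flipper, flipper]]
noisy_labels = [[0, 1], [0, None]]    # second annotator skipped pixel 1
loss = total_loss(cms, probs, noisy_labels, lam=0.01)
```

Because the trace enters with a positive weight, lowering the loss favours confusion matrices with less mass on the diagonal, while the cross-entropy term keeps the product Â·p̂ faithful to the observed labels.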

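Experimental results

Research questions

RQ1: Can the model learn the ground-truth segmentation distribution from multi-annotator labels alone?
RQ2: Does jointly learning annotator behaviour and the true labels improve segmentation performance even when annotations are scarce?
RQ3: How well do image-dependent, pixel-wise confusion matrices capture annotators' error patterns across different medical imaging datasets?
RQ4: Does the trace regularisation uniquely recover the true classes in challenging per-sample settings?
RQ5: How does the proposed method differ from the label-fusion baselines (STAPLE, Spatial STAPLE) and Probabilistic U-net, and how does it perform on synthetic and real datasets?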

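Key results

With dense labels, the proposed approach (Ours) reaches a Dice score of 82.92% on the MNIST-based segmentation task and 67.55% on MSLesion, outperforming the STAPLE and Spatial STAPLE baselines.
The CM estimation error (MSE) for annotators is markedly lower with Ours than with the baselines, e.g. 0.0893 on MNIST and 0.0811 on MSLesion.
In the single-label-per-image setting, Ours still beats the baselines with a Dice score of 56.43%, suggesting robustness when annotations are sparse.
On BraTS and LIDC-IDRI, Ours achieves higher Dice than the STAPLE variants in both the dense and single-label scenarios, along with a large improvement in CM estimation (e.g. a 14.4% improvement on BraTS).
In the Generalized Energy Distance (GED) comparison, Ours comes out ahead of Probabilistic U-Net on the MNIST, MS, BraTS and LIDC-IDRI datasets (e.g. MNIST 1.24 vs 1.46; MS 1.67 vs 1.91; BraTS 3.14 vs 3.23; LIDC-IDRI 1.87 vs 1.97).
Across datasets, image-dependent pixel-wise CMs capture inter-observer variability better than global CMs or per-image baselines, with consistent gains in both segmentation accuracy and CM accuracy.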
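For readers unfamiliar with the two metrics quoted above, here is a minimal plain-Python sketch of the Dice coefficient and of one common form of the squared Generalized Energy Distance, using d = 1 − IoU as the distance between masks. The review does not state which GED variant or distance produced the reported numbers, so the formula choice, the function names and the flat 0/1 list representation of masks are assumptions for illustration only.

```python
from itertools import product

def dice(a, b):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(a) + sum(b)
    return 1.0 if size == 0 else 2.0 * inter / size

def iou_distance(a, b):
    """1 - IoU between two binary masks; 0 when both masks are empty."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union

def mean_distance(xs, ys):
    """Average iou_distance over all pairs drawn from xs and ys."""
    pairs = list(product(xs, ys))
    return sum(iou_distance(a, b) for a, b in pairs) / len(pairs)

def ged_squared(samples, annotations):
    """Squared GED: 2*E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')]."""
    return (2.0 * mean_distance(samples, annotations)
            - mean_distance(samples, samples)
            - mean_distance(annotations, annotations))

# Toy masks over four pixels (hypothetical values).
model_samples = [[1, 1, 0, 0], [1, 0, 0, 0]]
human_labels = [[1, 1, 0, 0], [1, 1, 1, 0]]
overlap = dice(model_samples[0], human_labels[1])
distance = ged_squared(model_samples, human_labels)
```

Dice rewards overlap with a single reference mask (higher is better), whereas GED compares two sets of masks and is lower when the model's samples reproduce both the agreement and the spread of the human annotations.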


This review was generated by AI and checked by a human editor.