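[Paper Review] Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging
This paper evaluates privacy-preserving (differentially private) training of medical imaging models. It compares accuracy and fairness against non-private training on a chest radiograph task and a 3D CT pancreatic ductal adenocarcinoma (PDAC) task, and finds that workable accuracy and fairness are achievable at some cost in accuracy.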
Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. For this, we used two datasets: (1) A large dataset (N=193,311) of high quality clinical chest radiographs, and (2) a dataset (N=1,625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver-operator-characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that, while the privacy-preserving trainings yielded lower accuracy, they did largely not amplify discrimination against age, sex or co-morbidity. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
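Motivation and Objectives
- Advance privacy protection for medical imaging AI, given how sensitive the data is.
- Assess how differential privacy affects model accuracy (AUROC) and fairness metrics.
- Compare privacy-preserving models against non-private baselines on large clinical datasets.
- Characterize the privacy-utility and privacy-fairness trade-offs on real-world data.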
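Proposed Method
- Use two datasets: 193,311 chest radiographs and 1,625 3D abdominal CT scans labeled for the presence of pancreatic ductal adenocarcinoma (PDAC).
- Train deep convolutional neural networks (CNNs) both with privacy-preserving (DP) methods and as non-private baselines.
- Evaluate performance with AUROC and with fairness metrics (Pearson's r and Statistical Parity Difference).
- Analyze whether DP training reduces accuracy and whether it amplifies demographic disparities.
- Report results under realistic clinical data conditions.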
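The summary does not say which DP mechanism the authors used. As an illustration only, the sketch below assumes DP-SGD, a common way to train with differential privacy: each per-sample gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The logistic-regression model and every name here (`clip_gradient`, `dp_sgd_step`, `max_norm`, `noise_multiplier`) are hypothetical stand-ins, not the paper's CNN or code, and the privacy accounting that turns the noise level into an (ε, δ) guarantee is not shown.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def logistic_grad(weights, x, y):
    """Gradient of the logistic loss for one example (x, y) with y in {0, 1}."""
    z = sum(w * xi for w, xi in zip(weights, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One assumed DP-SGD update: clip per sample, sum, add noise, average."""
    summed = [0.0] * len(weights)
    for x, y in batch:
        clipped = clip_gradient(logistic_grad(weights, x, y), max_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [w - lr * n / len(batch) for w, n in zip(weights, noisy)]


# Toy usage on made-up data (first feature acts as a bias term).
rng = random.Random(0)
batch = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 0.5], 1)]
weights = [0.0, 0.0]
for _ in range(50):
    weights = dp_sgd_step(weights, batch, lr=0.1, max_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single patient's record can move the model, and the noise hides what influence remains. This is also the usual explanation for the accuracy cost of DP training.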
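Experimental Results
Research Questions
- RQ1: How does differential privacy affect diagnostic accuracy (AUROC) in large-scale medical imaging tasks?
- RQ2: Does privacy-preserving training amplify or mitigate discrimination with respect to age, sex, or co-morbidity?
- RQ3: On realistic clinical datasets, are privacy-preserving models feasible without severely compromising fairness and accuracy?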
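To make the quantities behind RQ1 and RQ2 concrete, here is a minimal sketch of the three metrics named in the abstract: AUROC, Statistical Parity Difference, and Pearson's r. The function names and toy inputs are hypothetical. The summary does not state which variables the authors correlated with Pearson's r or which groups they compared, so the code shows only the generic definitions.

```python
import math


def auroc(labels, scores):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def statistical_parity_difference(predictions, groups, group_a, group_b):
    """P(pred = 1 | group_a) - P(pred = 1 | group_b) for binary predictions."""
    def positive_rate(group):
        members = [p for p, g in zip(predictions, groups) if g == group]
        return sum(members) / len(members)
    return positive_rate(group_a) - positive_rate(group_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy inputs, invented for illustration only.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(statistical_parity_difference([1, 0, 1, 1], ["F", "F", "M", "M"], "F", "M"))
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

A Statistical Parity Difference near zero means both groups receive positive predictions at similar rates. Comparing the value for the DP model with the value for the non-private baseline is one way to check whether private training widened the gap.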
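Key Findings
- DP-trained models were less accurate than non-private models.
- DP training largely did not amplify discrimination against age, sex, or co-morbidity.
- The study shows that privacy-preserving diagnostic deep learning models can achieve excellent diagnostic accuracy and fairness on a real-world clinical dataset.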
This review was generated by AI and checked by human editors.