QUICK REVIEW

[Paper Review] Privacy-Preserving Technology to Help Millions of People: Federated Prediction Model for Stroke Prevention

Ce Ju, Ruihui Zhao|arXiv (Cornell University)|Jun 15, 2020

Artificial Intelligence in Healthcare|References: 15|Cited by: 28

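One-Sentence Summary

This paper proposes a privacy-preserving federated learning model for stroke risk prediction in which multiple hospitals collaborate through the federated averaging algorithm without sharing raw patient data. Trained on decentralized electronic health records under a cloud-based federated learning framework, the model performs close to centralized training and improves model performance at small hospitals by 10% to 20% on several machine learning metrics, while preserving data privacy and enabling scalable, secure AI-driven stroke prediction across institutions.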

ABSTRACT

Prevention of stroke with its associated risk factors has been one of the public health priorities worldwide. Emerging artificial intelligence technology is being increasingly adopted to predict stroke. Because of privacy concerns, patient data are stored in distributed electronic health record (EHR) databases, voluminous clinical datasets, which prevents patient data from being aggregated and restrains AI technology from boosting the accuracy of stroke prediction with centralized training data. In this work, our scientists and engineers propose a privacy-preserving scheme to predict the risk of stroke and deploy our federated prediction model on cloud servers. Our system of federated prediction model asynchronously supports any number of client connections and arbitrary local gradient iterations in each communication round. It adopts federated averaging during the model training process, without patient data being taken out of the hospitals during the whole process of model training and forecasting. With the privacy-preserving mechanism, our federated prediction model trains over all the healthcare data from hospitals in a certain city without actual data sharing among them. Therefore, it is not only secure but also more accurate than any single prediction model that trains over the data only from one single hospital. Especially for small hospitals with few confirmed stroke cases, our federated model boosts model performance by 10%~20% in several machine learning metrics. To help stroke experts comprehend the advantage of our prediction system more intuitively, we developed a mobile app that collects the key information of patients' statistics and demonstrates performance comparisons between the federated prediction model and the single prediction model during the federated training process.

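Research Motivation and Objectives

Address data privacy concerns in healthcare AI by training models collaboratively without sharing sensitive patient records.
Improve stroke prediction accuracy through federated learning, especially at small hospitals with few stroke cases.
Develop a scalable, asynchronous federated learning system that supports any number of client connections and an arbitrary number of local training iterations.
Enable real-time monitoring and visualization of model training performance across multiple hospitals through a mobile mini-program interface.
Establish a production-ready, privacy-preserving healthcare AI pipeline that has been deployed in a real-world hospital network in a Chinese city.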

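Proposed Method

The system uses federated averaging (FedAvg) to aggregate model updates from each hospital's private cloud server, without transmitting raw data.
Each hospital independently trains a three-layer neural network classifier on its own electronic health record (EHR) data, updating the feature mapping and classifier weights locally.
The central server aggregates the local model weights as $ w_{t+1} = \frac{1}{m} \sum_{i=1}^{m} w_t^i $, where $ m $ is the number of participating hospitals and $ w_t^i $ is the weight vector returned by hospital $ i $ in round $ t $.
The framework is asynchronous by design: it supports any number of client connections and allows the number of local gradient iterations to vary in each communication round.
A mobile mini-program, FedAI Stroke Prediction (FedAI中风预测), visualizes patient statistics, AUC scores, and performance comparisons between the federated and local models in real time.
The system is built on Tencent Cloud and the open-source FATE secure computing framework, ensuring data confidentiality and compliance with privacy regulations.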
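
The aggregation rule above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-layer neural network is replaced by a plain logistic regression so the example stays short, the round is simulated synchronously in one process rather than over FATE or Tencent Cloud, and the names `local_update`, `fed_avg`, and `run_round` as well as the toy records are hypothetical. Only the unweighted average in `fed_avg` mirrors the formula as given in this review.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, records, steps, lr=0.5):
    """Run `steps` SGD steps of logistic regression on one client's private records.

    Stand-in for the paper's three-layer classifier (assumption for brevity).
    """
    w = list(weights)  # copy so the shared global model is not mutated
    for _ in range(steps):
        x, y = random.choice(records)
        pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # Gradient of the log loss for a single example is (pred - y) * x.
        w = [wi - lr * (pred - y) * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights):
    """Unweighted average of client weight vectors: w_{t+1} = (1/m) * sum_i w_t^i."""
    m = len(client_weights)
    return [sum(ws) / m for ws in zip(*client_weights)]


def run_round(global_weights, clients, local_steps):
    """One communication round; each client may run a different number of local steps.

    Only weight vectors leave a client; the raw records stay where they are.
    """
    updates = [
        local_update(global_weights, records, steps)
        for records, steps in zip(clients, local_steps)
    ]
    return fed_avg(updates)


# Toy data (hypothetical): each record is ([bias, risk_feature], label).
random.seed(0)
hospitals = [
    [([1.0, 0.2], 0), ([1.0, 0.9], 1)],
    [([1.0, 0.1], 0), ([1.0, 0.8], 1)],
    [([1.0, 0.7], 1)],
]
global_w = [0.0, 0.0]
for _ in range(20):
    global_w = run_round(global_w, hospitals, local_steps=[5, 1, 3])
print(global_w)
```

The formula as stated here gives every hospital equal weight. The commonly cited FedAvg formulation instead weights each client by its share of the training samples; which variant the deployed system uses is not stated in this review.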

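Experimental Results

Research Questions

RQ1: Can the federated learning framework achieve stroke prediction performance comparable to or better than centralized training without sharing raw patient data?
RQ2: Compared with local-only training, how does federated learning improve model accuracy at small hospitals with few stroke cases?
RQ3: To what extent does the federated model preserve privacy while AI training is shared across multiple healthcare institutions?
RQ4: Can the asynchronous federated learning system support dynamic client participation and a variable number of local training steps in a real healthcare setting?
RQ5: To what extent does the visualization system help clinicians monitor and understand model training dynamics across distributed hospitals?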

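Key Findings

The federated prediction model achieved a mean AUC of 0.813 with a standard deviation of 0.018, nearly matching the AUC of centralized training (0.814 ± 0.014).
At small hospitals with a stroke incidence below 1%, the federated model improved AUC by 10% to 20% over models trained only on local data.
Hospital A contributed about 50% of all patient data and had a local AUC of 0.812, suggesting that its data dominated the federated model's performance.
Owing to data diversity and the benefits of collaborative learning, the federated model outperformed every single-hospital model, including those of large institutions.
The visualization system delivered real-time performance comparisons between the federated and local models, improving transparency and clinician trust in the training process.
The system demonstrated scalability and security, supporting asynchronous client connections and preserving data privacy throughout training.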
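
To make the AUC figures above concrete, the following sketch shows one standard way to compute AUC (the fraction of positive/negative pairs that the model ranks correctly) and then checks how the reported federated and centralized means compare with their standard deviations. The review does not say how the AUC values were computed, so the `auc` helper and the toy labels and scores are assumptions for illustration; only the four reported numbers come from the text.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores, not data from the paper).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Reported values from the findings above.
fed_mean, fed_std = 0.813, 0.018
cen_mean, cen_std = 0.814, 0.014

# The gap between the two means is far smaller than either standard deviation.
gap = abs(cen_mean - fed_mean)
within_one_std = gap < min(fed_std, cen_std)
print(toy_auc, within_one_std)
```

Because the gap between the two means is roughly 0.001 while both standard deviations exceed 0.01, the difference between federated and centralized training is well inside run-to-run variation, which is what "nearly matching" amounts to here.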

This review was generated by AI and reviewed by a human editor.