QUICK REVIEW

[Paper Review] Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results

Xiaoxiao Li, Yufeng Gu|arXiv (Cornell University)|Jan 16, 2020

Privacy-Preserving Technologies in Data|References: 50|Cited by: 27

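Summary in Brief

This paper proposes a privacy-preserving federated learning framework with domain adaptation for multi-site fMRI analysis on the ABIDE dataset, enabling collaborative training without sharing raw data. By combining randomized aggregation of model weights with two domain adaptation methods (mixture of experts, MoE, and adversarial alignment), the approach protects patient privacy while improving classification accuracy for autism spectrum disorder (ASD) versus healthy controls (HC) and mitigating cross-site distribution shift.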

ABSTRACT

Deep learning models have shown their advantage in many different tasks, including neuroimage analysis. However, to effectively train a high-quality deep learning model, the aggregation of a significant amount of patient information is required. The time and cost for acquisition and annotation in assembling, for example, large fMRI datasets make it difficult to acquire large numbers at a single site. However, due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions. Federated learning allows for population-level models to be trained without centralizing entities' data by transmitting the global model to local entities, training the model locally, and then averaging the gradients or weights in the global model. However, some studies suggest that private information can be recovered from the model gradients or weights. In this work, we address the problem of multi-site fMRI classification with a privacy-preserving strategy. To solve the problem, we propose a federated learning approach, where a decentralized iterative optimization algorithm is implemented and shared local model weights are altered by a randomization mechanism. Considering the systemic differences of fMRI distributions from different sites, we further propose two domain adaptation methods in this federated learning formulation. We investigate various practical aspects of federated model optimization and compare federated learning with alternative training strategies. Overall, our results demonstrate that it is promising to utilize multi-site data without data sharing to boost neuroimage analysis performance and find reliable disease-related biomarkers. Our proposed pipeline can be generalized to other privacy-sensitive medical data analysis problems.

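Research Motivation and Objectives

Address the challenge of training accurate deep learning models for fMRI neuroimage analysis when large-scale data is scattered across institutions because of privacy and regulatory constraints.
Develop a federated learning framework that protects patient privacy by avoiding centralization of raw fMRI data and by using randomized model weight updates to prevent reconstruction attacks.
Mitigate the domain shift caused by differences in scanning protocols, equipment, and subject instructions, in order to improve model generalization.
Evaluate federated learning with domain adaptation against alternative training strategies for classifying autism spectrum disorder (ASD) versus healthy controls (HC).
Identify reliable functional connectivity biomarkers of ASD through a decentralized, privacy-preserving deep learning pipeline.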

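Proposed Method

Implement a decentralized federated learning setup in which each institution trains a local model on its own fMRI data, and the global model weights are updated by averaging the local gradients or weights.
Apply a randomization mechanism to the shared model weights to prevent privacy leakage, injecting noise to defend against model inversion and reconstruction attacks.
Introduce a mixture-of-experts (MoE) domain adaptation method that learns site-specific expert models and adaptively combines predictions across sites through a routing network.
Apply adversarial domain alignment, which uses a domain discriminator to minimize domain discrepancy and align the latent representations of different sites.
Use an ensemble strategy that combines the predictions of multiple models to improve robustness and performance.
Evaluate model performance on the multi-site fMRI dataset (ABIDE) using standard metrics (such as accuracy, AUC, and F1 score), with cross-site validation.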
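The first two steps above (averaging local weights and randomizing the weights that are shared) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names `randomize_weights` and `federated_average`, the Gaussian noise model, the noise scale, and the weighting by local sample count are all assumptions made here for the example, since this review does not specify them.

```python
import numpy as np

def randomize_weights(weights, noise_std, rng):
    """Add zero-mean Gaussian noise to every weight array before it is shared.

    Assumption: Gaussian noise with a fixed scale; the paper's actual
    randomization mechanism and scale are not given in this review.
    """
    return [w + rng.normal(0.0, noise_std, size=w.shape) for w in weights]

def federated_average(site_weights, site_sizes):
    """Average per-layer weights across sites, weighted by local sample count."""
    total = float(sum(site_sizes))
    n_layers = len(site_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(site_weights, site_sizes))
        for i in range(n_layers)
    ]

rng = np.random.default_rng(0)

# Two hypothetical sites, each holding a one-layer "model" (toy values).
site_weights = [[np.full((2, 2), 1.0)], [np.full((2, 2), 3.0)]]
site_sizes = [100, 300]

# Each site perturbs its weights locally; only the noisy copies leave the site.
shared = [randomize_weights(w, noise_std=0.01, rng=rng) for w in site_weights]

global_weights = federated_average(shared, site_sizes)
clean_global = federated_average(site_weights, site_sizes)  # 2.5 everywhere
```

In a full training loop, `global_weights` would be sent back to every site as the starting point for the next round of local training. A larger `noise_std` hides more about each site's update but moves the aggregate further from the noise-free average.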

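Experimental Results

Research Questions

RQ1: Can federated learning with a privacy-preserving mechanism effectively train deep learning models for fMRI-based ASD classification without sharing raw data?
RQ2: How does introducing domain adaptation techniques (MoE and adversarial alignment) improve model performance on heterogeneous multi-site fMRI data?
RQ3: How do communication frequency and model randomization affect model accuracy and privacy protection in a multi-site fMRI setting?
RQ4: Can the proposed federated learning framework identify reliable ASD functional connectivity biomarkers that generalize across sites?
RQ5: Under what conditions does domain adaptation improve performance in federated fMRI analysis?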

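Key Findings

The proposed federated learning framework with privacy-preserving randomization reached classification performance comparable to centralized training, showing that accurate models can be trained without sharing data.
The domain adaptation methods (MoE and adversarial alignment) improved classification accuracy at two of the four sites; performance was unchanged at one site and did not improve at the other, indicating that the benefit depends on context.
Within the tested range, communication frequency (how often the global model is updated) had no significant effect on model performance, suggesting that this hyperparameter is robust in the current setting.
The model identified ASD-related functional connectivity patterns as potential biomarkers, with consistent activation in regions including the precuneus, the superior lateral occipital cortex, and the superior temporal gyrus.
The ensemble approach improved performance over individual models; more advanced ensemble techniques (such as stacking or gradient boosting) may improve the results further in future work.
The study shows that federated learning with domain adaptation is a feasible, privacy-preserving approach to multi-site neuroimaging research, especially for rare diseases with limited patient cohorts.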
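The mixture-of-experts combination referred to in the findings can be sketched as below. This is an illustrative sketch only: the function names `softmax` and `moe_predict`, the softmax gate, and the toy numbers are assumptions introduced here, and the paper's actual experts and routing network may be structured differently.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_predict(expert_probs, gate_logits):
    """Combine per-expert probabilities using a learned gate.

    expert_probs: (n_samples, n_experts) predicted P(ASD) from each expert.
    gate_logits:  (n_samples, n_experts) routing scores for each sample
                  (hypothetical output of a routing network).
    Returns a (n_samples,) array of gated probabilities.
    """
    gate = softmax(gate_logits)  # each row sums to 1
    return (gate * expert_probs).sum(axis=1)

# Toy example: two experts that disagree on both samples.
expert_probs = np.array([[0.9, 0.1],
                         [0.9, 0.1]])
gate_logits = np.array([[0.0, 0.0],      # gate trusts both experts equally
                        [10.0, -10.0]])  # gate strongly prefers expert 0

combined = moe_predict(expert_probs, gate_logits)
```

With equal gate scores the result is the plain average of the experts; with a strongly skewed gate the result follows the preferred expert. A per-sample gate of this kind is one way to let a site lean on whichever expert suits its own data distribution, which would be consistent with the site-dependent gains reported above.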

This review was generated by AI and reviewed by a human editor.