QUICK REVIEW

[Paper Review] Privacy Preservation in Federated Learning: An insightful survey from the GDPR Perspective

Nguyen B. Truong, Kai Sun|arXiv (Cornell University)|Nov 10, 2020

Privacy-Preserving Technologies in Data|References: 115|Cited by: 26

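One-Sentence Summary

From the perspective of the EU/UK General Data Protection Regulation (GDPR), this paper comprehensively surveys privacy-preserving techniques in federated learning (FL), analysing threats, attacks, and solutions. It finds that although FL inherently improves data privacy by keeping training data local, model parameters can still leak sensitive information, so additional privacy-preserving mechanisms (such as differential privacy and secure multi-party computation) are needed to achieve full GDPR compliance.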

ABSTRACT

Along with the blooming of AI and Machine Learning-based applications and services, data privacy and security have become a critical challenge. Conventionally, data is collected and aggregated in a data centre on which machine learning models are trained. This centralised approach has induced severe privacy risks to personal data leakage, misuse, and abuse. Furthermore, in the era of the Internet of Things and big data in which data is essentially distributed, transferring a vast amount of data to a data centre for processing seems to be a cumbersome solution. This is not only because of the difficulties in transferring and sharing data across data sources but also the challenges on complying with rigorous data protection regulations and complicated administrative procedures such as the EU General Data Protection Regulation (GDPR). In this respect, Federated learning (FL) emerges as a prospective solution that facilitates distributed collaborative learning without disclosing original training data whilst naturally complying with the GDPR. Recent research has demonstrated that retaining data and computation on-device in FL is not sufficient enough for privacy-guarantee. This is because ML model parameters exchanged between parties in an FL system still conceal sensitive information, which can be exploited in some privacy attacks. Therefore, FL systems shall be empowered by efficient privacy-preserving techniques to comply with the GDPR. This article is dedicated to surveying on the state-of-the-art privacy-preserving techniques which can be employed in FL in a systematic fashion, as well as how these techniques mitigate data security and privacy risks. Furthermore, we provide insights into the challenges along with prospective approaches following the GDPR regulatory guidelines that an FL system shall implement to comply with the GDPR.

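Research Motivation and Objectives

Analyse the privacy risks that arise in federated learning (FL) systems from model-parameter leakage, even when data remains stored locally.
Evaluate the effectiveness of existing privacy-preserving techniques, such as differential privacy, secure multi-party computation (SMC), and encrypted transfer learning, in the context of GDPR compliance.
Identify the gaps where current FL systems fall short of GDPR principles, including data minimisation, purpose limitation, and accountability.
Provide FL-based service providers with concrete, actionable recommendations and guidelines for achieving GDPR compliance through technical and architectural measures.
Highlight the open research challenges in building privacy-preserving FL systems that are fair, explainable, and efficient under regulatory constraints.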

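Proposed Approach

Systematically analyse FL system architectures, threat models, and attack surfaces, with particular focus on inference attacks and model poisoning attacks.
Classify and evaluate privacy-preserving techniques in centralised FL architectures, including differential privacy, homomorphic encryption, and secure aggregation (SMC).
Map the technical components of these techniques to GDPR principles such as lawfulness, fairness, transparency, and data minimisation.
Evaluate the trade-offs among privacy, model accuracy, and computational overhead when these techniques are deployed in real-world FL systems.
Propose a framework for achieving GDPR compliance in FL by aligning technical controls with regulatory obligations such as transparency of data processing and accountability.
Review recent advances in agnostic FL and bias mitigation for improving fairness and robustness under non-IID (not independent and identically distributed) and skewed data.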
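To make the differential privacy technique named above more concrete, here is a minimal sketch of one common way it is applied to a client's model update: clip the update to a bounded L2 norm, then add Gaussian noise scaled to that bound. This is an illustration under stated assumptions, not the survey's own method. The function names (`clip_update`, `privatize_update`) and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the sketch does not compute an actual (epsilon, delta) privacy budget.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise to every coordinate.

    The noise standard deviation is noise_multiplier * clip_norm
    (an assumed, commonly used parameterisation).
    """
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


# Hypothetical client update with L2 norm 5.0.
rng = random.Random(0)
raw_update = [3.0, 4.0]
clipped = clip_update(raw_update, clip_norm=1.0)
noisy = privatize_update(raw_update, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print("clipped:", clipped)
print("noisy:  ", noisy)
```

Clipping bounds how much any single client can influence the aggregate, and the noise hides what remains. A larger `noise_multiplier` gives stronger privacy but a less accurate aggregated model, which is the privacy/accuracy trade-off the review refers to.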

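Experimental Results

Research Questions

RQ1: In federated learning, how do model parameters expose sensitive personal information even when data is stored locally, and which types of privacy attacks exploit this vulnerability?
RQ2: To what extent do existing privacy-preserving techniques (such as differential privacy and SMC) help FL systems comply with GDPR principles?
RQ3: What are the main technical and regulatory challenges in aligning FL systems with GDPR requirements on data minimisation, purpose limitation, and accountability?
RQ4: Under cryptographic constraints, how can fairness, explainability, and bias-mitigation mechanisms be integrated effectively into privacy-preserving FL systems?
RQ5: In real-world FL deployments, what are the performance and accuracy trade-offs of adopting advanced privacy mechanisms such as differential privacy and homomorphic encryption?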

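Key Findings

FL alone does not guarantee GDPR compliance, because the model parameters exchanged during training can leak personal information through inference attacks.
Differential privacy and secure multi-party computation (SMC) are effective at reducing privacy leakage, but at the cost of a notable drop in model accuracy and increased computational overhead.
The GDPR data minimisation principle is challenged in FL when model updates contain enough information to reconstruct training data, so additional obfuscation techniques are required.
Model poisoning attacks remain a major threat, particularly in decentralised settings where trust among participants is low and such attacks are hard to detect.
Current approaches to fairness and bias mitigation in FL (such as agnostic FL) show promise in handling non-IID and skewed data distributions without compromising privacy.
FL systems still lack standardised, auditable, and explainable mechanisms, and so cannot fully satisfy GDPR requirements for transparency and accountability.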
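The secure aggregation idea mentioned in these findings can be illustrated with a toy sketch based on pairwise additive masking: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked updates, yet the masks cancel in the sum. This is a simplified illustration under stated assumptions, not the protocol described in the survey. Updates are assumed to be already quantised to integers, a single seeded generator stands in for the pairwise key agreement a real protocol would use, and client dropout is not handled. All names (`pairwise_masks`, `mask_update`, `aggregate`, `MODULUS`) are hypothetical.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def pairwise_masks(num_clients, vector_len, seed):
    """Build one mask per client such that all masks sum to zero mod MODULUS.

    For each pair (i, j) with i < j, client i adds a shared random vector
    and client j subtracts the same vector.
    """
    rng = random.Random(seed)  # stand-in for pairwise key agreement
    masks = [[0] * vector_len for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(vector_len)]
            for k in range(vector_len):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """Hide a client's integer update by adding its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Server-side sum of masked updates; the pairwise masks cancel out."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Three hypothetical clients with small integer updates.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), vector_len=3, seed=42)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [111, 222, 333]
```

The server recovers only the sum of the updates, never an individual client's contribution. The extra mask generation and communication per client pair give a sense of where the computational overhead noted above comes from.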

This review was generated by AI and reviewed by a human editor.