QUICK REVIEW

[Paper Review] CaPC Learning: Confidential and Private Collaborative Learning

Christopher A. Choquette-Choo, Natalie Dullerud|arXiv (Cornell University)|Feb 9, 2021

Privacy-Preserving Technologies in Data|References: 36|Citations: 24

One-Sentence Summary

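CaPC Learning proposes the first collaborative machine learning method that protects both data confidentiality and privacy without relying on centralized data or a shared model architecture. By combining secure multi-party computation, homomorphic encryption, and privately aggregated teacher models, CaPC lets each party independently improve its local model without sharing data either explicitly or implicitly.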

ABSTRACT

Machine learning benefits from large training datasets, which may not always be possible to collect by any single entity, especially when using privacy-sensitive data. In many contexts, such as healthcare and finance, separate parties may wish to collaborate and learn from each other's data but are prevented from doing so due to privacy regulations. Some regulations prevent explicit sharing of data between parties by joining datasets in a central location (confidentiality). Others also limit implicit sharing of data, e.g., through model predictions (privacy). There is currently no method that enables machine learning in such a setting, where both confidentiality and privacy need to be preserved, to prevent both explicit and implicit sharing of data. Federated learning only provides confidentiality, not privacy, since gradients shared still contain private information. Differentially private learning assumes unreasonably large datasets. Furthermore, both of these learning paradigms produce a central model whose architecture was previously agreed upon by all parties rather than enabling collaborative learning where each party learns and improves their own local model. We introduce Confidential and Private Collaborative (CaPC) learning, the first method provably achieving both confidentiality and privacy in a collaborative setting. We leverage secure multi-party computation (MPC), homomorphic encryption (HE), and other techniques in combination with privately aggregated teacher models. We demonstrate how CaPC allows participants to collaborate without having to explicitly join their training sets or train a central model. Each party is able to improve the accuracy and fairness of their model, even in settings where each party has a model that performs well on their own dataset or when datasets are not IID and model architectures are heterogeneous across parties.

Research Motivation and Objectives

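Address a gap created by privacy regulations: collaborative machine learning must protect data confidentiality and privacy at the same time.
Overcome the limitation of federated learning, which guarantees only confidentiality and still leaks private information through shared gradients.
Enable collaborative learning without sharing data or agreeing on a central model architecture in advance.
Support heterogeneous model architectures and non-IID data distributions across parties.
Provide provable privacy and confidentiality in collaborative settings where data cannot be pooled in a central location.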

Proposed Method

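Uses secure multi-party computation (MPC) to carry out the joint computation without exposing any party's inputs.
Uses homomorphic encryption (HE) to compute on encrypted data, so that inputs stay confidential while the parties' outputs are aggregated.
Introduces privately aggregated teacher models, which distill knowledge across parties without exposing training data.
Decouples local training from aggregation, so each party trains and improves its own model independently.
Designs a collaborative framework in which parties exchange only encrypted or obfuscated messages, never raw data.
Provides provable guarantees of both confidentiality (no explicit data sharing) and privacy (no information inferred from model outputs).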
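
The MPC idea in the list above can be made concrete with additive secret sharing, one common MPC building block. The sketch below is illustrative only and is not the protocol from the paper: the modulus and the function names `share`, `reconstruct`, and `secure_sum` are assumptions chosen for this example, and all parties are simulated in one process.

```python
import secrets

# Illustrative modulus (an assumption for this sketch, not taken from the paper).
MODULUS = 2**61 - 1


def share(value, n_shares, modulus=MODULUS):
    """Split an integer into n_shares random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recombine additive shares into the shared value."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=MODULUS):
    """Sum the parties' values using only shares, never the raw inputs.

    Party i splits its value into one share per party. Party j adds up the
    j-th shares it receives, and only these partial sums are combined.
    The result equals the true sum when inputs are non-negative and the
    total is smaller than the modulus.
    """
    n = len(private_values)
    all_shares = [share(v, n, modulus) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return reconstruct(partial_sums, modulus)


if __name__ == "__main__":
    print(secure_sum([3, 5, 7]))
```

Any single share is a uniformly random number, so a party that sees fewer than all shares of a value learns nothing about it. Only the final sum is revealed.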

Experimental Results

Research Questions

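RQ1: Can a collaborative learning framework be designed that ensures both confidentiality and privacy when the data is privacy-sensitive?
RQ2: How can parties improve their local models without sharing raw data or training a central model?
RQ3: Can the framework support heterogeneous model architectures and non-IID data distributions across participants?
RQ4: Which cryptographic techniques can be combined to prevent both explicit and implicit data leakage in collaborative learning?
RQ5: Can provable privacy and confidentiality be achieved while preserving model accuracy and fairness?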

Key Findings

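CaPC Learning lets each party improve the accuracy and fairness of its local model without sharing raw data or training a central model.
The method achieves confidentiality and privacy together, preventing both explicit data sharing and inference attacks through model outputs.
CaPC supports heterogeneous model architectures and non-IID data distributions, which makes it suitable for real-world collaboration.
The framework requires no pre-agreed model architecture, so each party can train and optimize its own model independently.
By using privately aggregated teacher models, CaPC transfers knowledge without exposing training data or intermediate model states.
The method is provably secure under cryptographic assumptions, which backs its confidentiality and privacy guarantees in collaborative learning.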
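
The private aggregation of teacher predictions mentioned above can be sketched as a noisy majority vote. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `noisy_argmax`, the choice of Gaussian noise, and the `sigma` parameter are introduced here for the example, and the votes are handled in the clear rather than under encryption.

```python
import random
from collections import Counter


def noisy_argmax(teacher_labels, num_classes, sigma, rng=None):
    """Return the class with the highest noisy vote count.

    teacher_labels: one predicted class index per answering party.
    sigma: standard deviation of the Gaussian noise added to each count
           (hypothetical parameter; larger values hide individual votes better).
    rng: optional random.Random instance, useful for reproducible runs.
    """
    rng = rng or random.Random()
    counts = Counter(teacher_labels)
    noisy_counts = [
        counts.get(c, 0) + rng.gauss(0.0, sigma) for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: noisy_counts[c])


if __name__ == "__main__":
    # Five answering parties vote on one query from the querying party.
    votes = [1, 1, 2, 1, 0]
    label = noisy_argmax(votes, num_classes=3, sigma=1.0)
    print("label returned to the querying party:", label)
```

The querying party would add the returned label to its own training set and retrain its local model. A larger `sigma` gives stronger privacy protection for the answering parties' data at the cost of noisier labels.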

This review was generated by AI and reviewed by a human editor.