QUICK REVIEW

[Paper Review] Correlated-Output Differential Privacy and Applications to Dark Pools

Chiang, James Hsin-yu, Davis Railsback|arXiv (Cornell University)|Feb 5, 2022

Privacy-Preserving Technologies in Data · Cited by 8

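One-Sentence Summary

The paper proposes a novel MPC+DP framework that uses secure multi-party computation (MPC) to emulate a trusted third party, enabling privacy-preserving training of machine learning models across multiple data owners without a trusted coordinator. The approach keeps formal privacy guarantees while achieving higher model accuracy than local differential privacy (Local DP), and it won first place in the iDASH2021 genomic analysis competition.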
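
Why global DP can beat local DP on accuracy can be shown with a toy calculation. This is a sketch under assumptions, not the paper's experiment: k hypothetical data owners each hold n records bounded in [0, 1] and release their joint mean under ε-DP with the Laplace mechanism. With global DP (the role MPC emulates here), one noise draw of scale 1/(knε) suffices. If each owner instead perturbs its own mean with scale 1/(nε), standing in for the local DP baseline, and the k noisy means are averaged, the noise standard deviation is √k times larger. All function names are hypothetical.

```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def central_std(k: int, n: int, eps: float) -> float:
    """Global DP: one Laplace draw on the mean of all k*n records in [0, 1].

    Changing one record moves that mean by at most 1/(k*n), so the scale is
    1/(k*n*eps) and the noise standard deviation is sqrt(2) * scale.
    """
    return math.sqrt(2.0) / (k * n * eps)


def local_std(k: int, n: int, eps: float) -> float:
    """Per-owner noise: each owner adds Laplace(1/(n*eps)) to the mean of its own
    n records; averaging k independent noisy means divides the std by sqrt(k)."""
    return (math.sqrt(2.0) / (n * eps)) / math.sqrt(k)


def empirical_std(scale: float, draws: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo std of the average of `draws` independent Laplace(scale) samples."""
    rng = random.Random(seed)
    vals = [sum(laplace(scale, rng) for _ in range(draws)) / draws for _ in range(trials)]
    mean = sum(vals) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / trials)


if __name__ == "__main__":
    k, n, eps = 10, 100, 1.0  # hypothetical: 10 owners with 100 records each
    print("global DP noise std:  ", central_std(k, n, eps))
    print("per-owner noise std:  ", local_std(k, n, eps))
    print("ratio (about sqrt(k)):", local_std(k, n, eps) / central_std(k, n, eps))
    # Simulation check for the global case.
    print("simulated global std: ", empirical_std(1.0 / (k * n * eps), 1, 20000))
```

Both standard deviations scale with 1/n, so in this toy the absolute accuracy gap widens as each owner's dataset shrinks, which matches the direction of the claim in the findings.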

ABSTRACT

In the classical setting of differential privacy, a privacy-preserving query is performed on a private database, after which the query result is released to the analyst; a differentially private query ensures that the presence of a single database entry is protected from the analyst’s view. In this work, we contribute the first definitional framework for differential privacy in the trusted curator setting (Fig. 1); clients submit private inputs to the trusted curator, which then computes individual outputs privately returned to each client. The adversary is more powerful than the standard setting; it can corrupt up to n-1 clients and subsequently decide inputs and learn outputs of corrupted parties. In this setting, the adversary also obtains leakage from the honest output that is correlated with a corrupted output. Standard differentially private mechanisms protect client inputs but do not mitigate output correlation leaking arbitrary client information, which can forfeit client privacy completely. We initiate the investigation of a novel notion of correlated-output differential privacy to bound the leakage from output correlation in the trusted curator setting. We define the satisfaction of both standard and correlated-output differential privacy as round differential privacy and highlight the relevance of this novel privacy notion to all application domains in the trusted curator model. We explore round differential privacy in traditional "dark pool" market venues, which promise privacy-preserving trade execution to mitigate front-running; privately submitted trade orders and trade execution are kept private by the trusted venue operator. We observe that dark pools satisfy neither classic nor correlated-output differential privacy; in markets with low trade activity, the adversary may trivially observe recurring, honest trading patterns, and anticipate and front-run future trades. 
In response, we present the first round differentially private market mechanisms that formally mitigate information leakage from all trading activity of a user. This is achieved with fuzzy order matching, inspired by the standard randomized response mechanism; however, this also introduces a liquidity mismatch as buy and sell orders are not guaranteed to execute pairwise, thereby weakening output correlation; this mismatch is compensated for by a round differentially private liquidity provider mechanism, which freezes a noisy amount of assets from the liquidity provider for the duration of a privacy epoch, but leaves trader balances unaffected. We propose oblivious algorithms for realizing our proposed market mechanisms with secure multi-party computation (MPC) and implement these in the Scale-Mamba Framework using Shamir Secret Sharing based MPC. We demonstrate practical, round differentially private trading with comparable throughput as prior work implementing (traditional) dark pool algorithms in MPC; our experiments demonstrate practicality for both traditional finance and decentralized finance settings.
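
The output-correlation leak and the randomized-response idea behind fuzzy order matching can be illustrated with a toy model. This is a sketch under assumptions, not the paper's mechanism: a one-unit market with one adversarial seller and one honest buyer. Under deterministic matching, the adversary's own fill reveals the honest input exactly. Applying standard randomized response to the fill outcome bounds the ratio between the two honest inputs' likelihoods by e^ε. All function names are hypothetical, and the liquidity provider mechanism is not modeled.

```python
import math
import random


def deterministic_fill(adversary_sells: bool, honest_buys: bool) -> bool:
    """Toy one-unit dark pool: the adversary's sell order fills iff an honest buy exists."""
    return adversary_sells and honest_buys


def keep_prob(eps: float) -> float:
    """Randomized response: report the true bit with probability e^eps / (1 + e^eps)."""
    return math.exp(eps) / (1.0 + math.exp(eps))


def rr_fill(adversary_sells: bool, honest_buys: bool, eps: float, rng: random.Random) -> bool:
    """Toy fuzzy fill: the true fill outcome is flipped with probability 1 - keep_prob(eps)."""
    true_fill = deterministic_fill(adversary_sells, honest_buys)
    return true_fill if rng.random() < keep_prob(eps) else not true_fill


def likelihood_ratio(eps: float) -> float:
    """P(fill | honest buy) / P(fill | no honest buy) under rr_fill; equals e^eps."""
    p = keep_prob(eps)
    return p / (1.0 - p)


if __name__ == "__main__":
    # Deterministic matching: the adversary's fill equals the honest input bit.
    for honest in (True, False):
        print("honest buys:", honest, "-> adversary filled:", deterministic_fill(True, honest))

    eps, trials = 1.0, 20000
    rng = random.Random(0)
    for honest in (True, False):
        rate = sum(rr_fill(True, honest, eps, rng) for _ in range(trials)) / trials
        print(f"honest buys: {honest} -> empirical fill rate {rate:.3f}")
    print("likelihood ratio bound e^eps:", likelihood_ratio(eps))
```

In this toy, a flip that reports a fill when no honest buyer exists is an unmatched execution. That is the kind of liquidity mismatch the abstract says the liquidity provider mechanism must absorb.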

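Motivation and Goals

Address the challenge of training accurate machine learning models on horizontally or vertically partitioned data while preserving privacy.
Overcome the accuracy loss inherent to pure differential privacy (DP) approaches when training on decentralized data.
Remove the dependence on a trusted coordinator by emulating global DP with secure multi-party computation (MPC).
Enable privacy-preserving training on vertically partitioned data, where features are split across parties (common in healthcare and advertising), a setting in which earlier MPC+DP approaches break down.
Provide a general, scalable framework that supports a range of linear models and DP mechanisms without expert tuning.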

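Proposed Method

The method jointly trains a logistic regression model across multiple data owners with an MPC protocol, without exposing raw data.
Model weights are computed privately among the parties with a secret-sharing-based MPC protocol (e.g., additive secret sharing).
After training, Laplace noise is added to the model coefficients inside MPC to satisfy ε-differential privacy.
Noise addition is performed in a distributed way, taking over the trusted coordinator's role in global DP and ensuring end-to-end privacy.
The method supports both horizontal and vertical data partitioning and is compatible with both passive and active adversary models.
The framework is modular: the logistic regression training protocol (πLR) can be swapped for other linear learners, and the Laplace mechanism can be replaced by Gaussian noise to achieve (ε, δ)-DP.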
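
The sharing and noising steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's protocol: coefficients are fixed-point encoded and additively shared modulo a prime `P`; MPC training is skipped, so the weights are given; and instead of generating noise jointly inside MPC, each party adds a local noise piece to its own share. The sketch relies on a standard divisibility fact: m independent Gamma(1/m, b) draws sum to an exponential with mean b, so m pieces of the form Gamma(1/m, b) - Gamma(1/m, b) sum to Laplace(0, b). `P`, `SCALE`, and all function names are hypothetical; b stands for an assumed L1 sensitivity divided by ε.

```python
import random

P = 2**61 - 1   # prime modulus for share arithmetic (assumption)
SCALE = 2**16   # fixed-point scaling factor (assumption)


def encode(x: float) -> int:
    """Fixed-point encode a real number as a field element mod P."""
    return round(x * SCALE) % P


def decode(v: int) -> float:
    """Decode a field element, treating values above P // 2 as negative."""
    if v > P // 2:
        v -= P
    return v / SCALE


def share(v: int, m: int, rng: random.Random) -> list[int]:
    """Split v into m additive shares that sum to v mod P."""
    parts = [rng.randrange(P) for _ in range(m - 1)]
    parts.append((v - sum(parts)) % P)
    return parts


def reconstruct(parts: list[int]) -> int:
    """Open a shared value by summing all shares mod P."""
    return sum(parts) % P


def laplace_piece(b: float, m: int, rng: random.Random) -> float:
    """One party's noise piece; m independent pieces sum to Laplace(0, b)."""
    return rng.gammavariate(1.0 / m, b) - rng.gammavariate(1.0 / m, b)


def noisy_release(weights: list[float], m: int, b: float, seed: int = 0) -> list[float]:
    """Share each coefficient among m parties, let every party add its noise
    piece to its own share, then open the noisy coefficients."""
    rng = random.Random(seed)
    shared = [share(encode(w), m, rng) for w in weights]  # shared[j][i]: party i's share of w_j
    for shares_j in shared:
        for i in range(m):
            shares_j[i] = (shares_j[i] + encode(laplace_piece(b, m, rng))) % P
    return [decode(reconstruct(s)) for s in shared]


if __name__ == "__main__":
    w = [0.8, -1.2, 0.05]  # hypothetical trained coefficients
    print(noisy_release(w, m=3, b=0.1, seed=42))
```

Because each party knows its own noise piece, this shortcut only protects the opened coefficients from outsiders. A coalition of m - 1 parties would know all but one piece, leaving only a fraction of the intended noise. This is why generating the noise inside the MPC protocol matters under the adversary models listed above.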

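Experimental Results

Research Questions

RQ1: Can MPC emulate a trusted coordinator to achieve global differential privacy in distributed model training without exposing raw data?
RQ2: In a federated learning setting, does combining MPC with global DP yield higher model accuracy than local DP?
RQ3: Can the proposed MPC+DP framework handle vertically partitioned data, where features are split across parties?
RQ4: How does the performance of the MPC+DP approach scale with the number of parties and the adversarial threat model?
RQ5: Can the MPC+DP framework generalize to different linear models and DP mechanisms without reconfiguration?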

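Key Findings

The MPC+DP approach won first place in Track III of iDASH2021, which used medical claims data to predict the risk of wild-type transthyretin amyloid cardiomyopathy.
In the horizontally partitioned setting, with an honest-majority MPC protocol secure against passive adversaries (3 parties, 32-core virtual machines), training took under 1.3 minutes.
In the honest-majority setting secure against active adversaries, training with 4 parties completed within 30 minutes, demonstrating practical feasibility.
Compared with a local DP baseline, the method achieves higher accuracy; the advantage is larger when individual data owners hold little data, because less noise accumulates.
The method keeps a strong privacy guarantee (ε = 1, δ = 1e-5) while allowing the model to be shared in plaintext, whereas under local DP, queries to the model leak information.
The framework is extensible: it supports L2-regularized logistic regression and can be adapted to the Gaussian noise mechanism to achieve (ε, δ)-DP.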
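
For the Gaussian variant, the noise scale must be calibrated to (ε, δ). The summary does not say which calibration is used. As an assumption, the sketch below uses the analytic Gaussian mechanism of Balle and Wang (2018), which finds by bisection the smallest σ satisfying their exact (ε, δ) condition for L2 sensitivity Δ. It compares the result with the classic bound σ = Δ·sqrt(2 ln(1.25/δ))/ε, which is stated only for ε < 1. Function names are hypothetical, and Δ = 1 is a placeholder for the coefficient vector's L2 sensitivity.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(sigma: float, eps: float, sens: float = 1.0) -> float:
    """Smallest delta for which adding N(0, sigma^2) noise to a query with L2
    sensitivity `sens` is (eps, delta)-DP (Balle & Wang, 2018)."""
    a = sens / (2.0 * sigma)
    c = eps * sigma / sens
    return std_normal_cdf(a - c) - math.exp(eps) * std_normal_cdf(-a - c)


def analytic_sigma(eps: float, delta: float, sens: float = 1.0, iters: int = 200) -> float:
    """Bisection for the smallest sigma with gaussian_delta(sigma) <= delta
    (gaussian_delta is decreasing in sigma)."""
    lo, hi = 1e-6, 1.0
    while gaussian_delta(hi, eps, sens) > delta:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps, sens) > delta:
            lo = mid
        else:
            hi = mid
    return hi  # invariant: hi always satisfies the (eps, delta) condition


def classic_sigma(eps: float, delta: float, sens: float = 1.0) -> float:
    """Classic Gaussian-mechanism bound, stated for 0 < eps < 1."""
    return sens * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


if __name__ == "__main__":
    eps, delta = 1.0, 1e-5  # privacy parameters reported in the findings
    print("analytic sigma:", analytic_sigma(eps, delta))
    print("classic sigma: ", classic_sigma(eps, delta))
```

With ε = 1 and δ = 1e-5, the classic formula gives σ ≈ 4.84Δ, while the analytic calibration returns a noticeably smaller σ, meaning less noise on the coefficients for the same guarantee.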

This review was generated by AI and checked by a human editor.