QUICK REVIEW

[Paper Review] Differentially Private Data Generative Models

Qingrong Chen, Chong Xiang|arXiv (Cornell University)|Dec 6, 2018

Privacy-Preserving Technologies in Data | References: 51 | Cited by: 32

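ONE-SENTENCE SUMMARY

This paper proposes two differentially private generative models, DP-AuGM (a differentially private autoencoder-based generative model) and DP-VaeGM (a differentially private variational autoencoder-based generative model), that generate high-utility synthetic data while protecting privacy. By combining differential privacy with data perturbation, DP-AuGM defends against model inversion, membership inference, and GAN-based attacks, while DP-VaeGM is shown to be robust against membership inference; both can be integrated easily into practical systems such as machine learning as a service (MLaaS) and federated learning.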

ABSTRACT

Deep neural networks (DNNs) have recently been widely adopted in various applications, and such success is largely due to a combination of algorithmic breakthroughs, computation resource improvements, and access to a large amount of data. However, the large-scale data collections required for deep learning often contain sensitive information, therefore raising many privacy concerns. Prior research has shown several successful attacks in inferring sensitive training data information, such as model inversion, membership inference, and generative adversarial networks (GAN) based leakage attacks against collaborative deep learning. In this paper, to enable learning efficiency as well as to generate data with privacy guarantees and high utility, we propose a differentially private autoencoder-based generative model (DP-AuGM) and a differentially private variational autoencoder-based generative model (DP-VaeGM). We evaluate the robustness of two proposed models. We show that DP-AuGM can effectively defend against the model inversion, membership inference, and GAN-based attacks. We also show that DP-VaeGM is robust against the membership inference attack. We conjecture that the key to defend against the model inversion and GAN-based attacks is not due to differential privacy but the perturbation of training data. Finally, we demonstrate that both DP-AuGM and DP-VaeGM can be easily integrated with real-world machine learning applications, such as machine learning as a service and federated learning, which are otherwise threatened by the membership inference attack and the GAN-based attack, respectively.

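MOTIVATION AND GOALS

Address the privacy risks that sensitive training data creates for machine learning, particularly in collaborative and cloud settings.
Develop generative models that produce synthetic data with strong privacy guarantees while preserving the utility needed for downstream learning tasks.
Defend against modern privacy attacks, including model inversion, membership inference, and GAN-based leakage attacks against collaborative (federated) learning.
Make privacy-preserving data generation practical to integrate into real machine learning systems such as MLaaS and federated learning.
Examine the conjecture that perturbation of the training data, rather than differential privacy itself, is the key factor in defending against non-membership attacks such as model inversion and GAN-based reconstruction.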

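PROPOSED METHOD

Proposes DP-AuGM, an autoencoder trained on private data with noise injected during training to ensure differential privacy, so that data can be synthesized locally.
Develops DP-VaeGM, a variational autoencoder trained under differential privacy, so that both its inference (encoder) and generation (decoder) parts are covered by the guarantee; new data is generated by sampling in the latent space.
Applies differential privacy to the training process through bounded gradient clipping and noise addition, so that the generative model satisfies (ε, δ)-differential privacy.
Uses public or sanitized data as input to the trained generative model, so that third parties can generate new synthetic data without access to the original private data.
Adopts a mechanism resembling knowledge distillation: the generative model acts as a teacher that produces synthetic data for a student model, preserving data utility while protecting privacy.
Integrates the models into MLaaS and federated learning pipelines, replacing the original private data with synthetic data to reduce the risk of membership inference and GAN-based attacks.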
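
The clip-then-add-noise training step described above can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function name `dp_sgd_step` and its parameters (`clip_norm`, `noise_multiplier`) are hypothetical, and the sketch assumes per-example gradients are already available as a 2-D array. It also omits the privacy accounting that turns the noise level and number of steps into a concrete (ε, δ) value.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy gradient step: clip each example's gradient, sum, add noise, average.

    per_example_grads has shape (batch_size, num_params). All names here are
    illustrative assumptions, not taken from the paper.
    """
    # L2 norm of each example's gradient, kept as a column for broadcasting.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Shrink gradients whose norm exceeds clip_norm; leave smaller ones unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation proportional to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean

# Toy usage with made-up gradients for a 3-parameter model.
rng = np.random.default_rng(0)
params = np.zeros(3)
grads = np.array([[3.0, 4.0, 0.0],   # norm 5.0, will be clipped to norm 1.0
                  [0.3, 0.4, 0.0]])  # norm 0.5, left unchanged
params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single training example can move the parameters, which is what lets the Gaussian noise be calibrated to a fixed scale. In practice the (ε, δ) budget would be tracked by a separate accountant over all training steps.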

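EXPERIMENTAL RESULTS

Research Questions

RQ1: Can differentially private generative models effectively defend against model inversion attacks, which reconstruct sensitive training data from model outputs?
RQ2: Can DP-AuGM and DP-VaeGM withstand membership inference attacks, which determine whether a given data point was part of the training set?
RQ3: Can these models mitigate GAN-based attacks, which reconstruct private data from shared gradients in collaborative learning systems?
RQ4: What are the relative contributions of differential privacy and data perturbation in defending against non-membership privacy attacks?
RQ5: Can these models be integrated effectively into real systems such as MLaaS and federated learning without sacrificing data utility?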
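
To make the membership inference question in RQ2 concrete, the sketch below shows a simple loss-threshold attack. It is offered only as an illustration of the threat model and is an assumption of this review, not the attack evaluated in the paper; the function names and the loss values are hypothetical.

```python
import numpy as np

def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) for every example whose loss is below the threshold."""
    return np.asarray(losses) < threshold

def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of correct member / non-member guesses over both groups."""
    guessed_members = loss_threshold_attack(member_losses, threshold)
    guessed_nonmembers = loss_threshold_attack(nonmember_losses, threshold)
    # Correct when a true member is flagged, or a true non-member is not flagged.
    correct = guessed_members.sum() + (~guessed_nonmembers).sum()
    return correct / (len(member_losses) + len(nonmember_losses))

# Made-up per-example losses of a target model (illustrative values only).
member_losses = [0.05, 0.10, 0.20, 0.90]      # examples seen during training
nonmember_losses = [0.40, 0.80, 1.20, 0.15]   # held-out examples
accuracy = attack_accuracy(member_losses, nonmember_losses, threshold=0.3)
print(accuracy)  # 0.75
```

On a balanced set, an accuracy close to 0.5 means the attacker does no better than random guessing, which is the outcome a privacy-preserving training pipeline aims for.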

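Key Findings

DP-AuGM effectively defends against model inversion, membership inference, and GAN-based attacks, the last of which targets collaborative deep learning.
DP-VaeGM is robust against membership inference attacks, supporting its use in privacy-preserving model training.
The authors conjecture that the defense against model inversion and GAN-based attacks stems from the perturbation of the training data rather than from differential privacy itself.
Both DP-AuGM and DP-VaeGM retain high data utility, so downstream machine learning tasks trained on the generated synthetic data remain effective.
The models can be integrated easily into MLaaS and federated learning systems, where replacing the original data with differentially private synthetic data guards against privacy leakage.
Any machine learning model trained on the generated data inherits the differential privacy guarantee of the generative model, giving end-to-end privacy protection.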
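
The inheritance claim in the last finding follows from the standard post-processing property of differential privacy. As a brief reminder in standard notation (the symbols M, D, D', f, S, and T are introduced here for illustration and are not taken from the paper):

```latex
% (\varepsilon, \delta)-differential privacy of a randomized mechanism \mathcal{M}:
% for all neighbouring datasets D, D' (differing in one record) and all output sets S,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S] + \delta

% Post-processing: for any (possibly randomized) map f that does not access D,
% and all output sets T,
\Pr[f(\mathcal{M}(D)) \in T] \;\le\; e^{\varepsilon}\, \Pr[f(\mathcal{M}(D')) \in T] + \delta
```

Here the mechanism corresponds to training the generative model, and f corresponds to generating data and training a downstream model on it. The guarantee carries over only if that downstream step uses no private data beyond the output of the differentially private mechanism.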


This review was generated by AI and reviewed by human editors.