QUICK REVIEW

[Paper Review] Differentially Private Generative Adversarial Network

Liyang Xie, Kaixiang Lin|arXiv (Cornell University)|Feb 19, 2018

Generative Adversarial Networks and Image Synthesis|References: 30|Cited by: 313

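One-Sentence Summary

DPGAN adds gradient-level noise during GAN training to provide differential privacy, producing high-quality samples while protecting the training data.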

ABSTRACT

Generative Adversarial Network (GAN) and its variants have recently attracted intensive research interests due to their elegant theoretical foundation and excellent empirical performance as generative models. These tools provide a promising direction in the studies where data availability is limited. One common issue in GANs is that the density of the learned generative distribution could concentrate on the training data points, meaning that they can easily remember training samples due to the high model complexity of deep networks. This becomes a major concern when GANs are applied to private or sensitive data such as patient medical records, and the concentration of distribution may divulge critical patient information. To address this issue, in this paper we propose a differentially private GAN (DPGAN) model, in which we achieve differential privacy in GANs by adding carefully designed noise to gradients during the learning procedure. We provide rigorous proof for the privacy guarantee, as well as comprehensive empirical evidence to support our analysis, where we demonstrate that our method can generate high quality data points at a reasonable privacy level.

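Motivation and Objectives

Address the privacy concerns that arise when GAN-generated data is shared in sensitive domains such as medicine.
Propose the DPGAN framework, which provides a formal differential privacy guarantee during GAN training.
Show that privacy can be achieved through gradient-level noise combined with discriminator weight clipping.
Demonstrate that DPGAN can generate high-quality data on benchmark datasets under a reasonable privacy budget.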
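
For reference, the formal guarantee mentioned above is (ε, δ)-differential privacy. The statement below uses common textbook notation, which may differ from the paper's own: a randomized mechanism A satisfies (ε, δ)-differential privacy if, for any two datasets D and D' that differ in a single record and any set S of possible outputs,

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta
```

A smaller ε means the output distribution changes less when one record is swapped, and δ bounds the probability that this bound fails. The post-processing property states that applying any data-independent function to the output of A preserves the same (ε, δ) guarantee.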

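Proposed Method

Builds on the Wasserstein GAN (WGAN) framework and adds carefully designed gradient noise together with clipping.
Uses the moments accountant to bound the privacy loss and derive the resulting (ε, δ) guarantee.
Clips the discriminator weights to bound the gradient norm, and adds Gaussian noise to the gradient estimates.
Proves that noisy discriminator training satisfies (ε, δ)-differential privacy, and that the generator parameters remain private because they are obtained by post-processing.
Supports the analysis with experiments at different ε values on MNIST and MIMIC-III.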
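
The method steps listed above can be illustrated with a minimal sketch of one noisy critic (discriminator) update. This is not the paper's algorithm: it assumes a toy linear critic f_w(x) = w·x, plain gradient descent, and explicit per-example gradient clipping in the style of DP-SGD, which is swapped in here so that the sensitivity bound is visible in the code. The function name `noisy_critic_step` and all parameter names and defaults are hypothetical.

```python
import math
import random


def noisy_critic_step(w, real, fake, lr=0.05, grad_bound=1.0,
                      noise_multiplier=1.0, weight_clip=0.01, rng=None):
    """One noisy update of a toy linear critic f_w(x) = sum_j w[j] * x[j]."""
    if rng is None:
        rng = random.Random()
    d = len(w)
    m = len(real)
    grad_sum = [0.0] * d
    for x_real, x_fake in zip(real, fake):
        # Per-example gradient of the critic loss f_w(x_fake) - f_w(x_real).
        g = [xf - xr for xr, xf in zip(x_real, x_fake)]
        # Assumption: explicit per-example L2 clipping (DP-SGD style), so each
        # example contributes at most grad_bound to the summed gradient.
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, grad_bound / max(norm, 1e-12))
        for j in range(d):
            grad_sum[j] += g[j] * scale
    # Gaussian noise calibrated to the per-example bound.
    sigma = noise_multiplier * grad_bound
    new_w = []
    for j in range(d):
        noisy_grad = (grad_sum[j] + rng.gauss(0.0, sigma)) / m
        updated = w[j] - lr * noisy_grad
        # WGAN-style weight clipping keeps every weight in [-clip, clip].
        new_w.append(max(-weight_clip, min(weight_clip, updated)))
    return new_w


# Usage on synthetic data: "real" samples centred at 1, "fake" at 0.
rng = random.Random(0)
real = [[rng.gauss(1.0, 1.0) for _ in range(5)] for _ in range(64)]
fake = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(64)]
w = [0.0] * 5
for _ in range(10):
    w = noisy_critic_step(w, real, fake, rng=rng)
```

Only the critic touches the real data in this sketch, so only its update is noised. A generator trained against such a critic would inherit the guarantee through post-processing, as the summary notes. Turning the noise multiplier and number of steps into a concrete (ε, δ) requires a privacy accountant, which is not shown here.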

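Experimental Results

Research Questions

RQ1: Does the DPGAN framework provide a formal differential privacy guarantee during training?
RQ2: How does gradient-level noise affect the quality of the generated data under different privacy budgets?
RQ3: What is the relationship between the privacy level (ε) and generation performance in DPGAN?
RQ4: Can the generator produce useful data without memorizing training samples?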

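Key Findings

DPGAN generates high-quality data points at a reasonable privacy level while protecting the training data.
The Wasserstein distance converges during training, with larger fluctuations under stronger privacy (more noise).
Nearest-neighbor comparisons at different ε values show that the generated samples remain distinct from the training samples.
Classification tasks on MNIST using generated data show that performance degrades as privacy becomes stronger (smaller ε), because more noise is added.
The framework generalizes across different network architectures and datasets (MNIST and MIMIC-III).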
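
The nearest-neighbor comparison among the findings above can be sketched as a simple memorization check. This is an illustration only: the function name `nearest_neighbor_distances` is hypothetical, and Euclidean distance is an assumption, since this review does not state which metric was used.

```python
import math


def nearest_neighbor_distances(generated, training):
    """For each generated sample, return (index, distance) of its closest training sample."""
    results = []
    for g in generated:
        best_idx, best_dist = -1, math.inf
        for i, t in enumerate(training):
            dist = math.dist(g, t)  # Euclidean distance (assumed metric)
            if dist < best_dist:
                best_idx, best_dist = i, dist
        results.append((best_idx, best_dist))
    return results


# Usage: a distance of 0.0 means the generated sample copies a training sample.
training = [[0.0, 0.0], [3.0, 0.0]]
generated = [[0.0, 0.0], [3.0, 4.0]]
matches = nearest_neighbor_distances(generated, training)
```

In this toy input the first generated point duplicates a training point, while the second lies at distance 4 from its nearest neighbor. Small distances alone do not prove a privacy leak, and large ones do not prove privacy. The formal guarantee comes from the (ε, δ) analysis, not from this check.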

This review was generated by AI and checked by a human editor.