QUICK REVIEW

[Paper Review] Differentially Private Diffusion Models

Tim Dockhorn, Tianshi Cao|arXiv (Cornell University)|Oct 18, 2022

Privacy-Preserving Technologies in Data | Cited by 21

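One-Sentence Summary

The paper introduces Differentially Private Diffusion Models (DPDMs), which are trained with DP-SGD, together with a noise multiplicity technique, and reports state-of-the-art private image generation and strong downstream classifier performance.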

ABSTRACT

While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models. Project page and code: https://nv-tlabs.github.io/DPDM.

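Motivation and Objectives

Motivate training diffusion models under differential privacy to enable private synthetic data generation.
Study how the DM parameterization and the sampling algorithm affect utility under DP.
Introduce noise multiplicity to reduce gradient variance under DP-SGD.
Demonstrate state-of-the-art DP image synthesis on standard benchmarks.
Show that classifiers trained on DPDM-generated data can match task-specific DP-trained classifiers.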

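Proposed Method

Trains diffusion models with DP-SGD, using per-sample gradient clipping and Gaussian noise.
Introduces noise multiplicity: for each data point, the loss is averaged over K noise samples before the gradient is clipped and noised.
Evaluates four DM configurations (variance preserving, variance exploding, v-prediction, EDM) and their noise schedules.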
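Examines the sampling algorithm (deterministic DDIM and the stochastic Churn sampler); stochastic sampling improves perceptual quality under DP.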
Accounts for privacy with Rényi DP, converted to (ε, δ)-DP, and proves the DP guarantee for DPDM training.
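The training step described in the list above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the elementwise linear denoiser `w * x_noisy`, the single fixed `diffusion_sigma`, and all function names are assumptions made for the example. It shows the order of operations: average the per-example loss gradient over K noise draws, clip that per-example gradient to norm C, sum over the batch, and only then add Gaussian noise.

```python
import math
import random


def per_example_grad(w, x, K, diffusion_sigma, rng):
    """Gradient of a toy denoising loss for ONE example, averaged over K noise draws.

    Toy denoiser (an assumption for illustration): D(x_noisy) = w * x_noisy, elementwise.
    Loss per draw: sum_i (w_i * x_noisy_i - x_i)^2.
    """
    grad = [0.0] * len(w)
    for _ in range(K):  # noise multiplicity: K noise samples for the same data point
        x_noisy = [xi + diffusion_sigma * rng.gauss(0.0, 1.0) for xi in x]
        for i, (wi, xi, xn) in enumerate(zip(w, x, x_noisy)):
            grad[i] += 2.0 * (wi * xn - xi) * xn
    return [g / K for g in grad]


def clip(g, C):
    """Rescale g so that its L2 norm is at most C."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, C / (norm + 1e-12))
    return [v * scale for v in g]


def dp_sgd_step(w, batch, K, C, noise_multiplier, lr, rng, diffusion_sigma=0.5):
    """One DP-SGD update: clip each per-example gradient, sum, add noise, average."""
    clipped_sum = [0.0] * len(w)
    for x in batch:
        g = clip(per_example_grad(w, x, K, diffusion_sigma, rng), C)
        clipped_sum = [s + v for s, v in zip(clipped_sum, g)]
    # Gaussian noise with standard deviation noise_multiplier * C per coordinate.
    noisy_sum = [s + noise_multiplier * C * rng.gauss(0.0, 1.0) for s in clipped_sum]
    return [wi - lr * s / len(batch) for wi, s in zip(w, noisy_sum)]


# Usage: one noisy update on a random toy batch of four 3-dimensional points.
rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
w = dp_sgd_step([0.0] * 3, batch, K=8, C=1.0, noise_multiplier=1.0, lr=0.1, rng=rng)
```

The sketch omits privacy accounting entirely. In practice the noise multiplier and number of steps would be chosen with a Rényi DP accountant, as the last bullet describes.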

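Experimental Results

Research Questions

RQ1: Can diffusion models be trained under differential privacy with DP-SGD to generate high-quality synthetic data?
RQ2: Do the DM parameterization and the sampling strategy affect the privacy-utility trade-off under DP?
RQ3: Does the proposed noise multiplicity improve learning efficiency and the privacy-utility trade-off under DP?
RQ4: How do DPDMs compare with prior DP generative methods on standard image synthesis benchmarks and downstream classification tasks?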

Main Findings

DPDMs achieve state-of-the-art DP image synthesis across privacy budgets on common benchmarks (e.g., MNIST).
On MNIST at ε = 1, DPDM reaches an FID of 23.4, and a classifier trained on DPDM-generated data reaches 95.3% downstream accuracy when evaluated on real data.
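DPDM-generated data can train classifiers that perform on par with task-specific discriminative models trained with DP.
Noise multiplicity reduces gradient variance and improves learning efficiency without increasing the privacy budget.
Under similar privacy constraints, DP-SGD-based diffusion model training is more stable than DP-trained GANs.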
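A short derivation may help explain why noise multiplicity comes at no extra privacy cost. The notation below (denoiser D_θ, weighting λ, clipping norm C, DP noise scale σ_DP, batch B) is assumed for illustration and may differ from the paper's. The key point is that the K noise draws are averaged inside the per-example loss, so each data point still contributes exactly one clipped gradient.

```latex
% Per-example loss averaged over K diffusion noise draws (noise multiplicity)
\tilde{l}_i(\theta) = \frac{1}{K} \sum_{k=1}^{K}
    \lambda(\sigma_{ik}) \,\bigl\| D_\theta(x_i + n_{ik}, \sigma_{ik}) - x_i \bigr\|_2^2,
\qquad n_{ik} \sim \mathcal{N}(0, \sigma_{ik}^2 I)

% Clipping is applied once per example, after averaging over k
\bar{g}_i = \nabla_\theta \tilde{l}_i(\theta) \cdot
    \min\!\left(1, \frac{C}{\|\nabla_\theta \tilde{l}_i(\theta)\|_2}\right),
\qquad \|\bar{g}_i\|_2 \le C \ \text{for every } K

% Noisy batch gradient: the sensitivity of the sum is C regardless of K
\tilde{g} = \frac{1}{|B|} \Bigl( \sum_{i \in B} \bar{g}_i + \sigma_{\mathrm{DP}}\, C\, z \Bigr),
\qquad z \sim \mathcal{N}(0, I)
```

Adding or removing one data point changes the sum of clipped gradients by at most C for any K, so the same σ_DP yields the same guarantee. Meanwhile, averaging K independent draws shrinks the part of the unclipped gradient's variance that comes from noise sampling by a factor of 1/K.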


This review was generated by AI and reviewed by a human editor.