QUICK REVIEW

[Paper Review] AdaCliP: Adaptive Clipping for Private SGD

Venkatadheeraj Pichapati, Ananda Theertha Suresh|arXiv (Cornell University)|Aug 20, 2019

Privacy-Preserving Technologies in Data|References: 41|Cited by: 64

ONE-SENTENCE SUMMARY

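AdaCliP is a differentially private SGD algorithm that uses coordinate-wise adaptive clipping to minimize the added noise and improve model accuracy under DP constraints.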

ABSTRACT

Privacy preserving machine learning algorithms are crucial for learning models over user data to protect sensitive information. Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed. At each step, these algorithms modify the gradients and add noise proportional to the sensitivity of the modified gradients. Under this framework, we propose AdaCliP, a theoretically motivated differentially private SGD algorithm that provably adds less noise compared to the previous methods, by using coordinate-wise adaptive clipping of the gradient. We empirically demonstrate that AdaCliP reduces the amount of added noise and produces models with better accuracy.
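For reference, the "modify the gradients and add noise proportional to the sensitivity" step described in the abstract can be sketched as standard per-example clipping in DP-SGD. This is a minimal NumPy sketch, not code from the paper: the function name `dp_sgd_gradient`, the parameters `clip_norm` and `noise_multiplier`, and averaging over the batch are illustrative assumptions.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Hypothetical helper: privatize a (batch, dim) array of per-example gradients.

    Each row is rescaled to L2 norm at most clip_norm, so adding or removing one
    example changes the sum by at most clip_norm (the sensitivity). Gaussian
    noise with std noise_multiplier * clip_norm is added to the sum, and the
    result is averaged over the batch.
    """
    batch_size, dim = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Per-row factor min(1, C / ||g||); the floor avoids dividing by zero.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return (clipped.sum(axis=0) + noise) / batch_size


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
# With noise switched off, the first row (norm 5) is clipped to [0.6, 0.8].
print(dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng))
```

Here every coordinate is clipped and noised on the same scale, which is the baseline that AdaCliP's coordinate-wise transformation is designed to improve on.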

MOTIVATION AND GOALS

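Argue that DP-SGD needs a principled clipping strategy that reduces the noise in gradient updates.
Build a theoretically grounded framework for adaptive gradient transformation and clipping that minimizes the noise introduced for privacy.
Demonstrate empirical accuracy improvements under differential privacy constraints on MNIST and related models.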

PROPOSED METHOD

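Propose a general gradient transformation g^t -> w^t = (g^t - a^t) / b^t (division is coordinate-wise) and clip the transformed gradient w^t to L2 norm at most 1.
Add Gaussian noise to the clipped transformed gradient and map it back to the original scale by inverting the transform (multiply coordinate-wise by b^t, then add a^t), which gives the private gradient \tilde{g}^t.
Derive the a^t and b^t that minimize the expected added noise subject to an upper bound gamma on E||w^t||^2: a^t_i = m^t_i (the mean of coordinate i) and b^t_i = sqrt(sqrt(s^t_i) / gamma) * sqrt(sum_j sqrt(s^t_j)), where s^t_i is the variance of coordinate i.
Present the AdaCliP algorithm, which maintains a running mean m^t and an approximate variance s^t computed from the noisy gradients, so that a^t and b^t adapt over the course of training.
Provide a convergence guarantee for non-convex objectives when training with a fixed learning rate and DP gradients.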
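The method steps can be put together in a short NumPy sketch of one AdaCliP-style update. It assumes s^t holds per-coordinate variances, sets a^t = m^t with the closed-form b^t, clips each per-example transformed gradient to L2 norm 1, and averages over the batch. The helper names, the moving-average rates `beta1`/`beta2`, and the variance floor `s_min` are hypothetical; in particular `update_moments` is a plain exponential moving average that ignores the injected noise's contribution to s^t, so the paper's exact estimator may differ.

```python
import numpy as np


def optimal_scaling(s, gamma):
    """b_i = sqrt(sqrt(s_i) / gamma) * sqrt(sum_j sqrt(s_j)), with s the per-coordinate variance."""
    root_s = np.sqrt(s)
    return np.sqrt(root_s / gamma) * np.sqrt(root_s.sum())


def adaclip_style_gradient(per_example_grads, m, s, gamma, noise_multiplier, rng):
    """Hypothetical sketch: privatize a (batch, dim) array of per-example gradients."""
    batch_size, dim = per_example_grads.shape
    a, b = m, optimal_scaling(s, gamma)
    w = (per_example_grads - a) / b  # coordinate-wise transform w = (g - a) / b
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    w = w * np.minimum(1.0, 1.0 / np.maximum(norms, 1e-12))  # clip each row to L2 norm <= 1
    noise = rng.normal(0.0, noise_multiplier, size=dim)  # sensitivity of the sum is 1
    w_private = (w.sum(axis=0) + noise) / batch_size
    return b * w_private + a  # invert the transform back to gradient scale


def update_moments(m, s, g_private, beta1=0.99, beta2=0.9, s_min=1e-8):
    """Hypothetical EMA update of mean and variance from the privatized gradient."""
    m_new = beta1 * m + (1.0 - beta1) * g_private
    s_new = beta2 * s + (1.0 - beta2) * (g_private - m_new) ** 2
    return m_new, np.maximum(s_new, s_min)  # floor keeps b strictly positive


rng = np.random.default_rng(0)
dim = 3
m, s = np.zeros(dim), np.ones(dim)
grads = rng.normal(size=(8, dim))
g_private = adaclip_style_gradient(grads, m, s, gamma=0.5, noise_multiplier=1.0, rng=rng)
m, s = update_moments(m, s, g_private)
```

Because m^t and s^t are computed only from the already-privatized gradient, updating them is post-processing and, in this sketch, consumes no additional privacy budget.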

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

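RQ1: Does coordinate-wise adaptive clipping reduce the noise added in DP-SGD compared with global or vector-level clipping?
RQ2: Under an upper bound on the transformed gradient norm, which transformation parameters a^t and b^t minimize the added Gaussian noise?
RQ3: Under the same DP budget, do coordinate-wise adaptive transformation and clipping yield better empirical accuracy on MNIST and related models?
RQ4: How does AdaCliP compare with existing DP-SGD methods in terms of noise, bias, and convergence on non-convex objectives?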

KEY FINDINGS

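By using coordinate-wise adaptive clipping, AdaCliP provably adds less noise than previous DP-SGD methods.
Empirically, under the same privacy settings, AdaCliP achieves higher accuracy than previous methods on MNIST and similar models.
The theoretical results show that the optimal choice of a^t and b^t differs from standard whitening and reduces the L2 norm of the added noise.
In their experiments, momentum-based optimization did not outperform SGD combined with AdaCliP, even though it had less noise per iteration.
Experiments show that, for a given (epsilon, delta) privacy budget, AdaCliP improves accuracy by up to 1.6% over the baseline methods.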
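To make the comparison with whitening concrete: with a^t = m^t and s holding per-coordinate variances, the added noise is b^t times a Gaussian vector, so its expected squared L2 norm is proportional to sum_i b_i^2, and each candidate scaling must be rescaled to meet the same bound sum_i s_i / b_i^2 = gamma on E||w^t||^2. Minimizing sum_i b_i^2 under that constraint gives b_i proportional to s_i^(1/4) and a total of (sum_i sqrt(s_i))^2 / gamma, while uniform scaling and whitening (b_i proportional to sqrt(s_i)) both give n * sum_i s_i / gamma, which by the Cauchy-Schwarz inequality is never smaller. The toy variances below are made up for illustration and are not from the paper; the helper names are hypothetical.

```python
import numpy as np


def rescale_to_budget(shape, s, gamma):
    """Scale b = c * shape so that sum_i s_i / b_i^2 equals gamma (the E||w||^2 bound)."""
    c_squared = np.sum(s / shape ** 2) / gamma
    return np.sqrt(c_squared) * shape


def noise_factor(b):
    """E||b * z||^2 / sigma^2 for z ~ N(0, sigma^2 I), i.e. sum_i b_i^2."""
    return float(np.sum(b ** 2))


s = np.array([16.0, 1.0, 1.0, 1.0])  # toy per-coordinate variances (illustrative)
gamma = 1.0
candidates = {
    "uniform": np.ones_like(s),
    "whitening": np.sqrt(s),
    "optimal": s ** 0.25,
}
for name, shape in candidates.items():
    b = rescale_to_budget(shape, s, gamma)
    print(name, round(noise_factor(b), 6))
# uniform 76.0
# whitening 76.0
# optimal 49.0
```

With equal variances the three choices coincide; the gap grows as the per-coordinate variances become more uneven.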

This review was generated by AI and checked by human editors.