QUICK REVIEW

[Paper Review] Differentially Private Learning with Adaptive Clipping

Galen Andrew, Om Thakkar | arXiv (Cornell University) | May 9, 2019

Privacy-Preserving Technologies in Data | References: 35 | Cited by: 89

One-sentence summary

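This paper proposes privately estimating a target quantile (for example, the median) of the per-user update norms during DP-FedAvg and clipping updates to that value. The result is adaptive, private clipping that needs no hand-tuned fixed clipping norm, and it shows strong utility across a range of federated learning tasks.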

ABSTRACT

Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.

Research motivation and goals

Motivate the difficulty of choosing a fixed clipping norm in user-level DP with Federated Averaging.
Introduce a privately estimable quantile clipping mechanism to track a specified norm quantile (e.g., median) of updates.
Demonstrate compatibility of adaptive clipping with DP-FedAvg, compression, and secure aggregation.
Empirically compare adaptive clipping to fixed clipping across realistic FL tasks and show scenarios where it matches or surpasses fixed clipping without tuning.

Proposed method

Define a quantile-based clipping loss that yields the gamma-quantile of update norms.
Use online gradient descent (with a geometric update) to track the clipping threshold C toward the gamma-quantile of the update-norm distribution.
Privately estimate the clipped-update indicator sum by adding Gaussian noise to the count before updating C, ensuring differential privacy.
Augment FedAvg with server momentum and private adaptive clipping to obtain DP-FedAvg-M, and relate its privacy to a non-adaptive DP-FedAvg via a stated equivalence.
Provide practical defaults (e.g., sigma_b = m/20, eta_C = 0.2) and a privacy analysis for the quantile-tracking process.
Present a DP accounting result that the sequence of quantile estimates satisfies (0.034, n^{-1.1})-DP under RDP composition across rounds.
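
The quantile-tracking steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `clip_update` and `update_clip_norm` are hypothetical, `m` is taken to be the number of updates received in the round, and each indicator is assumed to mark an update whose norm is at most C. The noise added to the model updates themselves and the privacy accounting are omitted.

```python
import math
import random


def clip_update(update, clip):
    """Scale `update` (a list of floats) so its L2 norm is at most `clip`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def update_clip_norm(clip, norms, gamma=0.5, eta_c=0.2, sigma_b=None, rng=random):
    """One geometric update of the clipping threshold toward the gamma-quantile.

    `norms` holds this round's per-user update norms. Defaults follow the
    summary (eta_C = 0.2, sigma_b = m/20); reading m as len(norms) is an
    assumption made for this sketch.
    """
    m = len(norms)
    if sigma_b is None:
        sigma_b = m / 20.0
    # Count the updates whose norm does not exceed the current threshold.
    count = sum(1 for x in norms if x <= clip)
    # Gaussian noise on the count, so the threshold depends on it only privately.
    noise = rng.gauss(0.0, sigma_b) if sigma_b > 0 else 0.0
    noisy_fraction = (count + noise) / m
    # Geometric step: shrink C if more than gamma of the norms fall below it,
    # grow C otherwise.
    return clip * math.exp(-eta_c * (noisy_fraction - gamma))
```

Because the step is multiplicative, the threshold can move across several orders of magnitude in a modest number of rounds, which matters when the initial guess for C is far from the target quantile.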

Experimental results

Research questions

RQ1: Can adaptive clipping to a target update-norm quantile (e.g., median) provide DP-FedAvg with better or comparable utility to fixed clipping without hyperparameter tuning?
RQ2: How can the clipping threshold be updated privately and efficiently to track a quantile of the update-norm distribution in a federated setting?
RQ3: What is the impact of adaptive quantile clipping on privacy loss and noise amplification compared to fixed clipping?
RQ4: Is DP-FedAvg-M compatible with common FL techniques like compression and secure aggregation while preserving DP guarantees?

Key findings

Adaptive clipping to the median (gamma = 0.5) generally improves or matches performance relative to unclipped baselines across multiple tasks.
In most tasks, adaptive clipping performs as well as or better than any fixed clip chosen in hindsight, without hyperparameter tuning.
Compared to fixed clipping, the adaptive approach often yields higher utility given the same privacy budget, and requires no tuning of a clipping hyperparameter.
The proposed DP-FedAvg-M with adaptive clipping remains compatible with compression and secure aggregation.
Quantile tracking with geometric updates converges to the target quantile and can be made private at a small additional privacy cost, which becomes negligible when m is large.
With practical defaults (sigma_b = m/20, eta_C = 0.2), the adaptive method incurs only modest incremental noise on updates while achieving DP guarantees.
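
To make "modest incremental noise" concrete, the sketch below computes how much the noise multiplier on the update sum would have to grow to pay for the noisy count. It assumes the equivalence takes the form z_delta = (z^-2 - (2*sigma_b)^-2)^(-1/2), where z is the noise multiplier of the equivalent non-adaptive DP-FedAvg. That formula is not stated in this summary and should be checked against the paper. The function name `vector_noise_multiplier` is hypothetical.

```python
import math


def vector_noise_multiplier(z, sigma_b):
    """Noise multiplier for the update sum, under the assumed equivalence.

    z       -- noise multiplier of the equivalent non-adaptive mechanism
    sigma_b -- standard deviation of the Gaussian noise added to the count
    """
    inner = z ** -2 - (2.0 * sigma_b) ** -2
    if inner <= 0:
        raise ValueError("sigma_b is too small for the requested z")
    return inner ** -0.5


# Example: m = 100 updates per round, so sigma_b = m / 20 = 5, and z = 1.
z_delta = vector_noise_multiplier(1.0, 5.0)
increase_percent = 100.0 * (z_delta - 1.0)  # relative increase over z = 1
```

Under that assumption, the increase for this example is about half a percent.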


This review was generated by AI and reviewed by a human editor.