QUICK REVIEW

[Paper Review] Unlocking High-Accuracy Differentially Private Image Classification through Scale

Soham De, Leonard Berrada|arXiv (Cornell University)|Apr 28, 2022

Adversarial Robustness in Machine Learning | Cited by 33

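One-Sentence Summary

The paper shows that DP-SGD can reach state-of-the-art image classification accuracy on CIFAR-10 and ImageNet by combining over-parameterized models, careful hyperparameter tuning, and a few simple techniques: large batch sizes, group normalization, weight standardization, augmentation multiplicity, and fine-tuning of pre-trained models.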

ABSTRACT

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under (0.5, 8 \cdot 10^{-7})-DP. Additionally, we also achieve 86.7% top-1 accuracy under (8, 8 \cdot 10^{-7})-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.

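Research Motivation and Goals

Improve the effectiveness of DP-SGD for image classification under formal privacy guarantees.
Identify and combine simple techniques that improve DP-SGD performance on standard architectures.
Reach state-of-the-art private accuracy on CIFAR-10 without extra data, and obtain strong private training results on ImageNet.
Demonstrate the benefit of pre-training followed by private fine-tuning for DP image classification.
Provide guidance on how hyperparameters relate to one another under DP constraints.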
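
For reference, privacy levels such as (8, 10^{-5})-DP in this review follow the standard (ε, δ) definition of differential privacy. The statement below is the textbook definition, not a formula quoted from the paper; the symbols M (the randomized training algorithm), D and D' (datasets differing in one training example), and S (any set of possible outputs) are introduced here for illustration.

```latex
\Pr\big[\mathcal{M}(D) \in S\big] \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{M}(D') \in S\big] + \delta
\qquad \text{for all adjacent } D, D' \text{ and all measurable } S .
```

Smaller ε and δ mean a stronger guarantee, so (0.5, 8 \cdot 10^{-7})-DP is a stricter setting than (8, 8 \cdot 10^{-7})-DP.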

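Proposed Method

Describes a set of techniques that improve DP-SGD performance on over-parameterized models.
Replaces batch normalization with group normalization, so that each example's gradient stays independent of the other examples in the batch during DP training.
Studies large batch sizes and weight standardization as ways to stabilize training.
Introduces augmentation multiplicity: per-example gradients are averaged over several augmentations of the same example before clipping.
Applies parameter averaging (an exponential moving average) during training.
Shows the effect of pre-training on non-private data followed by private fine-tuning with DP-SGD.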
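
The core update described above (average per-example gradients over augmentations, clip, then add Gaussian noise) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the logistic-regression model, the `augment` jitter function, and all hyperparameter values are assumptions chosen to keep the example small.

```python
import numpy as np

def per_example_grad(w, x, y):
    """Gradient of the logistic loss for a single example (y is 0 or 1)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y) * x

def augment(x, rng):
    """Hypothetical augmentation: small Gaussian jitter standing in for crops or flips."""
    return x + 0.01 * rng.standard_normal(x.shape)

def dp_sgd_step(w, xs, ys, rng, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, aug_mult=4):
    """One DP-SGD step with augmentation multiplicity (illustrative values)."""
    clipped_sum = np.zeros_like(w)
    for x, y in zip(xs, ys):
        # Average the gradient over several augmentations of the SAME example,
        # so each training example still contributes one clipped vector.
        grads = [per_example_grad(w, augment(x, rng), y) for _ in range(aug_mult)]
        g = np.mean(grads, axis=0)
        # Clip the averaged per-example gradient to L2 norm at most clip_norm.
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped_sum += g
    # Add Gaussian noise once per batch, scaled to the clipping norm.
    noise = noise_multiplier * clip_norm * rng.standard_normal(w.shape)
    noisy_mean = (clipped_sum + noise) / len(xs)
    return w - lr * noisy_mean

rng = np.random.default_rng(0)
xs = rng.standard_normal((8, 5))
ys = rng.integers(0, 2, size=8)
w = dp_sgd_step(np.zeros(5), xs, ys, rng)
```

Because clipping happens after the augmentation average, each example's contribution to the batch sum is still bounded by `clip_norm`, which is why averaging over augmentations does not change the sensitivity that the noise is calibrated to.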

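Experimental Results

Research Questions

RQ1: Can standard over-parameterized vision models trained with DP-SGD reach state-of-the-art accuracy on CIFAR-10 without extra data?
RQ2: How do architectural choices (such as group normalization and weight standardization) and training strategies (such as large batch sizes and augmentation multiplicity) affect DP-SGD performance on image classification?
RQ3: Does pre-training on a large non-private dataset followed by private fine-tuning improve DP image classification performance?
RQ4: Which practical hyperparameter relationships (batch size, learning rate, number of iterations) optimize DP-SGD performance?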

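Key Findings

On CIFAR-10, a Wide-ResNet-40-4 reaches 81.4% top-1 accuracy under (8, 10^-5)-DP without extra data, surpassing the previous SOTA of 71.7%.
On ImageNet, an NF-ResNet-50 trained from scratch reaches 32.4% top-1 accuracy under (8, 8×10^-7)-DP.
Private fine-tuning of a pre-trained NFNet-F3 reaches 83.8% top-1 on ImageNet under (0.5, 8×10^-7)-DP and 86.7% under (8, 8×10^-7)-DP, which is 4.3% below the non-private SOTA.
Pre-training on a large dataset (such as JFT-4B) followed by private fine-tuning yields 86.7% top-1 on ImageNet under (8, 8×10^-7)-DP.
Replacing batch normalization with group normalization and using large batch sizes substantially improves DP-SGD performance (for example, in the CIFAR-10 ablations).
Augmentation multiplicity and parameter averaging further improve DP-SGD accuracy on CIFAR-10 under DP constraints.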
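
The parameter averaging mentioned above can be illustrated with a small exponential moving average (EMA) helper. This is a sketch under assumptions: the dictionary-of-arrays parameter layout, the function name `ema_update`, and the decay value are all hypothetical and not taken from the paper.

```python
import numpy as np

def ema_update(ema_params, params, decay=0.999):
    """Blend the current parameters into a running exponential moving average."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }

# Toy usage: track an average of parameters that change at every step.
params = {"w": np.zeros(3)}
ema = {"w": np.zeros(3)}
for step in range(5):
    params = {"w": params["w"] + 1.0}  # stand-in for a DP-SGD update
    ema = ema_update(ema, params, decay=0.9)
```

Averaging only post-processes iterates that DP-SGD has already produced, so by the post-processing property of differential privacy it does not consume additional privacy budget; the averaged parameters would typically be the ones evaluated.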

This review was generated by AI and reviewed by a human editor.