QUICK REVIEW

[Paper Review] Toward Training at ImageNet Scale with Differential Privacy

Alexey Kurakin, Shuang Song|arXiv (Cornell University)|Jan 28, 2022

Privacy-Preserving Technologies in Data | Cited by 21

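One-Sentence Summary

This paper studies DP-SGD training of ImageNet-scale models in JAX, showing that a ResNet-18 pretrained on Places365 reaches 47.9% top-1 accuracy at ε = 10 (δ = 1e-6), and releases a public baseline and code for DP training at scale.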

ABSTRACT

Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better in the DP setting. Combined, the methods we discuss let us train a Resnet-18 with DP to $47.9\%$ accuracy and privacy parameters $ε= 10, δ= 10^{-6}$. This is a significant improvement over "naive" DP training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. The model we use was pretrained on the Places365 data set as a starting point. We share our code at https://github.com/google-research/dp-imagenet, calling for others to build upon this new baseline to further improve DP at scale.

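Motivation and Objectives

Investigate and evaluate differentially private training of large neural networks on ImageNet.
Identify practical techniques that improve the utility and efficiency of DP training at scale.
Provide a reusable baseline and open-source resources to encourage further research on DP at scale.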

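Proposed Method

Use differentially private stochastic gradient descent (DP-SGD), which clips each per-example gradient and adds Gaussian noise to protect individual data points.
Use JAX's automatic vectorization and optimized per-example gradient computation to reduce the overhead of DP.
Systematically explore model architectures (ResNet-18 and ResNet-50), transfer learning, batch size, and hyperparameters to find effective DP training settings.
Pretrain the model on public data (Places365), then fine-tune on ImageNet with DP-SGD to improve private accuracy.
Report the actual privacy budget ε (with δ = 1e-6) and provide an actionable baseline for DP at scale.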
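
The first two method points (per-example clipping plus Gaussian noise, computed with JAX's automatic vectorization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the paper's repository: the toy linear model and the names `loss_fn`, `dp_sgd_step`, `clip_norm`, `noise_multiplier`, and `lr` are all hypothetical, and no privacy accounting (computing ε from the noise level and number of steps) is performed.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Squared error of a linear model on ONE example (toy stand-in for a network loss)."""
    pred = jnp.dot(x, params)
    return 0.5 * (pred - y) ** 2


def dp_sgd_step(params, xs, ys, key, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    """One DP-SGD update: per-example grads -> clip -> sum -> add noise -> average."""
    # vmap over the batch axis of xs/ys yields one gradient per example, shape (batch, dim).
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    # L2 norm of each example's gradient, shape (batch,).
    norms = jnp.linalg.norm(per_example_grads, axis=1)
    # Shrink any gradient whose norm exceeds clip_norm; leave smaller ones unchanged.
    scale = jnp.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale[:, None]
    # Gaussian noise with std = noise_multiplier * clip_norm, added to the summed gradient.
    noise = noise_multiplier * clip_norm * jax.random.normal(key, params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / xs.shape[0]
    return params - lr * noisy_mean, clipped


# Hypothetical toy data: 8 examples with 4 features each.
key = jax.random.PRNGKey(0)
data_key, noise_key = jax.random.split(key)
xs = jax.random.normal(data_key, (8, 4))
ys = jnp.ones(8)
params = jnp.zeros(4)

new_params, clipped = dp_sgd_step(params, xs, ys, noise_key)
clipped_norms = jnp.linalg.norm(clipped, axis=1)
```

After the step, every row of `clipped` has L2 norm at most `clip_norm`, which is what bounds each example's influence on the update. Using `jax.vmap` over `jax.grad` avoids a Python loop over examples, which is the kind of per-example gradient vectorization the method list refers to.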

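Experimental Results

Research Questions

RQ1: Can DP-SGD train ImageNet-scale models to meaningful accuracy under a practical privacy budget?
RQ2: Which model architectures, training settings, and transfer learning strategies give better DP utility on ImageNet?
RQ3: How do batch size, number of epochs, and hyperparameters interact to shape the privacy-utility trade-off in DP training?
RQ4: How do public pretraining and frozen layers affect DP fine-tuning performance?
RQ5: What baseline performance and tooling are feasible as a foundation for further research on DP at scale?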

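Key Findings

DP-SGD can train ImageNet-scale models with a formal privacy guarantee: at ε = 10 (δ = 1e-6), ResNet-18 reaches 47.9% top-1 accuracy.
Smaller models can outperform larger ones at lower ε, and transfer learning from public data substantially improves private accuracy.
JAX-based DP training is significantly faster than Opacus and TF-Privacy, which makes DP experimentation more practical; on eight V100 GPUs, one ImageNet epoch takes about 555 seconds with DP versus 275.5 seconds without.
For a fixed ε, longer training with more noise can give higher accuracy than shorter training with less noise, and accuracy appears to plateau at roughly 40 to 70 epochs.
Hyperparameter tuning (clipping norm, noise scale, learning rate) has a large effect; a practical tuning procedure improves private accuracy, although the best DP result (47.9%) remains well below the roughly 75% the same network reaches without privacy.
Large-batch training and transfer learning (including freezing layers) further affect the privacy-utility balance and are practical levers for DP at scale.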
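
To put the per-epoch timings quoted in the findings into perspective, the short calculation below derives the relative DP overhead and a rough wall-clock estimate for the 40 to 70 epoch range. It is a back-of-the-envelope sketch: the variable and function names are illustrative, and it assumes per-epoch time stays constant across a run, which the source does not state.

```python
# Per-epoch wall-clock times quoted in the findings (eight V100 GPUs).
dp_seconds_per_epoch = 555.0
non_private_seconds_per_epoch = 275.5

# Relative slowdown of DP training compared with non-private training.
overhead = dp_seconds_per_epoch / non_private_seconds_per_epoch


def dp_hours(epochs):
    """Projected DP wall-clock hours, assuming constant per-epoch time."""
    return epochs * dp_seconds_per_epoch / 3600.0


low, high = dp_hours(40), dp_hours(70)
print(f"DP overhead: {overhead:.2f}x")                       # DP overhead: 2.01x
print(f"40 to 70 epochs: {low:.1f} to {high:.1f} hours")     # 40 to 70 epochs: 6.2 to 10.8 hours
```

Under these assumptions, DP training costs roughly twice the non-private wall-clock time per epoch, so a run in the plateau range would take on the order of 6 to 11 hours on that hardware.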


This review was generated by AI and reviewed by a human editor.