QUICK REVIEW

[Paper Review] Toward Training at ImageNet Scale with Differential Privacy

Alexey Kurakin, Shuang Song|arXiv (Cornell University)|Jan 28, 2022

Privacy-Preserving Technologies in Data | Cited by 21

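One-Sentence Summary

This paper investigates training ImageNet-scale models with differential privacy using DP-SGD in JAX, shows that a ResNet-18 pretrained on Places365 reaches 47.9% top-1 accuracy at ε=10, and shares code and a public baseline for DP at scale.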

ABSTRACT

Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better in the DP setting. Combined, the methods we discuss let us train a Resnet-18 with DP to $47.9\%$ accuracy and privacy parameters $ε= 10, δ= 10^{-6}$. This is a significant improvement over "naive" DP training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. The model we use was pretrained on the Places365 data set as a starting point. We share our code at https://github.com/google-research/dp-imagenet, calling for others to build upon this new baseline to further improve DP at scale.

Motivation and Objectives

Motivate and evaluate training large neural networks on ImageNet under differential privacy.
Identify practical techniques to improve DP training utility and efficiency at scale.
Provide a reusable baseline and open-source resources to spur further DP-at-scale research.

Proposed Method

Use differentially private stochastic gradient descent (DP-SGD), which clips each per-example gradient and adds Gaussian noise, to protect individual data points.
Leverage JAX for automatic vectorization and optimization of per-example gradient computations to reduce DP overhead.
Systematically explore model architectures (ResNet-18 vs ResNet-50), transfer learning, batch sizes, and hyperparameters to identify effective DP-training settings.
Pre-train models on public data (Places365) and fine-tune with DP-SGD on ImageNet to boost private accuracy.
Report practical DP budgets ε (with δ=1e-6) and provide an actionable baseline for DP at scale.
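
The clip-then-noise update and the JAX vectorization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the authors' repository: the linear model, the function names `loss_fn` and `dp_sgd_step`, and the default values of `clip_norm`, `noise_multiplier`, and `lr` are all hypothetical stand-ins. It shows how `jax.vmap(jax.grad(...))` yields per-example gradients in a single vectorized call, which are then clipped to a fixed L2 norm, summed, perturbed with Gaussian noise, and averaged.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Squared error of a toy linear model on ONE example (stand-in for a ResNet loss)."""
    pred = jnp.dot(x, params["w"]) + params["b"]
    return 0.5 * (pred - y) ** 2


def dp_sgd_step(params, xs, ys, key, clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    """One DP-SGD update: per-example grads -> clip -> sum -> add noise -> average."""
    # vmap over the batch axis of (xs, ys) while sharing params gives one gradient per example.
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    grad_leaves, treedef = jax.tree_util.tree_flatten(per_example_grads)

    # Global L2 norm of each example's gradient, taken across all parameter leaves.
    sq_norms = sum(jnp.sum(g.reshape(g.shape[0], -1) ** 2, axis=1) for g in grad_leaves)
    norms = jnp.sqrt(sq_norms)
    # Scale factor that shrinks any gradient whose norm exceeds clip_norm.
    scale = jnp.minimum(1.0, clip_norm / (norms + 1e-12))

    batch_size = xs.shape[0]
    keys = jax.random.split(key, len(grad_leaves))
    noisy_leaves = []
    for g, k in zip(grad_leaves, keys):
        s = scale.reshape((-1,) + (1,) * (g.ndim - 1))  # broadcast over parameter dims
        clipped_sum = jnp.sum(g * s, axis=0)
        # Gaussian noise with standard deviation noise_multiplier * clip_norm.
        noise = noise_multiplier * clip_norm * jax.random.normal(k, clipped_sum.shape)
        noisy_leaves.append((clipped_sum + noise) / batch_size)

    noisy_grads = jax.tree_util.tree_unflatten(treedef, noisy_leaves)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy_grads)


# Usage on random toy data (shapes are arbitrary).
key = jax.random.PRNGKey(0)
k_x, k_y, k_noise = jax.random.split(key, 3)
params = {"w": jnp.zeros(3), "b": jnp.array(0.0)}
xs = jax.random.normal(k_x, (8, 3))
ys = jax.random.normal(k_y, (8,))
params = dp_sgd_step(params, xs, ys, k_noise)
```

The sketch only performs the update; turning a choice of noise multiplier, batch size, and number of steps into a reported (ε, δ) requires a separate privacy accountant, which is omitted here.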

Experimental Results

Research Questions

RQ1: Can DP-SGD train ImageNet-scale models with meaningful accuracy under a practical privacy budget?
RQ2: Which model architectures, training settings, and transfer-learning strategies yield better DP utility on ImageNet?
RQ3: How do batch size, epoch count, and hyperparameters interact to affect the privacy-utility tradeoff in DP training?
RQ4: What is the impact of public pretraining and layer freezing on DP fine-tuning performance?
RQ5: What baseline performance and tooling are feasible to enable further DP-at-scale research?

Key Findings

DP-SGD can train ImageNet-scale models with a formal privacy guarantee, achieving 47.9% top-1 accuracy for ResNet-18 at ε=10 (δ=1e-6).
Smaller models can outperform larger ones at low ε, and transfer learning from public data substantially boosts private accuracy.
JAX-based DP training is substantially faster than Opacus and TF-Privacy, bringing DP training closer to practical exploration; an ImageNet epoch on eight V100 GPUs takes around 555 seconds with DP versus 275.5 seconds without.
Longer training with higher noise can yield better accuracy than shorter training with less noise, with an apparent accuracy plateau around 40–70 epochs for a fixed ε.
Hyperparameter tuning (clip norm, noise scale, learning rate) has a large impact; a practical tuning procedure can move private training closer to non-private performance without incurring additional privacy loss.
Large-batch strategies and transfer learning (including freezing layers) further influence the privacy-utility balance and are viable levers for DP at scale.
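
The layer-freezing lever mentioned above can be illustrated with a short JAX sketch. It is an assumption-laden toy, not the paper's setup: the one-layer `features` function stands in for pretrained ResNet blocks, and the names `backbone`, `head`, and `per_example_head_grads` are hypothetical. The idea shown is that per-example gradients are taken only with respect to the trainable parameters, so in a DP-SGD step the clipping and noise would apply to that smaller set alone.

```python
import jax
import jax.numpy as jnp


def features(backbone, x):
    """Frozen feature extractor (toy one-layer stand-in for pretrained blocks)."""
    return jnp.tanh(x @ backbone["w"])


def loss_fn(head, backbone, x, y):
    """Per-example squared error; only `head` is meant to be trained."""
    # stop_gradient is redundant here because the gradient is taken w.r.t. `head` only,
    # but it documents the intent that the backbone stays fixed.
    h = jax.lax.stop_gradient(features(backbone, x))
    return 0.5 * (h @ head["w"] - y) ** 2


# Differentiate w.r.t. the first argument (head) only, then vectorize over examples.
per_example_head_grads = jax.vmap(
    jax.grad(loss_fn, argnums=0), in_axes=(None, None, 0, 0)
)

key = jax.random.PRNGKey(0)
k_backbone, k_x, k_y = jax.random.split(key, 3)
backbone = {"w": jax.random.normal(k_backbone, (5, 8))}  # "pretrained", kept fixed
head = {"w": jnp.zeros(8)}                               # trainable
xs = jax.random.normal(k_x, (16, 5))
ys = jax.random.normal(k_y, (16,))

# One gradient per example, containing entries for the head parameters only.
grads = per_example_head_grads(head, backbone, xs, ys)
```

These per-example head gradients would then be clipped and noised exactly as in an ordinary DP-SGD step, while the backbone parameters are never updated.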

This review was generated by AI and checked by a human editor.