QUICK REVIEW

[Paper Review] Unlocking High-Accuracy Differentially Private Image Classification through Scale

Soham De, Leonard Berrada|arXiv (Cornell University)|Apr 28, 2022

Adversarial Robustness in Machine Learning | Citations: 33

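One-Line Summary

This paper shows that DP-SGD can reach state-of-the-art image classification accuracy on CIFAR-10 and ImageNet when it is applied to over-parameterized models with careful hyperparameter tuning and a set of simple techniques: large batches, group normalization, weight standardization, augmentation multiplicity, and pre-training followed by private fine-tuning.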

ABSTRACT

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under (8, 10^-5)-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under (0.5, 8×10^-7)-DP. Additionally, we also achieve 86.7% top-1 accuracy under (8, 8×10^-7)-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
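
For readers unfamiliar with the notation, the (ε, δ) pairs quoted in the abstract refer to the standard definition of differential privacy. The symbols below (mechanism M, neighboring datasets D and D', output set S) are the usual textbook names and are not taken from this review. A randomized training procedure M is (ε, δ)-DP if, for every pair of datasets that differ in a single training example and every set of possible outputs:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
```

Smaller ε and δ mean a stronger guarantee, which is why the (0.5, 8×10^-7)-DP result is the more demanding of the two ImageNet settings.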

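Motivation and Objectives

Make the case that DP-SGD can be effective for image classification under formal privacy guarantees.
Identify and combine simple techniques that improve DP-SGD performance on standard architectures.
Demonstrate state-of-the-art private accuracy on CIFAR-10 without extra data, and strong private-training results on ImageNet.
Show the benefit of pre-training followed by private fine-tuning for DP image classification.
Provide guidance on how hyperparameters relate to one another under DP constraints.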

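Proposed Method

Describe a set of techniques that improve DP-SGD performance on over-parameterized models.
Replace batch normalization with group normalization so that per-example gradients remain independent during DP training.
Explore large batch sizes and weight standardization to stabilize training.
Introduce augmentation multiplicity: average each example's gradient over several augmentations before clipping.
Apply parameter averaging (an exponential moving average) during training.
Show the effect of pre-training on non-private data followed by private fine-tuning with DP-SGD.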
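
The training-side techniques in the list above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a toy logistic-regression model in pure Python, and the names `dp_sgd_step` and `augment`, the jitter augmentation, and all default values are hypothetical. Privacy accounting (turning the noise multiplier and step count into an (ε, δ) guarantee) is deliberately left out.

```python
import math
import random


def clip(vec, max_norm):
    """Rescale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [v * scale for v in vec]


def example_grad(w, x, y):
    """Gradient of the logistic loss for one (x, y) pair, with y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def augment(x, rng):
    """Hypothetical augmentation: small Gaussian jitter on the features."""
    return [xi + rng.gauss(0.0, 0.05) for xi in x]


def dp_sgd_step(w, ema, batch, rng, lr=0.5, clip_norm=1.0,
                noise_multiplier=1.0, num_augs=4, ema_decay=0.9):
    """One DP-SGD update with augmentation multiplicity and an EMA of weights."""
    dim = len(w)
    summed = [0.0] * dim
    for x, y in batch:
        # Augmentation multiplicity: average the gradient over several
        # augmented views of the SAME example before clipping.
        avg = [0.0] * dim
        for _ in range(num_augs):
            g = example_grad(w, augment(x, rng), y)
            avg = [a + gi / num_augs for a, gi in zip(avg, g)]
        # Clip once per example, so each example's influence stays bounded.
        clipped = clip(avg, clip_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Add Gaussian noise scaled to the clipping norm, then average.
    sigma = noise_multiplier * clip_norm
    noisy = [(s + rng.gauss(0.0, sigma)) / len(batch) for s in summed]
    w = [wi - lr * gi for wi, gi in zip(w, noisy)]
    # Parameter averaging: exponential moving average of the weights.
    ema = [ema_decay * e + (1 - ema_decay) * wi for e, wi in zip(ema, w)]
    return w, ema
```

Because the per-example gradient is averaged over augmentations before the single clipping step, the sensitivity of the summed gradient to any one training example is still bounded by `clip_norm`, so the extra augmentations do not require extra noise in this sketch.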

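Experimental Results

Research Questions

RQ1: Without extra data, can standard over-parameterized vision models trained with DP-SGD reach state-of-the-art accuracy on CIFAR-10?
RQ2: How do architectural choices (e.g., group normalization, weight standardization) and training strategies (e.g., large batches, augmentation multiplicity) affect DP-SGD image classification performance?
RQ3: Does pre-training on large non-private data, followed by private fine-tuning, improve DP image classification performance?
RQ4: Which practical relationships between hyperparameters (batch size, learning rate, number of iterations) optimize DP-SGD performance?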

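Key Findings

A Wide-ResNet-40-4 trained under (8, 10^-5)-DP without extra data reaches 81.4% top-1 accuracy on CIFAR-10, surpassing the previous SOTA of 71.7%.
An NF-ResNet-50 trained from scratch reaches 32.4% top-1 accuracy on ImageNet under (8, 8×10^-7)-DP.
Privately fine-tuning a pre-trained NFNet-F3 reaches 83.8% top-1 accuracy on ImageNet under (0.5, 8×10^-7)-DP and 86.7% under (8, 8×10^-7)-DP; the latter is 4.3% below the non-private SOTA.
Pre-training on a large dataset (e.g., JFT-4B) followed by private fine-tuning yields 86.7% ImageNet top-1 accuracy under (8, 8×10^-7)-DP.
Replacing batch normalization with group normalization and using large batch sizes markedly improves DP-SGD performance (e.g., in the CIFAR-10 ablations).
Augmentation multiplicity and parameter averaging further raise DP-SGD accuracy on CIFAR-10 under DP constraints.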
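
To make the architectural findings above concrete, here is a minimal pure-Python sketch of group normalization and weight standardization. It rests on simplifying assumptions: activations are a flat list of channel values for a single example (no spatial dimensions), there are no learnable scale or shift parameters, and the function names are hypothetical. It shows the plain form of weight standardization, not necessarily the scaled variant used in normalizer-free networks.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize ONE example's channel activations within each group.

    Statistics come only from this example, never from other examples in
    the batch, which is what keeps per-example gradients independent.
    """
    assert len(x) % num_groups == 0, "channels must divide evenly into groups"
    size = len(x) // num_groups
    out = []
    for g in range(num_groups):
        chunk = x[g * size:(g + 1) * size]
        mean = sum(chunk) / size
        var = sum((v - mean) ** 2 for v in chunk) / size
        out.extend((v - mean) / math.sqrt(var + eps) for v in chunk)
    return out


def standardize_weights(weight_rows, eps=1e-5):
    """Give each output unit's incoming weights zero mean and unit variance."""
    out = []
    for row in weight_rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out
```

Batch normalization, by contrast, computes its mean and variance across the batch, so one example's output depends on the others; that coupling is incompatible with the per-example gradient clipping that DP-SGD relies on.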


This review was generated by AI and checked by a human editor.