QUICK REVIEW

[Paper Review] Unlocking High-Accuracy Differentially Private Image Classification through Scale

Soham De, Leonard Berrada|arXiv (Cornell University)|2022. 04. 28.

Adversarial Robustness in Machine Learning | Citations: 33

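One-Line Summary

This paper shows that DP-SGD can reach state-of-the-art accuracy for differentially private image classification on CIFAR-10 and ImageNet by combining over-parameterized models with careful hyperparameter tuning and a few simple techniques: large batches, group normalization, weight standardization, augmentation multiplicity, and fine-tuning of pre-trained models.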

ABSTRACT

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under (0.5, 8 \cdot 10^{-7})-DP. Additionally, we also achieve 86.7% top-1 accuracy under (8, 8 \cdot 10^{-7})-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
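For readers unfamiliar with the (ε, δ) notation used in the abstract, the standard definition of differential privacy is restated below. The symbols M (the randomized training mechanism), D and D' (two datasets differing in a single training example), and S (any measurable set of outputs) are notation chosen here for illustration and do not appear in the review itself.

```latex
\Pr\big[\mathcal{M}(D) \in S\big] \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{M}(D') \in S\big] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all measurable } S .
```

Read this way, "(8, 10^{-5})-DP" means ε = 8 and δ = 10^{-5}; smaller values of either parameter correspond to a stronger guarantee, which is why the (0.5, 8 \cdot 10^{-7}) result is the stricter of the two ImageNet settings.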

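Motivation and Objectives

Establish that DP-SGD can be effective for image classification under formal privacy guarantees.
Identify and combine simple techniques that improve DP-SGD performance on standard architectures.
Demonstrate state-of-the-art private accuracy on CIFAR-10 without extra data, and strong results for private training on ImageNet.
Show the benefit of pre-training followed by private fine-tuning for DP image classification.
Provide guidance on how hyperparameters relate to one another under DP constraints.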

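Proposed Method

Describes a set of techniques for improving DP-SGD performance on over-parameterized models.
Replaces batch normalization with group normalization so that each example's gradient stays independent of the other examples in the batch, as per-example clipping in DP training requires.
Explores large batch sizes and weight standardization to stabilize training.
Introduces augmentation multiplicity: per-example gradients are averaged over several augmentations of the same example before clipping.
Applies parameter averaging (an exponential moving average) during training.
Demonstrates the effect of pre-training on non-private data followed by private fine-tuning with DP-SGD.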
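The interaction between per-example clipping and augmentation multiplicity can be made concrete with a small sketch. The code below is a toy illustration, not the authors' implementation: it assumes a linear model with squared loss, and the helper names (`per_example_grad`, `augment`, `clip`, `dp_sgd_step`), the jitter-based augmentation, and all default hyperparameter values are hypothetical choices made for this example.

```python
import numpy as np

def per_example_grad(w, x, y):
    """Gradient of the squared loss 0.5 * (w @ x - y)**2 with respect to w."""
    return (w @ x - y) * x

def augment(x, rng):
    """Hypothetical augmentation: small additive jitter (stand-in for flips/crops)."""
    return x + 0.01 * rng.standard_normal(x.shape)

def clip(g, clip_norm):
    """Rescale g so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_norm / (norm + 1e-12))

def dp_sgd_step(w, xs, ys, rng, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, num_augs=4):
    """One DP-SGD step with augmentation multiplicity on a toy linear model."""
    clipped_sum = np.zeros_like(w)
    for x, y in zip(xs, ys):
        # Average the gradient over several augmentations of the SAME example
        # before clipping, so each example still contributes at most clip_norm.
        grads = [per_example_grad(w, augment(x, rng), y) for _ in range(num_augs)]
        clipped_sum += clip(np.mean(grads, axis=0), clip_norm)
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    noise = noise_multiplier * clip_norm * rng.standard_normal(w.shape)
    noisy_mean = (clipped_sum + noise) / len(xs)
    return w - lr * noisy_mean

rng = np.random.default_rng(0)
xs = rng.standard_normal((8, 5))
ys = rng.standard_normal(8)
w = dp_sgd_step(np.zeros(5), xs, ys, rng)
```

Because the average over augmentations happens before clipping, the sensitivity of the summed gradient to any single training example is unchanged, so in this sketch the extra augmentations add no noise requirement beyond that of a single view. The sketch does not include privacy accounting; the ε actually spent depends on the noise multiplier, sampling rate, and number of steps.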

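Experimental Results

Research Questions

RQ1: Can standard over-parameterized vision models trained with DP-SGD reach state-of-the-art accuracy on CIFAR-10 without extra data?
RQ2: How do architectural choices (e.g., group normalization, weight standardization) and training strategies (e.g., large batches, augmentation multiplicity) affect the image classification performance of DP-SGD?
RQ3: Does pre-training on a large non-private dataset followed by private fine-tuning improve DP image classification performance?
RQ4: Which practical relationships among hyperparameters (batch size, learning rate, number of iterations) optimize DP-SGD performance?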

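Key Results

On CIFAR-10, a Wide-ResNet-40-4 reaches 81.4% top-1 accuracy under (8, 10^-5)-DP without extra data, surpassing the previous SOTA of 71.7%.
An NF-ResNet-50 trained from scratch reaches 32.4% top-1 accuracy on ImageNet under (8, 8×10^-7)-DP.
Private fine-tuning of a pre-trained NFNet-F3 reaches 83.8% under (0.5, 8×10^-7)-DP and 86.7% under (8, 8×10^-7)-DP, the latter within 4.3% of the non-private SOTA.
Pre-training on a large dataset (e.g., JFT-4B) followed by private fine-tuning yields 86.7% top-1 accuracy on ImageNet under (8, 8×10^-7)-DP.
Replacing batch normalization with group normalization and using large batch sizes substantially improves DP-SGD performance (e.g., in the CIFAR-10 ablation results).
Augmentation multiplicity and parameter averaging further raise DP-SGD accuracy on CIFAR-10 under DP constraints.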
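Two of the ingredients named in this review, weight standardization and parameter averaging, are simple enough to sketch in a few lines. The following is a minimal illustration under stated assumptions, not the paper's code: weights are treated as a 2-D matrix with one row per output unit, the function names `standardize_weights` and `ema_update` are hypothetical, and the `eps` and `decay` defaults are placeholder values rather than settings reported by the authors.

```python
import numpy as np

def standardize_weights(w, eps=1e-5):
    """Weight standardization: zero mean and (near) unit variance per output unit (row)."""
    mean = w.mean(axis=1, keepdims=True)
    var = w.var(axis=1, keepdims=True)
    return (w - mean) / np.sqrt(var + eps)

def ema_update(avg, params, decay=0.999):
    """Exponential moving average of parameters across training steps."""
    return decay * avg + (1.0 - decay) * params

# Usage: standardize a weight matrix, then fold new iterates into a running average.
w = np.arange(12.0).reshape(3, 4)
w_std = standardize_weights(w)
avg = np.zeros_like(w)
for _ in range(3):
    avg = ema_update(avg, w, decay=0.9)
```

In a setup like this, the averaged parameters would be the ones evaluated at test time. Since the average is computed from iterates that DP-SGD has already released, it is post-processing and does not consume additional privacy budget.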

This review was generated by AI and reviewed by a human editor.