QUICK REVIEW
[Paper Review] Differentially Private Learning Needs Better Features (or Much More Data)
Florian Tramèr, Dan Boneh|arXiv (Cornell University)|Nov 23, 2020
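Privacy-Preserving Technologies in Data|References: 75|Cited by: 66
One-Sentence Summary
This paper shows that, in differentially private learning, handcrafted features (ScatterNet) or transfer learning can substantially improve performance; end-to-end private deep learning still lags behind unless much more private data or public data is available.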
ABSTRACT
We demonstrate that differentially private machine learning has not yet reached its "AlexNet moment" on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets. To exceed the performance of handcrafted features, we show that private learning requires either much more private data, or access to features learned on public data from a similar domain. Our work introduces simple yet strong baselines for differentially private learning that can inform the evaluation of future progress in this area.
Motivation and Objectives
- Motivate and quantify the utility gap between private end-to-end learning and shallow handcrafted-feature baselines in vision tasks.
- Propose and evaluate strong handcrafted baselines (ScatterNet features with DP-SGD) for differentially private learning.
- Investigate factors behind the benefits of handcrafted features, including convergence and data requirements.
- Assess whether more data or transfer learning from public data can close the privacy-utility gap.
- Provide practical baselines and open directions for improving private deep learning.
Proposed Method
- Use the Scattering Network (ScatterNet) as a fixed, non-learned feature extractor to encode image priors (invariance to small rotations/translations).
- Train private linear models or private CNNs on top of ScatterNet features using DP-SGD with carefully chosen normalization.
- Systematically compare private ScatterNet baselines to end-to-end private CNNs across MNIST, Fashion-MNIST, and CIFAR-10 at various DP budgets.
- Analyze convergence behavior and the impact of feature dimensionality, learning rates, and batch sizes on private learning performance.
- Explore enlarging the private training set (pseudo-labeled Tiny Images) and transfer learning from public data (CIFAR-100, SimCLR features trained on ImageNet) to improve DP utility.
- Report reproducible results and provide public code for replication.
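The core of the method above, a linear model trained with DP-SGD on top of fixed features, can be sketched in a few lines. This is a minimal illustration and not the authors' released code: the function name `dp_sgd_step`, its parameters, and the random stand-in features are all assumptions made for the example. It shows the two operations that define a DP-SGD step (per-example gradient clipping and Gaussian noise added to the summed gradient); it does not compute the resulting ε, which requires a separate privacy accountant.

```python
import numpy as np

def dp_sgd_step(W, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step for a linear softmax classifier on fixed features.

    W: (d, k) weights, X: (n, d) precomputed features, y: (n,) integer labels.
    All names here are hypothetical and chosen for illustration only.
    """
    n = X.shape[0]
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(n), y] -= 1.0  # per-example d(loss)/d(logits), shape (n, k)

    # Per-example gradients: outer product of features and logit errors.
    grads = X[:, :, None] * probs[:, None, :]  # shape (n, d, k)

    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.sqrt((grads ** 2).sum(axis=(1, 2)))
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = grads * scale[:, None, None]

    # Sum the clipped gradients, add Gaussian noise, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / n
    return W - lr * noisy_grad

# Usage with random stand-in "features" (real ScatterNet features would go here).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20))
y = rng.integers(0, 3, size=64)
W = np.zeros((20, 3))
W_new = dp_sgd_step(W, X, y, lr=0.5, clip_norm=0.1, noise_multiplier=1.0, rng=rng)
```

Because the feature extractor is fixed and not learned, only the small linear layer is updated with noisy gradients, which is one plausible reading of why low-dimensional, well-normalized features help under DP.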
Experimental Results
Research Questions
- RQ1: Can handcrafted features improve the privacy-utility trade-off in differentially private vision models compared to end-to-end private learning?
- RQ2: How do ScatterNet features influence convergence and DP-SGD performance across standard vision benchmarks under moderate privacy budgets?
- RQ3: What is the data cost (private data or public data) required for end-to-end private models to match handcrafted-feature baselines?
- RQ4: Can transfer learning from public data or larger private datasets close the privacy-utility gap for DP-SGD?
- RQ5: Do deeper networks trained on high-quality handcrafted features outperform linear models, and under what conditions?
Key Findings
- Linear models trained on ScatterNet features outperform end-to-end private CNNs for DP budgets ε ≤ 3 on MNIST, Fashion-MNIST, and CIFAR-10.
- On CIFAR-10, ScatterNet+linear achieves 67.0–69.3% accuracy depending on the DP budget, surpassing prior end-to-end private CNN results and tightening the DP guarantee by a factor of about 134 (roughly e^4.9, since a guarantee weakens as e^ε) relative to a referenced baseline.
- On MNIST, ScatterNet-based approaches reach or exceed private transfer-learning benchmarks such as PATE without public data access.
- Deeper models trained on ScatterNet features also improve private performance over end-to-end private CNNs in some cases (e.g., CIFAR-10).
- Normalizing ScatterNet features (Group Normalization or Data Normalization) is crucial for convergence and for the privacy-utility trade-off; on CIFAR-10, Data Normalization can outperform Group Normalization when the privacy cost of computing its statistics is worth paying.
- Additional private data, or access to public unlabeled data (e.g., ImageNet, used to train SimCLR features), significantly boosts private end-to-end learning, enabling end-to-end models to approach or surpass ScatterNet baselines.
- Transfer learning from public data (e.g., features learned on CIFAR-100, or SimCLR features learned on ImageNet) yields notable gains under DP, illustrating the value of higher-quality features for private learning.
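The normalization finding above is easier to read with the standard definition of Group Normalization in hand: its statistics are computed separately for each example, so normalizing one example never depends on any other example in the dataset. The sketch below illustrates that property under assumptions of my own (the function name `group_norm` and the `(n, c, h, w)` feature layout are hypothetical); it is the generic operation, not the paper's implementation, and it omits the learnable scale and shift that library versions usually include.

```python
import numpy as np

def group_norm(features, num_groups, eps=1e-5):
    """Normalize each example within groups of channels.

    features: array of shape (n, c, h, w); c must be divisible by num_groups.
    Mean and variance are taken per example and per group, never across examples.
    """
    n, c, h, w = features.shape
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    grouped = features.reshape(n, num_groups, -1)
    mean = grouped.mean(axis=2, keepdims=True)
    var = grouped.var(axis=2, keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(n, c, h, w)

# Usage on random stand-in features.
feats = np.random.default_rng(0).normal(size=(4, 6, 3, 3))
normed = group_norm(feats, num_groups=3)
```

A normalization whose statistics are instead pooled over the whole training set depends on private data, which is why the summary describes it as carrying a privacy cost.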
This review was generated by AI and reviewed by a human editor.