QUICK REVIEW

[Paper Review] Tempered Sigmoid Activations for Deep Learning with Differential Privacy

Nicolas Papernot, Abhradeep Thakurta|arXiv (Cornell University)|Jul 28, 2020

Privacy-Preserving Technologies in Data | Cited by 26

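One-Sentence Summary

This paper proposes tempered sigmoids, a family of bounded activation functions, to improve deep learning under differential privacy. By curbing exploding gradients and reducing the information lost to gradient clipping, tempered sigmoids converge faster and markedly improve the privacy-accuracy tradeoff, reaching new state-of-the-art results on MNIST (98.1%), FashionMNIST (86.1%), and CIFAR10 (66.2%) without modifying the training procedure.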

ABSTRACT

Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. Instead, we propose that model architectures are chosen ab initio explicitly for privacy-preserving training. To provide guarantees under the gold standard of differential privacy, one must bound as strictly as possible how individual training points can possibly affect model updates. In this paper, we are the first to observe that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals or differential privacy analysis.

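Motivation and Goals

To address the poor performance of standard ReLU activations in differentially private deep learning, which stems from unbounded activations and the effect of gradient clipping.
To improve the privacy-accuracy tradeoff of DP-SGD by designing model architectures explicitly for privacy-preserving training from the outset.
To show that bounded activation functions such as tempered sigmoids preserve more of the gradient signal under clipping and noise injection, and thereby improve model utility.
To show that architectural choices, activation functions in particular, must be re-evaluated for private learning rather than reused from non-private models.
To establish tempered sigmoids as a better default activation function for private deep learning, outperforming ReLU on every benchmark tested.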

Proposed Method

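Proposes a general family of bounded activation functions, the tempered sigmoids, defined as $ \phi_{s,T,o}(x) = \frac{s}{1 + e^{-T x}} - o $, where $ s $ sets the output scale, $ T $ is the inverse temperature, and $ o $ is the offset, so the output lies in $ (-o, s - o) $. Setting $ s = 2, T = 2, o = 1 $ recovers tanh.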
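Uses the tempered sigmoid family to bound the magnitude of activations, which lowers the risk of exploding gradients during DP-SGD training.
Analyzes the relationship between the parameter $ T $ and the clipping norm in DP-SGD, showing that tempered sigmoids align naturally with the clipping mechanism.
Keeps the gradient clipping and Gaussian noise of DP-SGD unchanged, but replaces ReLU with a tempered sigmoid to reduce the signal lost to clipping and noise injection.
Runs an extensive hyperparameter search over learning rate, batch size, optimizer, and number of epochs, tuned specifically for private training.
Compares ReLU against tempered sigmoids (e.g., tanh) on MNIST, FashionMNIST, and CIFAR10 under the same privacy budget ($ \varepsilon, \delta $).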
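
As a rough illustration of the bounded family described above, the sketch below evaluates a tempered sigmoid for scalar inputs. It assumes the three-parameter form s / (1 + exp(-T x)) - o, under which s = 2, T = 2, o = 1 coincides with tanh and s = 1, T = 1, o = 0 is the plain logistic sigmoid. The function names `tempered_sigmoid` and `relu` are chosen here for illustration and are not taken from the paper's code.

```python
import math


def tempered_sigmoid(x: float, s: float = 2.0, T: float = 2.0, o: float = 1.0) -> float:
    """Evaluate s / (1 + exp(-T * x)) - o for a scalar x.

    Assumed parameterization: s = scale, T = inverse temperature, o = offset.
    The defaults (s=2, T=2, o=1) coincide with tanh.
    """
    z = T * x
    # Numerically stable logistic so that large |z| does not overflow exp().
    if z >= 0:
        logistic = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        logistic = ez / (1.0 + ez)
    return s * logistic - o


def relu(x: float) -> float:
    """Unbounded baseline activation used for comparison."""
    return max(0.0, x)


inputs = [-100.0, -2.0, 0.0, 2.0, 100.0]
bounded = [tempered_sigmoid(x) for x in inputs]   # stays within [-1, 1]
unbounded = [relu(x) for x in inputs]             # grows with the input
```

With the default parameters the outputs never leave the interval from -o to s - o, whereas the ReLU output grows without limit as the input grows; that contrast is the property the method relies on.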

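Experimental Results

Research Questions

RQ1: Why do unbounded activation functions such as ReLU degrade the performance of differentially private deep learning under gradient clipping and noise injection?
RQ2: Can bounded activation functions such as tempered sigmoids mitigate the negative effects of gradient clipping and noise in DP-SGD?
RQ3: How does the temperature parameter of a tempered sigmoid relate to the clipping norm in DP-SGD?
RQ4: Compared with ReLU, does using a tempered sigmoid as the default activation yield a better privacy-accuracy tradeoff on standard benchmarks?
RQ5: Can architectural choices such as the activation function be made from scratch for privacy-preserving training, so as to outperform after-the-fact adaptation of non-private models?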
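
One way to make RQ1 and RQ2 concrete is to look at a single dense layer. The identity below is standard backpropagation, offered as an illustrative sketch and not as the paper's own derivation; the symbols $ a $ (layer input activations, dimension $ d $), $ \delta $ (backpropagated error), $ C $ (clipping norm), and the activation parameters $ s $ and $ o $ are notation assumed here.

$$
z = W a, \qquad \nabla_W \mathcal{L} = \delta\, a^{\top}, \qquad \lVert \nabla_W \mathcal{L} \rVert_F = \lVert \delta \rVert_2 \, \lVert a \rVert_2
$$

$$
a_j \in (-o,\; s - o) \;\Longrightarrow\; \lVert a \rVert_2 \le \sqrt{d}\, \max\bigl(\lvert o \rvert,\; \lvert s - o \rvert\bigr)
$$

With ReLU, $ \lVert a \rVert_2 $ has no such bound, so the per-example gradient norm can far exceed $ C $ and the rescaling factor $ \min(1,\, C / \lVert g \rVert_2) $ shrinks the update sharply. A bounded activation caps the $ \lVert a \rVert_2 $ factor, which keeps more per-example gradients near or below the clipping norm.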

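Main Findings

Under DP-SGD, tempered sigmoids reach 98.1% test accuracy on MNIST at $ \varepsilon = 2.93 $, outperforming the ReLU model, which reaches at most 96.6%.
On FashionMNIST, the proposed approach reaches 86.1% accuracy at $ \varepsilon = 2.7 $, versus 81.9% for the ReLU model, a substantial improvement in the privacy-accuracy tradeoff.
On CIFAR10, the model with tempered sigmoids reaches 66.2% accuracy at $ \varepsilon = 7.53 $, exceeding the 61.6% of the ReLU baseline.
The gains are consistent across all three benchmarks, indicating that tempered sigmoids are broadly effective for private deep learning.
The gains are attributed to less signal being lost to gradient clipping and noise injection, because bounded activations prevent exploding gradients and preserve more useful information.
Hyperparameter tuning is critical: the learning rate must be re-optimized for private training, and adaptive optimizers such as Adam do not outperform SGD in this setting.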
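
The clipping and noise steps these findings refer to can be sketched as follows. This is a minimal, dependency-free illustration of the usual DP-SGD recipe (clip each per-example gradient to an L2 norm, sum, add Gaussian noise scaled by the clipping norm, average). The function names, the toy gradients, and the parameter names `clip_norm` and `noise_multiplier` are assumptions for illustration, not the authors' implementation, and the sketch performs no privacy accounting.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def clip_to_norm(grad, clip_norm):
    """Scale grad down so that its L2 norm is at most clip_norm."""
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    clipped = [clip_to_norm(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise std is tied to the clipping norm
    noisy = [summed[j] + rng.gauss(0.0, sigma) for j in range(dim)]
    return [v / len(per_example_grads) for v in noisy]


# Toy per-example gradients with L2 norms 5.0 and 0.5.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)

# With the noise switched off, only the effect of clipping remains visible.
no_noise = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In the toy batch the first gradient is scaled down by a factor of five while the second passes through unchanged, which shows how a few large per-example gradients lose most of their magnitude. Activations that keep gradient norms closer to `clip_norm` lose less signal at this step.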


This review was generated by AI and reviewed by human editors.