QUICK REVIEW

[Paper Review] Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference

K. Rustan M. Leino, Matt Fredrikson|arXiv (Cornell University)|Jun 27, 2019

Adversarial Robustness in Machine Learning | Cited by 68

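One-Sentence Summary

This paper proposes a white-box membership inference attack that exploits a model's memorization of idiosyncratic features to make calibrated, high-precision inferences, and it evaluates defenses such as differential privacy.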

ABSTRACT

Membership inference (MI) attacks exploit the fact that machine learning algorithms sometimes leak information about their training data through the learned model. In this work, we study membership inference in the white-box setting in order to exploit the internals of a model, which have not been effectively utilized by previous work. Leveraging new insights about how overfitting occurs in deep neural networks, we show how a model's idiosyncratic use of features can provide evidence for membership to white-box attackers---even when the model's black-box behavior appears to generalize well---and demonstrate that this attack outperforms prior black-box methods. Taking the position that an effective attack should have the ability to provide confident positive inferences, we find that previous attacks do not often provide a meaningful basis for confidently inferring membership, whereas our attack can be effectively calibrated for high precision. Finally, we examine popular defenses against MI attacks, finding that (1) smaller generalization error is not sufficient to prevent attacks on real models, and (2) while small-$ε$-differential privacy reduces the attack's effectiveness, this often comes at a significant cost to the model's accuracy; and for larger $ε$ that are sometimes used in practice (e.g., $ε=16$), the attack can achieve nearly the same accuracy as on the unprotected model.

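Research Motivation and Objectives

Study how overfitting and memorization in deep networks leak membership information through the model's internal use of features.
Develop a white-box MI attack that requires no access to the target's training data and produces calibrated, high-precision inferences.
Analyze the limitations of existing black-box and white-box MI attacks, and propose ways to increase the confidence of membership predictions.
Evaluate the proposed attack on real and synthetic datasets, and study defenses against it (e.g., differential privacy).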

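Proposed Method

Introduce an evidence-based, Bayes-optimal white-box MI attack that exploits idiosyncratic feature use.
Derive a linear Bayes-optimal attack model for a simple linear softmax target under Gaussian naive Bayes assumptions (Theorem 1).
Show how to obtain the MI parameters from proxy models when the exact D* and D̂ are unknown (the bayes-wb attack, Observation 1).
Generalize to arbitrary distributions by using a learned displacement function D to build the general-wb attack.
Extend the attack to deep networks through a local linear approximation at each layer (Section 4).
Incorporate a calibration technique to achieve high-precision inferences (Algorithm 3).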
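
As an illustration of the linear attack form listed above, here is a minimal sketch for a single class, assuming independent Gaussian features with a shared per-feature variance and a prior membership probability of 1/2. Under those assumptions the log-likelihood ratio between the training-set distribution and the general distribution is linear in x, so the membership posterior is a sigmoid of a linear score. The function names and toy numbers are hypothetical and are not taken from the paper, whose statement of Theorem 1 may use different notation.

```python
import numpy as np


def linear_attack_params(mu_train, mu_general, sigma2):
    """Weights and bias of the log-likelihood ratio
    log N(x; mu_train, sigma2) - log N(x; mu_general, sigma2).

    Assumes independent Gaussian features with a shared variance
    per feature (an illustrative assumption, not the paper's code).
    """
    mu_train = np.asarray(mu_train, dtype=float)
    mu_general = np.asarray(mu_general, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    w = (mu_train - mu_general) / sigma2
    b = np.sum((mu_general**2 - mu_train**2) / (2.0 * sigma2))
    return w, b


def membership_posterior(x, w, b):
    """P(member | x) with a 1/2 prior: sigmoid of the linear score."""
    z = np.asarray(x, dtype=float) @ w + b
    return 1.0 / (1.0 + np.exp(-z))


# Toy example: the training set's empirical mean is shifted on feature 0 only.
mu_general = np.array([0.0, 0.0])
mu_train = np.array([0.5, 0.0])
sigma2 = np.array([1.0, 1.0])
w, b = linear_attack_params(mu_train, mu_general, sigma2)

# A point halfway between the two means on feature 0 carries no evidence.
midpoint = membership_posterior([0.25, 3.0], w, b)
# A point on the training-mean side of the midpoint scores above 1/2.
toward_train = membership_posterior([2.0, 0.0], w, b)
```

In this toy setup only feature 0 carries evidence. Feature 1 gets zero weight because its mean is the same in both distributions, which mirrors the idea that membership evidence comes from features whose training-set statistics differ from the general population.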

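Experimental Results

Research Questions

RQ1: When output behavior generalizes well, can white-box access still reveal membership information through the model's internal use of features?
RQ2: Can the attack model be calibrated to give high-precision (confident) membership inferences without access to the target's training data?
RQ3: How can proxy models and distributional assumptions be used to approximate the Bayes-optimal membership prediction in white-box MI?
RQ4: Can practical defenses such as differential privacy meaningfully mitigate this kind of white-box MI attack without an excessive loss of model accuracy?
RQ5: How can the attack be extended to deep neural networks while preserving interpretability and calibration?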

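Key Findings

The proposed white-box attack exploits memorization in feature use and outperforms prior black-box MI methods.
Under Gaussian naive Bayes assumptions, the Bayes-optimal linear attack (Theorem 1) gives exact membership predictions, which yields calibrated confidence.
When the exact distribution parameters are unavailable, proxy models and weight displacement (bayes-wb, general-wb) approximate the optimal attack.
Calibration allows the decision threshold to be tuned for higher precision in membership inference.
The attack remains effective on real datasets and is only partially mitigated by DP with small ε; larger ε values typically offer little protection.
The attack offers a practical heuristic for evaluating parameter choices in private learning and for stress-testing defenses.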
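
The calibration finding above can be illustrated by picking a decision threshold on attack scores so that precision on a labeled calibration set meets a target. This is a generic threshold-selection sketch, not a reproduction of the paper's Algorithm 3. The function `calibrate_threshold`, the target value, and the toy scores are hypothetical, and the calibration set is assumed to come from data the attacker controls (for example, members and non-members of a proxy model).

```python
import numpy as np


def calibrate_threshold(scores, is_member, target_precision):
    """Return the lowest threshold t such that predicting 'member' for
    score >= t reaches the target precision on the calibration set,
    or None if no threshold does.
    """
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    for t in np.unique(scores):  # candidate thresholds in ascending order
        predicted = scores >= t  # never empty: the score equal to t is included
        precision = is_member[predicted].mean()
        if precision >= target_precision:
            return float(t)
    return None


# Toy calibration set: attack scores with known membership labels.
scores = [0.1, 0.4, 0.6, 0.7, 0.9, 0.95]
is_member = [0, 1, 0, 1, 1, 1]

t_high = calibrate_threshold(scores, is_member, target_precision=0.9)
t_low = calibrate_threshold(scores, is_member, target_precision=0.8)
```

Returning the lowest qualifying threshold keeps recall as high as possible for the chosen precision target, since raising the threshold can only shrink the set of points labeled as members.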

This review was generated by AI and reviewed by a human editor.