Skip to main content
QUICK REVIEW

[论文解读] Machine Learning with Membership Privacy using Adversarial Regularization

Milad Nasr, Reza Shokri|arXiv (Cornell University)|Jul 16, 2018
Adversarial Robustness in Machine Learning参考文献 33被引用 23
一句话总结

本文提出了一种最小-最大对抗训练框架,通过使模型在训练数据上的预测与在非训练数据上的预测不可区分,从而同时优化模型准确率和成员隐私。该方法在几乎不损失准确率的情况下,将成员推理攻击的成功率降至接近随机猜测水平,作为强正则化器,提升了泛化能力。

ABSTRACT

Machine learning models leak information about the datasets on which they are trained. An adversary can build an algorithm to trace the individual members of a model's training dataset. As a fundamental inference attack, he aims to distinguish between data points that were part of the model's training set and any other data points from the same distribution. This is known as the tracing (and also membership inference) attack. In this paper, we focus on such attacks against black-box models, where the adversary can only observe the output of the model, but not its parameters. This is the current setting of machine learning as a service in the Internet. We introduce a privacy mechanism to train machine learning models that provably achieve membership privacy: the model's predictions on its training data are indistinguishable from its predictions on other data points from the same distribution. We design a strategic mechanism where the privacy mechanism anticipates the membership inference attacks. The objective is to train a model such that not only does it have the minimum prediction error (high utility), but also it is the most robust model against its corresponding strongest inference attack (high privacy). We formalize this as a min-max game optimization problem, and design an adversarial training algorithm that minimizes the classification loss of the model as well as the maximum gain of the membership inference attack against it. This strategy, which guarantees membership privacy (as prediction indistinguishability), acts also as a strong regularizer and significantly generalizes the model. We evaluate our privacy mechanism on deep neural networks using different benchmark datasets. We show that our min-max strategy can mitigate the risk of membership inference attacks (close to the random guess) with a negligible cost in terms of the classification error.

研究动机与目标

  • 解决机器学习即服务(MLaaS)环境中成员推理攻击带来的关键隐私威胁。
  • 设计一种隐私机制,保证成员隐私——即模型在训练数据与非训练数据上的预测不可区分——且不依赖差分隐私。
  • 联合优化模型效用(分类准确率)与对最强成员推理攻击的隐私鲁棒性。
  • 将防御形式化为最小-最大博弈,其中模型在最小化分类损失的同时,最大化对手区分训练数据与非训练数据的难度。
  • 证明所提出方法作为强正则化器,可提升泛化能力,同时确保可证明的成员隐私。

提出的方法

  • 将成员隐私形式化为预测不可区分性:模型在训练数据上的输出应与来自同一分布的任意数据点的输出在统计上不可区分。
  • 将防御建模为最小-最大优化:模型最小化分类损失,而对手最大化成员推理收益,以模拟最强攻击。
  • 使用对抗训练在模型训练期间模拟成员推理对手,其中对手被训练以基于模型输出区分训练与非训练样本。
  • 将成员推理对手整合到训练循环中作为可微分组件,支持端到端反向传播与联合优化。
  • 在使用标准基准数据集的深度神经网络上应用该方法,隐私机制直接嵌入训练目标。
  • 采用博弈论框架,确保所得模型不仅对训练中使用的特定对手具有鲁棒性,也对任何最大化相同收益函数的推理攻击具有鲁棒性。

实验结果

研究问题

  • RQ1能否训练一个机器学习模型,使其在训练数据上的预测与来自同一分布的非训练数据上的预测不可区分?
  • RQ2这种隐私保证是否能以极低的模型效用(分类准确率)损失实现?
  • RQ3所提出的最小-最大对抗训练框架是否能有效正则化模型并提升泛化能力?
  • RQ4该方法在真实世界MLaaS环境中的黑盒成员推理攻击下效果如何?
  • RQ5该方法是否可在不显著降低性能的前提下应用于深度神经网络?

主要发现

  • 所提方法在所有评估的基准数据集上将成员推理攻击成功率降低至接近随机猜测水平(约50%),表明具有强成员隐私保护能力。
  • 即使在实现近乎完美的成员隐私保护时,分类准确率损失也微乎其微——通常低于1%。
  • 该方法作为强正则化器,显著提升了测试数据上的模型泛化性能,优于标准训练方法。
  • 对抗训练框架成功模拟并防御了最强可能的成员推理攻击,因为模型在优化过程中对最坏情况对手具有鲁棒性。
  • 与传统正则化和简单隐私缓解技术相比,该方法在隐私保护与效用保持方面均表现更优。
  • 在MNIST、CIFAR-10等数据集上的实证结果证实,该方法在多样化深度学习任务与架构中均具有效果。

更好的研究,从现在开始

从论文设计到论文写作,大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成,并经人工编辑审核。