QUICK REVIEW

[Paper Review] Systematic Evaluation of Privacy Risks of Machine Learning Models

Liwei Song, Prateek Mittal|arXiv (Cornell University)|Mar 24, 2020
Adversarial Robustness in Machine Learning|References: 45|Cited by: 84
ONE-SENTENCE SUMMARY

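This paper critiques prior evaluations of membership inference risk, proposes non-neural-network benchmark attacks and a fine-grained privacy risk score, and shows that existing defenses are less effective than previously reported. It also provides an evaluation protocol and public code.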

ABSTRACT

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to guess if an input sample was used to train the model. In this paper, we show that prior work on membership inference attacks may severely underestimate the privacy risks by relying solely on training custom neural network classifiers to perform attacks and focusing only on the aggregate results over data samples, such as the attack accuracy. To overcome these limitations, we first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks and proposing a new inference attack method based on a modification of prediction entropy. We also propose benchmarks for defense mechanisms by accounting for adaptive adversaries with knowledge of the defense and also accounting for the trade-off between model accuracy and privacy risks. Using our benchmark attacks, we demonstrate that existing defense approaches are not as effective as previously reported. Next, we introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score. Our privacy risk score metric measures an individual sample's likelihood of being a training member, which allows an adversary to identify samples with high privacy risks and perform attacks with high confidence. We experimentally validate the effectiveness of the privacy risk score and demonstrate that the distribution of privacy risk score across individual samples is heterogeneous. Finally, we perform an in-depth investigation for understanding why certain samples have high privacy risks, including correlations with model sensitivity, generalization error, and feature embeddings. Our work emphasizes the importance of a systematic and rigorous evaluation of privacy risks of machine learning models.
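
The "likelihood of being a training member" in the abstract can be read as a posterior probability. The block below is a sketch of that reading under stated assumptions: the symbols z = (x, y), F(x), D_tr and r(z) are notation chosen here, and the equal prior of 0.5 for members and non-members is an assumption of this sketch.

```latex
% Privacy risk score as a posterior probability (Bayes' rule).
r(z) = P\bigl(z \in D_{tr} \mid F(x)\bigr)
     = \frac{P\bigl(F(x) \mid z \in D_{tr}\bigr)\, P(z \in D_{tr})}
            {P\bigl(F(x) \mid z \in D_{tr}\bigr)\, P(z \in D_{tr})
             + P\bigl(F(x) \mid z \notin D_{tr}\bigr)\, P(z \notin D_{tr})}

% With an assumed equal prior P(z \in D_{tr}) = P(z \notin D_{tr}) = 0.5:
r(z) = \frac{P\bigl(F(x) \mid z \in D_{tr}\bigr)}
            {P\bigl(F(x) \mid z \in D_{tr}\bigr) + P\bigl(F(x) \mid z \notin D_{tr}\bigr)}

% Worked example: if the observed output is three times as likely for
% members as for non-members, then r(z) = 3 / (3 + 1) = 0.75.
```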

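MOTIVATION AND OBJECTIVES

  • Evaluate the privacy risks of membership inference attacks beyond neural-network-based attackers.
  • Introduce non-neural-network benchmark attacks, along with an entropy-based attack that incorporates the ground-truth label, to measure privacy risk.
  • Propose a fine-grained privacy risk score to assess the risk of individual samples.
  • Evaluate existing defenses under adaptive/adversarial settings.
  • Provide accessible benchmarks and code for reproducible privacy risk evaluation.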

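PROPOSED METHODS

  • Benchmark with non-neural-network inference attacks, including class-dependent thresholds and an improved prediction-entropy attack.
  • Introduce a new metric, modified prediction entropy (Mentr), that better captures ground-truth label information.
  • Use shadow training to set class-specific thresholds for the metric-based attacks.
  • Evaluate defenses against adaptive adversaries and compare them with an early-stopping baseline.
  • Propose and compute a per-sample privacy risk score to reveal the heterogeneity of risk.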
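
The metric-based attack described in the list above can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names (`modified_entropy`, `fit_class_thresholds`, `infer_membership`), the exhaustive threshold search, and the `eps` clipping are assumptions made for this sketch. It takes Mentr to be Mentr(F(x), y) = -(1 - F(x)_y) log F(x)_y - sum over i != y of F(x)_i log(1 - F(x)_i), where F(x)_i is the predicted probability of class i and y is the true label.

```python
import numpy as np


def modified_entropy(probs, labels, eps=1e-12):
    """Modified prediction entropy (Mentr), one value per sample.

    probs:  (n, k) array of predicted class probabilities
    labels: (n,) array of integer ground-truth labels
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    idx = np.arange(probs.shape[0])
    p_true = probs[idx, labels]
    # True-class term: -(1 - p_y) * log(p_y); eps avoids log(0).
    true_term = -(1.0 - p_true) * np.log(np.clip(p_true, eps, 1.0))
    # Other-class terms: -p_i * log(1 - p_i) for i != y.
    other = -probs * np.log(np.clip(1.0 - probs, eps, 1.0))
    other[idx, labels] = 0.0
    return true_term + other.sum(axis=1)


def fit_class_thresholds(member_scores, member_labels,
                         nonmember_scores, nonmember_labels, num_classes):
    """Choose one threshold per class from shadow-model data.

    For each class, try every observed score as a threshold and keep the
    one with the best accuracy at separating shadow members (expected to
    have low Mentr) from shadow non-members (expected to have high Mentr).
    """
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    member_labels = np.asarray(member_labels, dtype=int)
    nonmember_labels = np.asarray(nonmember_labels, dtype=int)
    # Classes with no shadow data keep -inf, i.e. always "non-member".
    thresholds = np.full(num_classes, -np.inf)
    for c in range(num_classes):
        m = member_scores[member_labels == c]
        nm = nonmember_scores[nonmember_labels == c]
        candidates = np.concatenate([m, nm])
        best_acc = -1.0
        for t in candidates:
            # Rule under test: predict "member" when Mentr <= t.
            acc = ((m <= t).sum() + (nm > t).sum()) / len(candidates)
            if acc > best_acc:
                best_acc, thresholds[c] = acc, t
    return thresholds


def infer_membership(scores, labels, thresholds):
    """Return True where a sample is predicted to be a training member."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    return scores <= thresholds[labels]
```

In use, the shadow scores would come from a shadow model trained by the attacker, and the fitted thresholds would then be applied to Mentr values computed from the target model's outputs. Lower Mentr means a confident, correct prediction, which is why the rule predicts "member" below the threshold.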

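EXPERIMENTAL RESULTS

Research questions

  • RQ1: Do non-NN attacks reveal higher membership inference risk than NN-based attacks on defended models?
  • RQ2: How do class-specific thresholds and the modified entropy metric affect attack effectiveness?
  • RQ3: Can a per-sample privacy risk score reveal the heterogeneity of privacy risk across training samples?
  • RQ4: Are existing defenses (e.g., adversarial regularization, MemGuard) robust under adaptive/adversarial evaluation?
  • RQ5: How should privacy risk evaluation be standardized to balance model accuracy against privacy?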

Main findings

Defense                            Dataset        Reported attack accuracy    Our benchmark attack accuracy
Adversarial regularization [31]    Purchase100    51.6%                       59.5%
Adversarial regularization [31]    Texas100       51.0%                       58.6%
MemGuard [20]                      Location30     50.1%                       69.1%
MemGuard [20]                      Texas100       50.3%                       74.2%

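  • The non-NN benchmark attacks reveal substantially higher privacy risk than prior NN-based evaluations (e.g., 58.6% to 74.2% attack accuracy versus the 50.1% to 51.6% previously reported).
  • Under adaptive threats, defenses such as adversarial regularization and MemGuard offer limited privacy protection and do not consistently outperform early stopping.
  • The modified prediction entropy (Mentr) attack outperforms the standard entropy-based attack.
  • Privacy risk is heterogeneous across samples; the proposed privacy risk score can identify high-risk members for targeted membership inference.
  • Per-sample risk analysis complements aggregate analysis, giving a better understanding of privacy dynamics and guiding the evaluation of defenses.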
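
One way to estimate a per-sample privacy risk score of the kind discussed in the findings above is to compare how often shadow members and shadow non-members produce a given attack score. The sketch below is illustrative only: the function name `privacy_risk_scores`, the histogram estimator, the bin count, and the default equal prior are assumptions of this example rather than details confirmed by the summary. In practice the score passed in could be the Mentr value, with one call per class label.

```python
import numpy as np


def privacy_risk_scores(target_scores, member_scores, nonmember_scores,
                        bins=20, prior_member=0.5):
    """Estimate P(member | score) for each target score.

    member_scores / nonmember_scores: attack-metric values observed on
    shadow training data and shadow held-out data, respectively.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)

    # Shared bin edges covering both shadow distributions.
    lo = min(member_scores.min(), nonmember_scores.min())
    hi = max(member_scores.max(), nonmember_scores.max())
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins when all scores coincide
    edges = np.linspace(lo, hi, bins + 1)

    # Empirical P(score bin | member) and P(score bin | non-member).
    p_m, _ = np.histogram(member_scores, bins=edges)
    p_n, _ = np.histogram(nonmember_scores, bins=edges)
    p_m = p_m / max(p_m.sum(), 1)
    p_n = p_n / max(p_n.sum(), 1)

    # Map each target score to a bin, clipping out-of-range values.
    idx = np.clip(np.digitize(target_scores, edges) - 1, 0, bins - 1)

    # Bayes' rule; bins with no shadow data fall back to the prior.
    num = p_m[idx] * prior_member
    den = num + p_n[idx] * (1.0 - prior_member)
    safe_den = np.where(den > 0, den, 1.0)
    return np.where(den > 0, num / safe_den, prior_member)
```

Samples whose score is close to 1 are the ones an adversary could flag as members with high confidence, while scores near the prior indicate that the model output carries little membership signal for that sample.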


This review was generated by AI and reviewed by human editors.