QUICK REVIEW

[Paper Digest] Label-Leaks: Membership Inference Attack with Label.

Zheng Li, Shuicheng Yan|arXiv (Cornell University)|Jul 30, 2020

Adversarial Robustness in Machine Learning|References 12|Cited by 35

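One-Sentence Summary

This paper proposes label-only membership inference attacks, which use only the model's predicted labels rather than confidence scores, and introduces two novel attacks: a transfer-based attack and a perturbation-based attack. Experiments on six datasets show strong attack performance, revealing that serious membership privacy risks remain even when a model exposes only labels.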

ABSTRACT

Machine learning (ML) has made tremendous progress during the past decade and ML models have been deployed in many real-world applications. However, recent research has shown that ML models are vulnerable to attacks against their underlying training data. One major attack in this field is membership inference the goal of which is to determine whether a data sample is part of the training set of a target machine learning model. So far, most of the membership inference attacks against ML classifiers leverage the posteriors returned by the target model as their input. However, empirical results show that these attacks can be easily mitigated if the target model only returns the predicted label instead of posteriors. In this paper, we perform a systematic investigation of membership inference attack when the target model only provides the predicted label. We name our attack label-only membership inference attack. We focus on two adversarial settings and propose different attacks, namely transfer-based attack and perturbation based attack. The transfer-based attack follows the intuition that if a locally established shadow model is similar enough to the target model, then the adversary can leverage the shadow model's information to predict a target sample's membership. The perturbation-based attack relies on adversarial perturbation techniques to modify the target sample to a different class and uses the magnitude of the perturbation to judge whether it is a member or not. This is based on the intuition that a member sample is harder to be perturbed to a different class than a non-member sample. Extensive experiments over 6 different datasets demonstrate that both of our attacks achieve strong performance. This further demonstrates the severity of membership privacy risks of machine learning models.

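Motivation and Objectives

Investigate whether membership inference attacks remain effective when a machine learning model releases only predicted labels rather than confidence scores.
Address a gap in membership inference research, which typically assumes access to model posteriors even though real-world deployments often do not expose them.
Develop attacks that are feasible under the realistic constraint of label-only model output.
Evaluate how robust membership inference is under two different adversarial settings (transfer-based and perturbation-based attacks).
Show that the membership privacy risk of machine learning models remains severe even when only labels are exposed.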

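Proposed Method

Proposes a transfer-based attack that trains a shadow model to mimic the target model's behavior and uses the shadow model's confidence scores to infer a target sample's membership.
Uses a perturbation-based attack that generates an adversarial example by minimizing the perturbation needed to change a sample's predicted label, and takes the magnitude of that perturbation as the membership indicator.
Assumes a black-box target model that returns only the predicted class label, not the full probability distribution.
Trains the shadow model on a dataset similar to the target model's training data, so that the shadow model's outputs support accurate membership inference.
Applies label-only adversarial optimization to compute the minimum perturbation needed to change a sample's prediction; a larger perturbation indicates that the sample is a member, since member samples are harder to push to a different class.
Validates the attacks on six diverse datasets, comparing performance across model architectures and data distributions.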
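
The transfer-based attack described above can be sketched in three steps: relabel a local shadow dataset by querying the target, fit a shadow model on those labels, then score a candidate by the shadow model's loss. The following is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a two-class problem and a plain logistic-regression shadow model; the names `train_shadow`, `shadow_loss`, `transfer_attack`, and the `threshold` parameter are hypothetical, and computing the loss against the label returned by the target is also an assumption.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_shadow(points, labels, epochs=200, lr=0.1):
    """Fit a logistic-regression shadow model with batch gradient descent."""
    w = [0.0] * len(points[0])
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(points, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def shadow_loss(model, x, label):
    """Cross-entropy loss of the shadow model on (x, label)."""
    w, b = model
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -math.log(p if label == 1 else 1.0 - p)


def transfer_attack(target_predict, shadow_points, candidate, threshold):
    """Return (is_member_guess, loss) for one candidate sample.

    target_predict is the label-only black box: it maps a point to 0 or 1.
    """
    # Step 1: relabel the shadow data with the target's predicted labels.
    relabeled = [target_predict(x) for x in shadow_points]
    # Step 2: train the local shadow model on the relabeled data.
    model = train_shadow(shadow_points, relabeled)
    # Step 3: a low shadow-model loss is treated as evidence of membership.
    loss = shadow_loss(model, candidate, target_predict(candidate))
    return loss < threshold, loss
```

In this sketch the threshold is a free parameter; in practice it would have to be calibrated, for example on shadow data whose membership in the shadow model's training set is known.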

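Experimental Results

Research Questions

RQ1: Do membership inference attacks remain effective when the target model returns only predicted labels and no confidence scores?
RQ2: How does membership inference performance with label-only access compare with performance when full posteriors are available?
RQ3: To what extent can a shadow model trained on similar data reproduce the target model's behavior for membership inference?
RQ4: Can the magnitude of an adversarial perturbation serve as a reliable indicator of training-set membership?
RQ5: How robust are label-only membership inference attacks across datasets and model architectures?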

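Key Findings

The transfer-based attack reaches up to 90% membership inference accuracy on multiple datasets, even though the target model returns only labels.
The perturbation-based attack shows that member samples require significantly larger perturbations to change their predicted class, confirming perturbation magnitude as an effective membership signal.
Both attacks perform well on six diverse datasets, indicating broad applicability and robustness.
The results show that membership inference remains a serious threat even when a model is configured to release only labels, challenging the assumption that label-only models are private.
The empirical evaluation confirms that the attacks are effective regardless of model architecture or data distribution, underscoring that membership privacy risks persist.
The study shows that the current deployment practice of restricting output to labels is insufficient to mitigate membership inference threats.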
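
The perturbation finding above suggests a simple decision rule: estimate how far a sample must be moved before the returned label changes, and guess "member" when that distance exceeds a threshold. The sketch below illustrates this with label queries only. It is an assumption-laden toy, not the attack used in the paper: it probes random directions and binary-searches along each one, and the names `flip_distance_along`, `estimate_min_perturbation`, `infer_member`, and all parameters are hypothetical.

```python
import math
import random


def flip_distance_along(predict_label, x, direction, max_radius=10.0, iters=40):
    """Binary-search a step size along `direction` at which the label changes.

    Uses only label queries. Returns math.inf if the label is unchanged at
    `max_radius`. If the ray crosses the boundary several times, the result
    is one crossing, so it is an upper bound on the nearest one.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    original = predict_label(x)

    def moved(r):
        return [xi + r * ui for xi, ui in zip(x, unit)]

    if predict_label(moved(max_radius)) == original:
        return math.inf
    lo, hi = 0.0, max_radius  # label unchanged at lo, changed at hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if predict_label(moved(mid)) == original:
            lo = mid
        else:
            hi = mid
    return hi


def estimate_min_perturbation(predict_label, x, n_directions=200, seed=0, **kw):
    """Smallest label-flipping distance found over random directions."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_directions):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        best = min(best, flip_distance_along(predict_label, x, direction, **kw))
    return best


def infer_member(predict_label, x, threshold, **kw):
    """Guess 'member' when the estimated perturbation exceeds the threshold."""
    return estimate_min_perturbation(predict_label, x, **kw) > threshold
```

Random-direction search is a crude stand-in for a proper decision-based adversarial attack and scales poorly with input dimension; it is used here only because it keeps the example short and needs nothing beyond the predicted label.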

This digest was generated by AI and reviewed by human editors.