[Paper Digest] Label-Only Membership Inference Attacks
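This paper proposes label-only membership inference attacks, which infer whether a point was part of the training data from how robust its predicted label is under input perturbations. The attacks match or even exceed confidence-based attacks and break defenses that rely on confidence masking.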
Membership inference attacks are one of the simplest forms of privacy leakage for machine learning models: given a data point and model, determine whether the point was used to train the model. Existing membership inference attacks exploit models' abnormal confidence when queried on their training data. These attacks do not apply if the adversary only gets access to models' predicted labels, without a confidence measure. In this paper, we introduce label-only membership inference attacks. Instead of relying on confidence scores, our attacks evaluate the robustness of a model's predicted labels under perturbations to obtain a fine-grained membership signal. These perturbations include common data augmentations or adversarial examples. We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences. We further demonstrate that label-only attacks break multiple defenses against membership inference attacks that (implicitly or explicitly) rely on a phenomenon we call confidence masking. These defenses modify a model's confidence scores in order to thwart attacks, but leave the model's predicted labels unchanged. Our label-only attacks demonstrate that confidence-masking is not a viable defense strategy against membership inference. Finally, we investigate worst-case label-only attacks that infer membership for a small number of outlier data points. We show that label-only attacks also match confidence-based attacks in this setting. We find that training models with differential privacy and (strong) L2 regularization are the only known defense strategies that successfully prevent all attacks. This remains true even when the differential privacy budget is too high to offer meaningful provable guarantees.
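To make the idea of a robustness-based membership signal concrete, here is a minimal sketch rather than the paper's implementation. It assumes Gaussian input noise as the perturbation and a model that exposes only hard labels; `noise_robustness_score`, `toy_predict`, `sigma`, and `n_queries` are hypothetical names and values chosen for illustration.

```python
import numpy as np

def noise_robustness_score(predict_label, x, y, sigma=0.1, n_queries=100, seed=0):
    """Fraction of Gaussian-perturbed copies of x that keep the label y.

    predict_label: callable mapping a batch of shape (n, d) to hard labels (n,).
    A higher score means the label is more robust around x.
    """
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n_queries, x.shape[0]))
    noisy = x[None, :] + noise
    return float(np.mean(predict_label(noisy) == y))

# Toy stand-in for a label-only API (assumption): a fixed linear classifier.
w = np.array([1.0, -1.0])

def toy_predict(batch):
    return (batch @ w > 0).astype(int)

far_point = np.array([3.0, -3.0])    # far from the decision boundary
near_point = np.array([0.05, 0.0])   # very close to the decision boundary

far_score = noise_robustness_score(toy_predict, far_point, 1, sigma=0.5)
near_score = noise_robustness_score(toy_predict, near_point, 1, sigma=0.5)
print(far_score, near_score)
```

On this toy model the point far from the boundary keeps its label under every noisy query, while the point near the boundary loses it in a sizeable share of them. The attack's premise is that training points tend to behave more like the former, so the score can be thresholded into a membership guess.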
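Motivation and Objectives
- Motivate and formalize the membership inference threat when the adversary can obtain only hard labels.
- Develop label-only attacks, based on perturbations and robustness, that reveal membership.
- Compare label-only attacks with confidence-based attacks and evaluate common defenses.
- Assess how standard regularization, data augmentation, and differential privacy affect membership leakage.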
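Proposed Method
- Define the baseline gap attack as a simple predictor that uses only label information: it guesses "member" exactly when the model classifies the point correctly.
- Introduce attacks based on (i) data-augmentation proxies that probe membership, (ii) decision-boundary-distance proxies obtained from label-only perturbations and adversarial-example-style queries, and (iii) combining multiple queries to strengthen the signal.
- Use data augmentations (rotations, translations) and boundary-distance measures to generate proxy confidences.
- Use a label-only adversarial walk (HopSkipJump) and randomized, noise-based robustness tests to estimate the distance to the decision boundary.
- Tune the decision threshold on shadow models and transfer it to the target model.
- Evaluate the query cost and effectiveness of the attacks across multiple datasets and model types.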
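The steps in this list can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: translations are approximated with circular shifts via `np.roll`, the threshold is chosen by balanced accuracy, and `gap_attack`, `augmentation_score`, `fit_threshold`, `toy_predict`, and the shadow-model scores are all hypothetical.

```python
import numpy as np

def gap_attack(predict_label, x, y):
    """Baseline: guess 'member' iff the model labels x correctly."""
    return bool(predict_label(x[None, ...])[0] == y)

def augmentation_score(predict_label, x, y, shifts=(-1, 0, 1)):
    """Fraction of shifted copies of image x (H, W) that keep the label y.

    Assumption: circular shifts stand in for the translation augmentations.
    """
    copies = np.stack([np.roll(x, (dy, dx), axis=(0, 1))
                       for dy in shifts for dx in shifts])
    return float(np.mean(predict_label(copies) == y))

def fit_threshold(member_scores, nonmember_scores):
    """Pick the score threshold with the best balanced accuracy.

    Intended to be run on shadow-model scores, where membership is known,
    and then reused unchanged against the target model.
    """
    m = np.asarray(member_scores, dtype=float)
    n = np.asarray(nonmember_scores, dtype=float)
    best_t, best_acc = None, -1.0
    for t in np.unique(np.concatenate([m, n])):
        acc = 0.5 * (np.mean(m >= t) + np.mean(n < t))
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc

# Toy label-only model (assumption): the label depends on the top-left pixel.
def toy_predict(batch):
    return (batch[:, 0, 0] > 0.5).astype(int)

robust_img = np.ones((4, 4))      # label survives every shift
fragile_img = np.zeros((4, 4))
fragile_img[0, 0] = 1.0           # label survives only the zero shift

gap_robust = gap_attack(toy_predict, robust_img, 1)
gap_fragile = gap_attack(toy_predict, fragile_img, 1)
score_robust = augmentation_score(toy_predict, robust_img, 1)
score_fragile = augmentation_score(toy_predict, fragile_img, 1)

# Made-up shadow-model scores, for illustration only.
threshold, shadow_acc = fit_threshold([1.0, 1.0, 0.89, 1.0],
                                      [0.56, 0.78, 1.0, 0.44])
is_member_guess = score_robust >= threshold
```

Both toy images are classified correctly, so the gap attack cannot tell them apart, whereas the augmentation score separates them. That finer-grained signal is what the threshold fitted on shadow models is applied to.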
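Experimental Results
Research Questions
- RQ1: Can membership inference from labels alone match or even exceed attacks that rely on full confidence scores?
- RQ2: Do confidence-masking defenses (such as MemGuard and adversarial regularization) protect against label-only attacks?
- RQ3: What are the query complexity and practical cost of label-only attacks?
- RQ4: Which defenses effectively mitigate membership leakage under both label-only and confidence-based attacks?
- RQ5: How do standard regularization techniques and differential privacy affect label-only membership leakage?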
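Key Findings
- Label-only attacks match confidence-vector attacks on several datasets, and even outperform them when combined.
- Confidence-masking defenses such as MemGuard and adversarial regularization are ineffective against label-only attacks.
- Training-time data augmentation can increase leakage to label-only attacks, even when it reduces overfitting and improves accuracy.
- Strong L2 regularization or differentially private training can substantially reduce leakage, but usually at the cost of accuracy.
- Transfer learning can reduce leakage in some settings; full fine-tuning may increase leakage, whereas fine-tuning only the last layer tends to reduce it.
- A budget of roughly a few thousand queries is enough to obtain a strong MI signal; even small perturbations (rotations, translations) yield non-trivial leakage.
- The paper also discusses outlier MI and worst-case input leakage, stressing that protection requires defenses that go beyond masking confidences.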
This digest was generated by AI and reviewed by a human editor.