QUICK REVIEW

[Paper Review] White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

Alexandre Sablayrolles, Matthijs Douze|arXiv (Cornell University)|Aug 29, 2019

Adversarial Robustness in Machine Learning | Cited by 113

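One-Sentence Summary

This paper derives the Bayes-optimal membership inference strategy, shows that under mild assumptions black-box (loss-based) attacks can match white-box attacks, and provides practical approximations that outperform existing methods on CIFAR-10 and ImageNet.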

ABSTRACT

Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box attacks. As the optimal strategy is not tractable, we provide approximations of it leading to several inference methods, and show that existing membership inference methods are coarser approximations of this optimal strategy. Our membership attacks outperform the state of the art in various settings, ranging from a simple logistic regression to more complex architectures and datasets, such as ResNet-101 and Imagenet.

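Motivation and Objectives

Motivate and formalize membership inference in the white-box and black-box settings.
Derive the Bayes-optimal membership inference strategy and show that it depends only on the loss, not on the model parameters.
Develop practical, tractable approximations (MAST, MALT, MATT) and relate them to notions of differential privacy.
Validate the attacks empirically on CIFAR-10 and ImageNet and compare them with state-of-the-art methods.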

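Proposed Method

Model training as a posterior over the parameters with temperature T (posterior probability proportional to exp(-(1/T) Σ loss)).
Use Bayesian inference to derive the Bayes-optimal membership probability M(θ, z1), showing that it depends on the loss only through a score s and a calibration term τ.
Introduce explicit approximations of the optimal score: MAST (a per-sample calibration term τ(z1)), MALT (a constant τ), and MATT (an approximation based on a Taylor expansion).
Connect the results to differential privacy, giving ε-differential privacy and (ε, δ)-membership privacy guarantees and their implications for the training procedure.
Turn the theory into practical attack algorithms and discuss the baseline (0-1) and shadow models.
Evaluate the attacks on logistic regression over CNN features (CIFAR-10), a small CNN, and large models (ImageNet), under different data augmentations.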
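
To make the role of the score and the calibration term concrete, here is a minimal sketch of loss-threshold attacks in plain Python. It assumes the score has the form s = (τ - loss) / T and that the membership probability is a sigmoid of s plus a prior logit; that exact form is an assumption made for illustration and is not spelled out in this review. The function names (`membership_probability`, `malt_attack`, `mast_attack`) and the `prior` parameter are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def membership_probability(loss, tau, temperature=1.0, prior=0.5):
    # ASSUMPTION: score s = (tau - loss) / T, shifted by the prior logit.
    # `prior` is the assumed fraction of candidate samples that are members.
    score = (tau - loss) / temperature
    prior_logit = math.log(prior / (1.0 - prior))
    return sigmoid(score + prior_logit)


def malt_attack(losses, tau):
    # MALT-style decision: one constant threshold tau for every sample.
    # A sample is predicted to be a training member when its loss is below tau.
    return [loss < tau for loss in losses]


def mast_attack(losses, taus):
    # MAST-style decision: a per-sample threshold tau(z) for each sample.
    return [loss < tau for loss, tau in zip(losses, taus)]
```

For example, `malt_attack([0.1, 2.0], tau=1.0)` flags only the first sample as a member, while `mast_attack` can reach a different verdict on two samples with the same loss if their thresholds differ. Both attacks need only the loss of the target sample, which is what makes them black-box.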

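Experimental Results

Research Questions

RQ1: Given the model parameters and a target sample, what is the Bayes-optimal membership inference strategy?
RQ2: Does the Bayes-optimal attack depend on the model parameters beyond the loss of the target sample?
RQ3: Do the practical, tractable approximations (MAST, MALT, MATT) outperform existing membership inference methods across a range of settings?
RQ4: How do data augmentation and model size affect the strength of membership inference attacks on realistic datasets such as CIFAR-10 and ImageNet?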

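Key Findings

Optimal membership inference depends only on the loss, not on the classifier parameters, which implies that white-box attacks are asymptotically no better than black-box attacks.
Three explicit approximations (MAST, MALT, MATT) yield practical attacks that outperform prior methods (the 0-1 baseline and shadow models) in a range of settings.
On CIFAR-10 with a simple logistic regression, MALT and MATT achieve higher attack accuracy than the 0-1 and shadow-model attacks, and MATT is often the strongest.
On ImageNet (VGG-16 and ResNet-101), data augmentation narrows the attack gap, but the Bayes-optimal and MALT attacks still reveal substantial membership leakage: roughly 90% accuracy without augmentation, and above 64% with augmentation.
Experiments on CIFAR-10 and ImageNet show that the proposed attacks are effective across model complexities and data regimes.
The framework connects membership inference to differential privacy, giving explicit ε-differential privacy guarantees and (ε, δ)-membership privacy bounds under several assumptions.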
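
The attack accuracies quoted above can be read as the fraction of correct member/non-member decisions. The sketch below shows one way such a number could be computed for a constant-threshold (MALT-style) attack, and how a threshold might be picked from samples whose membership is already known. The selection procedure and the function names are assumptions for illustration; this review does not state how the thresholds were actually chosen.

```python
def attack_accuracy(member_losses, nonmember_losses, tau):
    # A member is classified correctly when its loss is below tau,
    # a non-member when its loss is at or above tau.
    correct = sum(loss < tau for loss in member_losses)
    correct += sum(loss >= tau for loss in nonmember_losses)
    return correct / (len(member_losses) + len(nonmember_losses))


def best_threshold(member_losses, nonmember_losses):
    # ASSUMPTION: tau is chosen by exhaustive search over the observed
    # loss values on a calibration set with known membership labels.
    candidates = sorted(set(member_losses) | set(nonmember_losses))
    # Also try a value above every loss, i.e. "predict member for all".
    candidates.append(candidates[-1] + 1.0)
    return max(
        candidates,
        key=lambda tau: attack_accuracy(member_losses, nonmember_losses, tau),
    )
```

With equally many members and non-members, an accuracy of 0.5 corresponds to random guessing, so values such as 0.64 or 0.90 indicate how much membership information the loss alone leaks.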


This review was generated by AI and reviewed by human editors.