QUICK REVIEW

[Paper Review] Scalable Membership Inference Attacks via Quantile Regression

Martín Bertrán, Shuai Tang | arXiv (Cornell University) | Jul 7, 2023

Adversarial Robustness in Machine Learning | Cited by 8

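One-Sentence Summary

Summary: Introduces a single-model membership inference attack based on quantile regression that is competitive with shadow-model attacks at substantially lower cost and works in the black-box setting. It trains a quantile model to predict quantiles of the confidence score, and on larger datasets such as ImageNet it meets the target false positive rate while achieving a strong true positive rate.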

ABSTRACT

Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many shadow models, i.e., models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly "black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.

Motivation and Objectives

Motivate and address the computational inefficiency of shadow-model-based membership inference attacks.
Propose a quantile regression approach that uses a single model to predict confidence-score quantiles for non-training data.
Show that the proposed method is model- and architecture-agnostic, enabling black-box attacks without detailed target model knowledge.
Provide theoretical guarantees that the attack attains the desired false positive rate and explore group-conditioned quantile consistency.
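
The false-positive-rate guarantee named in the last objective can be written out as follows. This is a sketch in notation chosen here for illustration: s(x,y) is the attack's test statistic, q is the quantile model, α is the target false positive rate, τ = 1 − α is an assumed shorthand, and D (an assumed symbol) is the distribution of points not used in training.

```latex
% Pinball loss at level \tau = 1 - \alpha, minimized when fitting q
\ell_{\tau}\bigl(q(x), s\bigr) = \max\bigl(\tau\,(s - q(x)),\; (\tau - 1)\,(s - q(x))\bigr)

% Attack rule: flag membership when the statistic reaches the predicted quantile
A_q(x, y) = \mathbb{1}\bigl[\, s(x,y) \ge q(x) \,\bigr]

% Target guarantee on non-training points
\mathrm{FPR}(A_q) = \Pr_{(x,y)\sim\mathcal{D}}\bigl[\, s(x,y) \ge q(x) \,\bigr] = \alpha
```

Because q is fit only on non-training points, the rule flags a non-member exactly when its statistic lands in the top α tail of the predicted conditional distribution, which is why a well-fit q gives a false positive rate of α.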

Proposed Method

Define the test statistic s(x,y) as a confidence-logit gap for the true label.
Train a quantile regression model q on (x, s(x,y)) to predict the (1-α) quantile of s given x (minimizing pinball loss).
Construct the attack A_q that flags training-membership when s(x,y) ≥ q(x) and not otherwise.
Prove that, under mild conditions, FPR(A_q) = α for suitable model classes closed under shifts.
Show that varying α yields a ROC trade-off curve between FPR and TPR.
Demonstrate model-agnosticism: the attack requires only API access to obtain confidence scores from f, not architectural knowledge.
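
The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: in place of a learned regressor over x, it uses a piecewise-constant quantile model that stores one threshold per group (a hypothetical `group` key standing in for features of x), which minimizes the empirical pinball loss within that simple model class. The names `pinball_loss`, `fit_group_quantiles`, and `attack`, and the toy scores, are illustrative.

```python
import math
from collections import defaultdict


def pinball_loss(score, pred, alpha):
    """Pinball loss of predicting `pred` for the (1 - alpha) quantile of `score`."""
    tau = 1.0 - alpha
    diff = score - pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def fit_group_quantiles(groups, scores, alpha):
    """Fit q on non-member data only: one (1 - alpha) quantile per group.

    Assumption: a per-group constant replaces a learned regressor over x.
    """
    by_group = defaultdict(list)
    for g, s in zip(groups, scores):
        by_group[g].append(s)
    thresholds = {}
    for g, vals in by_group.items():
        vals.sort()
        # Order statistic that minimizes the empirical pinball loss.
        k = min(math.floor((1.0 - alpha) * len(vals)), len(vals) - 1)
        thresholds[g] = vals[k]
    return thresholds


def attack(thresholds, group, score):
    """A_q: flag membership when s(x, y) >= q(x), and not otherwise."""
    return score >= thresholds[group]


# Toy non-member scores (made up): the "hard" group sits 5 units lower.
groups = ["easy"] * 100 + ["hard"] * 100
scores = [i / 10 for i in range(100)] + [i / 10 - 5 for i in range(100)]
alpha = 0.1

q = fit_group_quantiles(groups, scores, alpha)
flags = [attack(q, g, s) for g, s in zip(groups, scores)]
fpr = sum(flags) / len(flags)  # fraction of non-members flagged
```

In this toy setup a score of 4.5 is flagged for the "hard" group but not for the "easy" group, which a single marginal threshold cannot express. Sweeping `alpha` and refitting traces out the FPR/TPR trade-off.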

Figure 1: Comparing the true positive rate vs. false positive rate of our membership inference attack with the marginal baseline proposed in Yeom et al. (2018) and the state-of-the-art LiRA proposed in Carlini et al. (2022) evaluated at 2, 4, 6, and 8 shadow models. We also provide a visual rea

Experimental Results

Research Questions

RQ1: Can a single quantile regression model emulate or surpass shadow-model-based membership inference while reducing computational cost?
RQ2: How does the quantile regression attack perform across large-scale and small-scale datasets (ImageNet-1k vs. CIFAR-10/100) and different architectures?
RQ3: Does optimizing for pinball loss yield a reliable target false positive rate and robust true positive rates across settings?
RQ4: Is the attack effective in tabular data scenarios and when the attacker has limited knowledge of the target model?

Key Findings

The quantile regression attack is competitive with state-of-the-art shadow-model approaches and, in ImageNet-1k experiments, outperforms shadow-model methods at all evaluated points.
The method requires training only a single model and is architecture-agnostic, enabling true black-box applicability.
On CIFAR-10/100, the attack improves over the marginal baseline but may lag behind shadow-model methods, depending on data size and model complexity.
On large datasets (ImageNet-1k), the attack achieves high precision at very low false positive rates (Table 1 of the paper reports the gains).
For tabular datasets, the single-model approach matches LiRA’s performance while incurring substantially lower computational cost (one model vs. many).
The results indicate that pinball loss minimization correlates with stronger membership inference performance across tasks.
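
The quantities these findings refer to (true positive rate, false positive rate, precision) can be computed from an attack's decisions as in the sketch below. The function name `attack_metrics` and the toy decision lists are illustrative assumptions, not data from the paper.

```python
def attack_metrics(member_flags, nonmember_flags):
    """Return (TPR, FPR, precision) for boolean attack decisions.

    member_flags: decisions on points that were used in training.
    nonmember_flags: decisions on points that were not used in training.
    """
    tp = sum(member_flags)        # members correctly flagged
    fp = sum(nonmember_flags)     # non-members wrongly flagged
    tpr = tp / len(member_flags)
    fpr = fp / len(nonmember_flags)
    # Precision is undefined when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    return tpr, fpr, precision


# Toy decisions (made up for illustration).
tpr, fpr, precision = attack_metrics(
    [True, True, False, False],
    [True] + [False] * 9,
)
```

Unlike TPR and FPR, precision depends on the ratio of members to non-members in the evaluation set, so it should be compared only across runs that use the same mix.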


This review was generated by AI and reviewed by human editors.