QUICK REVIEW

[Paper Review] Scalable Membership Inference Attacks via Quantile Regression

Martín Bertrán, Shuai Tang|arXiv (Cornell University)|2023. 07. 07.

Adversarial Robustness in Machine Learning | Citations: 8

One-Line Summary

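Introduces a single-model quantile regression attack for membership inference. The attack is competitive with shadow-model attacks at a much lower computational cost, and it can be used in the black-box setting. A single quantile model is trained to predict quantiles of the confidence score. The attack reaches a target false positive rate with strong true positive performance, especially on large datasets such as ImageNet.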

ABSTRACT

Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many shadow models, i.e. models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly "black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.

Research Motivation and Goals

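Motivates and addresses the computational inefficiency of shadow-model-based membership inference attacks.
Presents a quantile regression approach that uses a single model to predict quantiles of the confidence score on non-training data.
Shows that the proposed method enables a black-box attack that is agnostic to the target model and its architecture.
Provides theoretical guarantees that the attack achieves the desired false positive rate (FPR), and explores group-conditional quantile consistency.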
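The FPR guarantee can be illustrated with the standard first-order argument for quantile regression over a shift-closed model class. The block below is a hedged reconstruction, not the paper's proof. It ignores ties (points with s_i exactly equal to q(x_i)), and it bounds only the empirical FPR on the non-member points used for fitting. Carrying the result over to the population FPR is where the "mild conditions" come in. The symbols ρ_τ, 𝒬, L, and p̂ are notation introduced here for illustration.

```latex
% Setup: s_i = s(x_i, y_i) on n known non-members; level \tau = 1 - \alpha.
\rho_\tau(r) = \max\{\tau r,\; (\tau - 1) r\}, \qquad
\hat q = \arg\min_{q \in \mathcal{Q}} \frac{1}{n} \sum_{i=1}^{n} \rho_\tau\big(s_i - q(x_i)\big)

% Attack: A_q(x, y) = 1 iff s(x, y) \ge q(x).
% If \mathcal{Q} is closed under shifts (q \in \mathcal{Q} \Rightarrow q + c \in \mathcal{Q}),
% then c = 0 must minimise L(c) = \frac{1}{n} \sum_i \rho_\tau\big(s_i - \hat q(x_i) - c\big).
% Ignoring ties, with \hat p = \frac{1}{n} \#\{ i : s_i > \hat q(x_i) \}:
L'(0) = -\tau\,\hat p + (1 - \tau)(1 - \hat p) = (1 - \tau) - \hat p = 0
\;\Longrightarrow\; \hat p = 1 - \tau = \alpha
```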

Proposed Method

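Defines the test statistic s(x,y) as the confidence (logit) gap on the true label.
Trains a quantile regression model q on pairs (x, s(x,y)) from points not used in training, predicting the (1-α) quantile of s given x by minimizing the pinball loss.
Constructs the attack A_q, which flags (x,y) as a training member if s(x,y) ≥ q(x) and as a non-member otherwise.
Proves that, under mild conditions, FPR(A_q) = α whenever the fitted model class is closed under shifts.
Shows that varying α traces out an ROC trade-off curve between FPR and TPR.
Demonstrates model-agnosticism: the attack needs only API access to obtain confidence scores from the target model f, and requires no knowledge of its architecture.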
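The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The score is assumed to be a hinge-style logit gap (the true-label logit minus the largest other logit). The quantile model q is a hypothetical linear model fitted by full-batch subgradient descent on the pinball loss; the review does not say which model family the authors use. All function names (`hinge_score`, `pinball_loss`, `predict`, `fit_quantile_model`, `is_member`) and the toy data are hypothetical. Because the linear model has a bias term, its class is closed under constant shifts. That is the condition under which the fitted threshold should yield an FPR close to α.

```python
import random


def hinge_score(logits, label):
    """Assumed test statistic s(x, y): the true-label logit minus the largest
    logit among the other classes (one common form of a logit gap)."""
    others = [z for i, z in enumerate(logits) if i != label]
    return logits[label] - max(others)


def pinball_loss(score, pred, tau):
    """Pinball (quantile) loss at level tau; the attack uses tau = 1 - alpha."""
    r = score - pred
    return max(tau * r, (tau - 1.0) * r)


def predict(w, b, x):
    """Hypothetical linear quantile model q(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_quantile_model(features, scores, alpha, lr=0.1, epochs=1000):
    """Fit q to the (1 - alpha)-quantile of s given x on known NON-members,
    using full-batch subgradient descent on the mean pinball loss.

    The bias b keeps the model class closed under constant shifts, which
    drives the fraction of fitting points with s > q(x) toward alpha.
    """
    tau = 1.0 - alpha
    dim = len(features[0])
    w, b, n = [0.0] * dim, 0.0, len(scores)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, s in zip(features, scores):
            # Subgradient of the pinball loss with respect to the prediction q(x).
            g = -tau if s > predict(w, b, x) else 1.0 - tau
            grad_b += g
            for j in range(dim):
                grad_w[j] += g * x[j]
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_member(score, threshold):
    """Attack A_q: flag (x, y) as a training member iff s(x, y) >= q(x)."""
    return score >= threshold


if __name__ == "__main__":
    # Toy data (an assumption, not from the paper): a 1-D feature x shifts the
    # score distribution, so a per-example threshold q(x) is needed.
    rng = random.Random(0)
    alpha = 0.1

    def sample(offset, n):
        xs = [[rng.random()] for _ in range(n)]
        return xs, [2.0 * x[0] + offset + rng.uniform(-0.5, 0.5) for x in xs]

    fit_x, fit_s = sample(0.0, 500)   # public non-members used to fit q
    w, b = fit_quantile_model(fit_x, fit_s, alpha)

    out_x, out_s = sample(0.0, 2000)  # fresh non-members
    in_x, in_s = sample(0.5, 2000)    # toy "members" with inflated scores
    fpr = sum(is_member(s, predict(w, b, x)) for x, s in zip(out_x, out_s)) / len(out_s)
    tpr = sum(is_member(s, predict(w, b, x)) for x, s in zip(in_x, in_s)) / len(in_s)
    print(f"FPR = {fpr:.3f} (target {alpha}), TPR = {tpr:.3f}")
```

In the method described above, q takes the example x itself as input. The one-dimensional toy feature here only stands in for that. Only the target model's confidence scores are needed to compute s.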

Figure 1: Comparing the true positive rate vs. false positive rate of our membership inference attack with the marginal baseline proposed in Yeom et al. (2018) and the state-of-the-art LiRA proposed in Carlini et al. (2022) evaluated at 2, 4, 6, and 8 shadow models. We also provide a visual rea

Experimental Results

Research Questions

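RQ1: Can a single quantile regression model match or exceed shadow-model-based membership inference while reducing computational cost?
RQ2: How does the quantile regression attack perform on a large-scale dataset (ImageNet-1k) versus small-scale datasets (CIFAR-10/100), and across different architectures?
RQ3: Does optimizing the pinball loss deliver the target false positive rate together with robust true positive rates across settings?
RQ4: Is the attack effective on tabular data, and does it still work when the attacker has limited knowledge of the target model?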

Key Results

Method	CIFAR-10	CIFAR-100	ImageNet-1k	CIFAR-10	CIFAR-100	ImageNet-1k
Marginal	48.56%	58.81%	47.62%	60.94%	65.75%	46.81%
LiRA (n=2)	78.55%	95.21%	62.70%	83.18%	98.65%	56.04%
LiRA (n=4)	80.52%	95.87%	89.11%	91.48%	98.94%	95.18%
LiRA (n=6)	83.19%	96.20%	93.74%	93.17%	99.02%	98.38%
LiRA (n=8)	83.00%	96.07%	94.57%	93.70%	98.98%	98.73%
Ours	62.95%	79.57%	97.45%	64.48%	85.41%	99.64%

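The quantile regression attack is competitive with state-of-the-art shadow-model approaches. In the ImageNet-1k experiments it outperforms the shadow-model methods at every evaluation point.
The attack requires training only a single model and is architecture-agnostic, which enables truly black-box use.
On CIFAR-10/100 the attack improves over the marginal baseline but can lag behind shadow-model methods, depending on dataset size and model complexity.
On the large dataset (ImageNet-1k) it achieves high precision at very low false positive rates (e.g., Table 1 shows a notable advantage).
On tabular datasets the single-model approach matches LiRA's performance at a much lower computational cost (one model versus many).
The results suggest that lower pinball loss correlates with better membership inference performance across tasks.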
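The results above emphasize performance at very low false positive rates. The sketch below is a hedged illustration of how such quantities are usually computed. It derives TPR, FPR, and precision from an attack's binary membership guesses. It also picks the best TPR among ROC points (for example, one point per α after refitting q) whose FPR stays within a budget. The function names `attack_rates` and `best_tpr_at_fpr` are hypothetical, and the review does not specify which of these quantities each table column reports.

```python
def attack_rates(flags, members):
    """Return (TPR, FPR, precision) for binary membership guesses.

    flags[i] is the attack's guess for example i; members[i] is the ground truth.
    """
    tp = sum(1 for f, m in zip(flags, members) if f and m)
    fp = sum(1 for f, m in zip(flags, members) if f and not m)
    pos = sum(1 for m in members if m)
    neg = len(members) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tpr, fpr, precision


def best_tpr_at_fpr(roc_points, max_fpr):
    """From (fpr, tpr) pairs, e.g. one per alpha after refitting q, return the
    highest TPR whose FPR does not exceed max_fpr (0.0 if none qualifies)."""
    return max((tpr for fpr, tpr in roc_points if fpr <= max_fpr), default=0.0)


if __name__ == "__main__":
    flags = [True, True, False, True, False, False]
    members = [True, True, True, False, False, False]
    print(attack_rates(flags, members))
    print(best_tpr_at_fpr([(0.001, 0.2), (0.01, 0.5), (0.1, 0.8)], 0.01))
```

Unlike TPR and FPR, precision depends on the ratio of members to non-members in the evaluation set. Precision figures are therefore comparable only under the same evaluation split.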

This review was generated by AI and reviewed by a human editor.