QUICK REVIEW

[Paper Review] Systematic Evaluation of Privacy Risks of Machine Learning Models

Liwei Song, Prateek Mittal | arXiv (Cornell University) | March 24, 2020

Adversarial Robustness in Machine Learning | References: 45 | Citations: 84

One-Line Summary

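This paper critiques prior evaluations of membership inference risk and introduces non-NN benchmark attacks. It also proposes a fine-grained, per-sample privacy risk score and shows that existing defenses are less effective than claimed. The authors provide an evaluation protocol and public code.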

ABSTRACT

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to guess if an input sample was used to train the model. In this paper, we show that prior work on membership inference attacks may severely underestimate the privacy risks by relying solely on training custom neural network classifiers to perform attacks and focusing only on the aggregate results over data samples, such as the attack accuracy. To overcome these limitations, we first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks and proposing a new inference attack method based on a modification of prediction entropy. We also propose benchmarks for defense mechanisms by accounting for adaptive adversaries with knowledge of the defense and also accounting for the trade-off between model accuracy and privacy risks. Using our benchmark attacks, we demonstrate that existing defense approaches are not as effective as previously reported. Next, we introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score. Our privacy risk score metric measures an individual sample's likelihood of being a training member, which allows an adversary to identify samples with high privacy risks and perform attacks with high confidence. We experimentally validate the effectiveness of the privacy risk score and demonstrate that the distribution of privacy risk score across individual samples is heterogeneous. Finally, we perform an in-depth investigation for understanding why certain samples have high privacy risks, including correlations with model sensitivity, generalization error, and feature embeddings. Our work emphasizes the importance of a systematic and rigorous evaluation of privacy risks of machine learning models.
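The abstract describes the modified prediction entropy and the privacy risk score only in words. The definitions below come from our reading of the full paper, not from this review, so treat the notation as an assumption:

- F(x)_i is the model's softmax probability for class i.
- y is the ground-truth label.
- D_tr and D_te are the member (training) and non-member sets.
- O shorthand for O(F, z), the target model's observed output on sample z.

The second line for r(z) is Bayes' rule.

```latex
\begin{align*}
\mathrm{Entr}(F(x)) &= -\sum_{i} F(x)_i \log F(x)_i \\
\mathrm{Mentr}(F(x), y) &= -\bigl(1 - F(x)_y\bigr)\log F(x)_y
  - \sum_{i \neq y} F(x)_i \log\bigl(1 - F(x)_i\bigr) \\
r(z) &= P\bigl(z \in D_{\mathrm{tr}} \mid O\bigr) \\
     &= \frac{P(z \in D_{\mathrm{tr}})\, P(O \mid z \in D_{\mathrm{tr}})}
             {P(z \in D_{\mathrm{tr}})\, P(O \mid z \in D_{\mathrm{tr}})
              + P(z \in D_{\mathrm{te}})\, P(O \mid z \in D_{\mathrm{te}})}
\end{align*}
```

Entr is small for any confident prediction, even a wrong one. Mentr instead grows when probability mass sits on a wrong class: the first term becomes large as F(x)_y approaches 0, and each other-class term grows as F(x)_i approaches 1. This is the intuition behind its stronger attack performance reported in the results.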

Motivation and Goals

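Evaluate membership inference privacy risks beyond attacks that rely on neural-network-based attack classifiers.
Introduce non-NN benchmark attacks, including an entropy-based attack that makes use of ground-truth label information.
Propose a fine-grained privacy risk score to assess the risk of each individual sample.
Evaluate existing defenses against adaptive adversaries that know the defense.
Provide accessible benchmarks and code for reproducible privacy risk evaluation.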

Proposed Method

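Benchmark privacy risks with non-NN inference attacks, including attacks with class-dependent thresholds and an attack based on modified prediction entropy.
Introduce modified prediction entropy (Mentr), a refined metric that better captures ground-truth label information.
Use shadow training to set class-specific thresholds for the metric-based attacks.
Evaluate defenses under adaptive adversaries and compare them against an early-stopping baseline.
Define and compute a privacy risk score for each individual sample, revealing that risk is heterogeneous across samples.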
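The first three steps of the method can be sketched in Python as follows. This is a minimal sketch, not the authors' released code, and it assumes:

- NumPy is available.
- Softmax outputs are given as (n, k) arrays.
- A shadow model exists whose members and non-members are known.

The helper names (`mentr`, `best_threshold`, `class_thresholds`, `mentr_attack`) are hypothetical. Choosing each class threshold by maximizing balanced accuracy on shadow data is our assumption about the selection rule. The toy probabilities are made up for illustration.

```python
import numpy as np


def mentr(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Modified prediction entropy Mentr(F(x), y) for each row of `probs`.

    probs:  (n, k) array of softmax outputs F(x)
    labels: (n,) array of ground-truth class indices y
    """
    p = np.clip(probs, eps, 1.0 - eps)  # keep both log(p) and log(1 - p) finite
    rows = np.arange(len(labels))
    p_y = p[rows, labels]
    # True-class term: -(1 - F(x)_y) * log F(x)_y
    true_term = -(1.0 - p_y) * np.log(p_y)
    # Other-class terms: -F(x)_i * log(1 - F(x)_i), with the true class zeroed out
    other = -p * np.log(1.0 - p)
    other[rows, labels] = 0.0
    return true_term + other.sum(axis=1)


def best_threshold(in_vals: np.ndarray, out_vals: np.ndarray) -> float:
    """Threshold tau maximizing balanced accuracy of the rule 'member iff value <= tau'."""
    best_tau, best_acc = None, -1.0
    for tau in np.unique(np.concatenate([in_vals, out_vals])):
        tpr = np.mean(in_vals <= tau)  # members correctly flagged
        tnr = np.mean(out_vals > tau)  # non-members correctly passed
        acc = 0.5 * (tpr + tnr)
        if acc > best_acc:
            best_tau, best_acc = float(tau), acc
    return best_tau


def class_thresholds(in_vals, in_labels, out_vals, out_labels, num_classes):
    """Per-class thresholds tau_y fitted on a shadow model's members and non-members."""
    fallback = best_threshold(in_vals, out_vals)  # used when a class has no shadow data
    taus = np.full(num_classes, fallback)
    for c in range(num_classes):
        in_c, out_c = in_vals[in_labels == c], out_vals[out_labels == c]
        if in_c.size and out_c.size:
            taus[c] = best_threshold(in_c, out_c)
    return taus


def mentr_attack(vals, labels, taus):
    """Predict 'member' (True) when Mentr(F(x), y) <= tau_y."""
    return vals <= taus[labels]


# Toy shadow data: members receive confident, correct predictions.
shadow_in = np.array([[0.95, 0.05], [0.90, 0.10], [0.05, 0.95], [0.10, 0.90]])
shadow_out = np.array([[0.60, 0.40], [0.55, 0.45], [0.40, 0.60], [0.30, 0.70]])
y_sh = np.array([0, 0, 1, 1])
taus = class_thresholds(mentr(shadow_in, y_sh), y_sh,
                        mentr(shadow_out, y_sh), y_sh, num_classes=2)

target = np.array([[0.92, 0.08], [0.50, 0.50]])
y_t = np.array([0, 0])
print(mentr_attack(mentr(target, y_t), y_t, taus))  # [ True False]
```

Thresholds are fitted per class because some classes are inherently harder than others. A single global threshold would flag easy-class non-members and miss hard-class members.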

Experimental Results

Research Questions

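RQ1: Do non-NN attacks reveal higher membership inference risk on defended models than NN-based attacks?
RQ2: How do class-specific thresholds and the modified entropy metric affect attack effectiveness?
RQ3: Can a per-sample privacy risk score reveal how privacy risk varies across training samples?
RQ4: Are existing defenses such as adversarial regularization and MemGuard robust under adaptive evaluation?
RQ5: How should privacy risk evaluation be standardized to account for the trade-off between model accuracy and privacy?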

Key Results

Defense method	Dataset	Reported attack accuracy	Our benchmark attack accuracy
Adversarial regularization [31]	Purchase100	51.6%	59.5%
Adversarial regularization [31]	Texas100	51.0%	58.6%
MemGuard [20]	Location30	50.1%	69.1%
MemGuard [20]	Texas100	50.3%	74.2%
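Attack accuracy is the share of correct member/non-member guesses. On an evaluation set with equally many members and non-members, random guessing therefore scores 50%; the ~50% reported figures suggest such a set, but that is our assumption. The sketch below computes how many percentage points the benchmark attacks add over the reported ones. It pairs a hypothetical `attack_accuracy` helper with the table's own numbers.

```python
import numpy as np


def attack_accuracy(pred_member, is_member) -> float:
    """Share of correct membership guesses (0.5 is chance on a balanced set)."""
    return float(np.mean(np.asarray(pred_member) == np.asarray(is_member)))


# (reported, our benchmark) attack accuracy in %, copied from the table.
RESULTS = {
    ("Adversarial regularization", "Purchase100"): (51.6, 59.5),
    ("Adversarial regularization", "Texas100"): (51.0, 58.6),
    ("MemGuard", "Location30"): (50.1, 69.1),
    ("MemGuard", "Texas100"): (50.3, 74.2),
}


def gaps_pp(results):
    """Percentage-point increase of the benchmark attack over the reported attack."""
    return {key: round(ours - reported, 1) for key, (reported, ours) in results.items()}


for (defense, dataset), gap in gaps_pp(RESULTS).items():
    print(f"{defense} / {dataset}: +{gap} pp")
```

The gaps are 7.9 and 7.6 points for adversarial regularization and 19.0 and 23.9 points for MemGuard.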

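Non-NN benchmark attacks reveal substantially higher privacy risk than the earlier NN-based evaluations (e.g., 58.6%–74.2% vs. ~50% attack accuracy on defended models).
Under adaptive attacks, defenses such as adversarial regularization and MemGuard provide only limited privacy protection and do not consistently outperform early stopping.
The modified prediction entropy (Mentr) attack outperforms the standard entropy-based attack.
Privacy risk is heterogeneous across samples, and the proposed privacy risk score can identify high-risk members.
Per-sample risk analysis complements aggregate metrics such as attack accuracy, giving a clearer picture of how privacy leakage is distributed and guiding defense evaluation.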
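Below is one way to compute the per-sample privacy risk score, sketched under these assumptions:

- The observed output O(F, z) is reduced to one metric value per sample, for example Mentr.
- The class-conditional member and non-member distributions are approximated with histograms built from a shadow model.
- The prior probability of membership defaults to 0.5.

The histogram estimator, the bin count, and the function name `privacy_risk_scores` are hypothetical choices, not details taken from this review.

```python
import numpy as np


def privacy_risk_scores(target_vals, target_labels,
                        shadow_in_vals, shadow_in_labels,
                        shadow_out_vals, shadow_out_labels,
                        num_classes, bins=20, prior_in=0.5):
    """Estimate r(z) = P(member | observed metric value) for each target sample.

    Class-conditional metric distributions for members ("in") and
    non-members ("out") are approximated with histograms from a shadow
    model, then combined with Bayes' rule.
    """
    scores = np.full(len(target_vals), prior_in, dtype=float)
    for c in range(num_classes):
        mask = target_labels == c
        if not mask.any():
            continue
        in_c = shadow_in_vals[shadow_in_labels == c]
        out_c = shadow_out_vals[shadow_out_labels == c]
        # Shared bin edges so both histograms and the target values line up.
        edges = np.histogram_bin_edges(
            np.concatenate([in_c, out_c, target_vals[mask]]), bins=bins)
        p_in = np.histogram(in_c, bins=edges)[0] / max(in_c.size, 1)
        p_out = np.histogram(out_c, bins=edges)[0] / max(out_c.size, 1)
        # Bin index of each target value (the last edge belongs to the last bin).
        idx = np.clip(np.digitize(target_vals[mask], edges) - 1, 0, bins - 1)
        num = prior_in * p_in[idx]
        den = num + (1.0 - prior_in) * p_out[idx]
        # Where neither histogram has mass, fall back to the prior.
        scores[mask] = np.where(den > 0, num / np.where(den > 0, den, 1.0), prior_in)
    return scores
```

Sorting samples by score, for example with `np.argsort(-scores)`, puts the highest-risk members first. Aggregate attack accuracy cannot provide this kind of fine-grained ranking.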

This review was generated by AI and reviewed by a human editor.