QUICK REVIEW

[Paper Review] ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models

Yugeng Liu, Rui Wen | arXiv (Cornell University) | 2021-02-04

Adversarial Robustness in Machine Learning | References: 67 | Citations: 46

One-Line Summary

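ML-Doctor provides a holistic, modular framework for assessing privacy risks from membership inference, model inversion, attribute inference, and model stealing across multiple architectures and datasets, and it includes defenses such as DP-SGD and knowledge distillation.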

ABSTRACT

Inference attacks against Machine Learning (ML) models allow adversaries to learn sensitive information about training data, model parameters, etc. While researchers have studied, in depth, several kinds of attacks, they have done so in isolation. As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of possible defenses. In this paper, we fill this gap by presenting a first-of-its-kind holistic risk assessment of different inference attacks against machine learning models. We concentrate on four attacks -- namely, membership inference, model inversion, attribute inference, and model stealing -- and establish a threat model taxonomy. Our extensive experimental evaluation, run on five model architectures and four image datasets, shows that the complexity of the training dataset plays an important role with respect to the attack's performance, while the effectiveness of model stealing and membership inference attacks are negatively correlated. We also show that defenses like DP-SGD and Knowledge Distillation can only mitigate some of the inference attacks. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models, and equally serves as a benchmark tool for researchers and practitioners.

Motivation and Objectives

Provide a comprehensive taxonomy of threat models for inference attacks on ML models.
Quantify how dataset complexity and model overfitting influence attack performance.
Explore relationships among different inference attacks and defenses across architectures and datasets.
Deliver a modular, reusable framework (ML-Doctor) to benchmark attacks and defenses for researchers and model owners.

Proposed Method

Define a two-dimensional threat model taxonomy (model access: white-box/black-box; auxiliary data: partial/shadow/none).
Formalize four inference attacks (membership inference, model inversion, attribute inference, model stealing) under various threat models.
Conduct extensive empirical evaluation on five model architectures and four image datasets, analyzing attack performance and defense effectiveness.
Implement ML-Doctor as a modular framework with data processing, attack, defense, and evaluation modules.
Use shadow models and auxiliary data to train attack models for membership inference and related attacks.
Assess defenses such as DP-SGD and Knowledge Distillation across attacks to determine defense coverage and limitations.
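
The two-dimensional taxonomy listed above can be pictured as a small data structure. The following is a minimal sketch, not ML-Doctor's actual code: the names ModelAccess, AuxiliaryData, ThreatModel, and all_threat_models are hypothetical, and the short glosses in the comments reflect one reading of the terms "partial" and "shadow" rather than definitions quoted from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List


class ModelAccess(Enum):
    """First axis: how much of the target model the adversary can see."""
    WHITE_BOX = "white-box"
    BLACK_BOX = "black-box"


class AuxiliaryData(Enum):
    """Second axis: what data the adversary holds (glosses are assumptions)."""
    PARTIAL = "partial"  # assumed: some of the target's own training data
    SHADOW = "shadow"    # assumed: separate data from a similar distribution
    NONE = "none"        # no auxiliary data at all


@dataclass(frozen=True)
class ThreatModel:
    """One cell of the access x auxiliary-data grid."""
    access: ModelAccess
    auxiliary: AuxiliaryData

    def label(self) -> str:
        return f"{self.access.value} / {self.auxiliary.value}"


def all_threat_models() -> List[ThreatModel]:
    """Enumerate every combination of the two axes."""
    return [ThreatModel(a, d) for a, d in product(ModelAccess, AuxiliaryData)]


if __name__ == "__main__":
    for tm in all_threat_models():
        print(tm.label())
```

Enumerating the grid this way gives six combinations. Each attack would apply to only some of them, and the review does not spell out that mapping, so it is left out of the sketch.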

Experimental Results

Research Questions

RQ1: What is the impact of dataset complexity on different attacks?
RQ2: What is the impact of overfitting on different attacks?
RQ3: What is the relationship among different attacks?

Key Results

CelebA	FMNIST	STL10	UTKFace
1.000 / 0.680	1.000 / 0.884	1.000 / 0.522	1.000 / 0.792
1.000 / 0.742	1.000 / 0.909	1.000 / 0.524	1.000 / 0.852
1.000 / 0.734	1.000 / 0.905	1.000 / 0.587	1.000 / 0.834
1.000 / 0.735	1.000 / 0.916	1.000 / 0.574	1.000 / 0.846
1.000 / 0.707	1.000 / 0.903	1.000 / 0.517	1.000 / 0.818
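
The table above has no row labels and does not name the two numbers in each cell. As a rough illustration of how an overfitting level could be read off such a table, the sketch below assumes that each cell is "training accuracy / test accuracy" and that each row is one model. Both are assumptions, not facts stated in this review. The helper names overfitting_gaps and mean_gap_per_dataset are hypothetical.

```python
from typing import Dict, List, Tuple

# Values copied verbatim from the table above; rows are unlabeled in the source.
# ASSUMPTION: each pair is (training accuracy, test accuracy).
DATASETS = ["CelebA", "FMNIST", "STL10", "UTKFace"]
ROWS: List[List[Tuple[float, float]]] = [
    [(1.000, 0.680), (1.000, 0.884), (1.000, 0.522), (1.000, 0.792)],
    [(1.000, 0.742), (1.000, 0.909), (1.000, 0.524), (1.000, 0.852)],
    [(1.000, 0.734), (1.000, 0.905), (1.000, 0.587), (1.000, 0.834)],
    [(1.000, 0.735), (1.000, 0.916), (1.000, 0.574), (1.000, 0.846)],
    [(1.000, 0.707), (1.000, 0.903), (1.000, 0.517), (1.000, 0.818)],
]


def overfitting_gaps(rows: List[List[Tuple[float, float]]]) -> List[List[float]]:
    """Per-cell gap between the first and second value (train minus test, as assumed)."""
    return [[round(train - test, 3) for train, test in row] for row in rows]


def mean_gap_per_dataset(
    rows: List[List[Tuple[float, float]]], datasets: List[str]
) -> Dict[str, float]:
    """Average the gap over all rows, one result per dataset column."""
    gaps = overfitting_gaps(rows)
    return {
        name: round(sum(row[i] for row in gaps) / len(gaps), 3)
        for i, name in enumerate(datasets)
    }


if __name__ == "__main__":
    for name, gap in mean_gap_per_dataset(ROWS, DATASETS).items():
        print(f"{name}: mean gap {gap:.3f}")
```

Under that assumption, STL10 shows the largest average gap and FMNIST the smallest.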

Dataset complexity strongly affects membership inference, model inversion, and model stealing; membership inference benefits from more complex datasets, while the opposite is often true for model stealing.
There is a negative correlation between membership inference success and model stealing success (r = -0.821) driven by overfitting effects.
White-box access generally yields stronger attack performance than black-box access across attacks.
DP-SGD can mitigate membership inference with limited impact on model utility; Knowledge Distillation helps but is less effective for some attacks.
Partial auxiliary data does not significantly improve attack performance for membership inference, attribute inference, or model stealing across evaluated settings.
Model stealing achieves higher agreement on simpler datasets (e.g., FMNIST) than on complex ones (e.g., STL10) due to overfitting dynamics.
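
Two quantities in the results above lend themselves to a short worked example: the agreement between a stolen model and its target, and the Pearson correlation used to relate two attacks. The sketch below is a plain-Python illustration under stated assumptions. Agreement is taken to be the fraction of inputs on which both models predict the same label, which is a common definition but is not spelled out in this review. The function names agreement and pearson_r are hypothetical, and the sample numbers are invented for the demonstration, not results from the paper.

```python
import math
from typing import Sequence


def agreement(target_preds: Sequence[int], stolen_preds: Sequence[int]) -> float:
    """Fraction of inputs where the stolen model matches the target's label."""
    if len(target_preds) != len(stolen_preds) or len(target_preds) == 0:
        raise ValueError("prediction lists must be non-empty and equally long")
    matches = sum(t == s for t, s in zip(target_preds, stolen_preds))
    return matches / len(target_preds)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Invented labels for four inputs: the models disagree on one of them.
    print(agreement([0, 1, 2, 2], [0, 1, 1, 2]))

    # Invented per-setting scores (NOT the paper's data): as one attack's
    # score rises the other's falls, so the coefficient comes out negative.
    membership_scores = [0.60, 0.70, 0.80, 0.90]
    stealing_agreement = [0.90, 0.80, 0.70, 0.60]
    print(pearson_r(membership_scores, stealing_agreement))
```

A coefficient near -1, such as the reported r = -0.821, means that settings where one attack does well tend to be settings where the other does poorly.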


This review was generated by AI and reviewed by a human editor.