QUICK REVIEW

[Paper Review] Privacy Risks of Explaining Machine Learning Models

Reza Shokri, Martin Strobel | arXiv (Cornell University) | 2019-06-29

Adversarial Robustness in Machine Learning | References: 24 | Citations: 23

One-Line Summary

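This paper analyzes how explanations of machine learning models, particularly gradient-based attributions and influence measures, can leak sensitive information about the training data and thereby enable membership inference and reconstruction attacks. The study shows that these explanations pose a particular risk to minorities and outliers, demonstrating that transparency mechanisms can unintentionally compromise data privacy.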

ABSTRACT

Can an adversary exploit model explanations to infer sensitive information about the models' training set? To investigate this question, we first focus on membership inference attacks: given a data point and a model explanation, the attacker's goal is to decide whether or not the point belongs to the training data. We study this problem for two popular transparency methods: gradient-based attribution methods and record-based influence measures. We develop membership inference attacks based on these model explanations, and extensively test them on a variety of datasets. For gradient-based methods, we show that the explanations can leak a significant amount of information about the individual data points in the training set, much beyond what is leaked through the predicted labels. We also show that record-based measures can be effectively, and even more significantly, exploited for membership inference attacks. More importantly, we design reconstruction attacks against this class of model explanations. We demonstrate that they can be exploited to recover significant parts of the training set. Finally, our results indicate that minorities and outliers are more vulnerable to these types of attacks than the rest of the population. Thus, there is a significant disparity for the privacy risks of model explanations across different groups.
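The abstract's first attack, membership inference from gradient-based attributions, can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the model is a logistic regression with hand-set weights, the explanation is the gradient of the predicted probability with respect to the input (for this model, p * (1 - p) * w), and the hypothetical attacker flags a point as a member when the variance of that explanation falls below a threshold. The intuition is that a model fits its training points confidently, so p * (1 - p) is small and the gradient is flat. The names `gradient_explanation`, `infer_membership`, and the threshold value are illustrative assumptions.

```python
import math
from statistics import pvariance


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gradient_explanation(w: list[float], b: float, x: list[float]) -> list[float]:
    """Gradient of p(y=1 | x) = sigmoid(w.x + b) with respect to the input x.

    For logistic regression this equals p * (1 - p) * w.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [p * (1.0 - p) * wi for wi in w]


def infer_membership(explanation: list[float], threshold: float) -> bool:
    """Hypothetical threshold attack: low explanation variance -> predict 'member'."""
    return pvariance(explanation) < threshold


# Toy model with hand-set weights (not trained on anything).
w, b = [2.0, -1.0, 0.5], 0.0
confident = [3.0, -2.0, 1.0]  # w.x = 8.5: model is very confident, like a fitted training point
uncertain = [0.1, 0.1, 0.0]   # w.x = 0.1: model is unsure

print(infer_membership(gradient_explanation(w, b, confident), threshold=0.01))  # True
print(infer_membership(gradient_explanation(w, b, uncertain), threshold=0.01))  # False
```

In a real attack the threshold would have to be calibrated, for example on points whose membership the attacker already knows; here 0.01 is simply picked by hand.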

Research Motivation and Goals

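To investigate whether model explanations can be exploited to infer sensitive information about the training data.
To evaluate the effectiveness of membership inference attacks that use gradient-based attributions and influence measures.
To explore attacks that reconstruct training data from model explanations.
To analyze disparities in privacy risk across population groups, particularly for minorities and outliers.
To highlight the privacy trade-offs inherent in model interpretation techniques.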

Proposed Methods

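Developed membership inference attacks that use gradient-based attribution methods to determine whether a point was part of the training data.
Designed membership inference attacks based on record-based influence measures to determine training-set membership.
Proposed reconstruction attacks that use model explanations to recover significant parts of the training data.
Evaluated the attacks on a variety of datasets to assess their generality and effectiveness.
Compared the privacy leakage from model explanations with the leakage from model predictions alone.
Analyzed differences in vulnerability across data subgroups, focusing on minorities and outliers.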
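Record-based influence explanations return training records themselves, which is what makes both membership inference and reconstruction possible. The sketch below assumes a hypothetical explanation oracle that returns the k most influential training records for a query. As a toy stand-in it scores influence by negative squared distance rather than by influence functions, so it illustrates only the attack logic, not the paper's influence measure. Membership is inferred by checking whether the query appears in its own explanation, and reconstruction unions the records revealed across many queries. All function names and parameters here are hypothetical.

```python
import random
from typing import Callable

Point = tuple[float, ...]
Explainer = Callable[[Point], list[Point]]


def make_explainer(train: list[Point], k: int) -> Explainer:
    """Hypothetical record-based explanation oracle.

    Returns the k training records with the highest 'influence' on the query.
    Toy stand-in: influence = negative squared distance, NOT influence functions.
    """
    def explain(query: Point) -> list[Point]:
        def score(record: Point) -> float:
            return -sum((a - b) ** 2 for a, b in zip(record, query))
        return sorted(train, key=score, reverse=True)[:k]
    return explain


def is_member(explain: Explainer, x: Point) -> bool:
    # Assumed signal: a training record ranks among its own most influential
    # records (with this toy oracle it is always ranked first).
    return x in explain(x)


def reconstruct(explain: Explainer, probes: list[Point]) -> set[Point]:
    # Each explanation discloses up to k training records; collect all of them.
    revealed: set[Point] = set()
    for query in probes:
        revealed.update(explain(query))
    return revealed


rng = random.Random(0)
train = [(rng.random(), rng.random()) for _ in range(50)]
explain = make_explainer(train, k=3)

print(is_member(explain, train[0]))    # True
print(is_member(explain, (2.0, 2.0)))  # False
probes = [(rng.random(), rng.random()) for _ in range(200)]
print(len(reconstruct(explain, probes)), "of", len(train), "training records revealed")
```

How many records a reconstruction recovers depends on k, the number of probes, and how the probes are chosen; this sketch uses uniform random probes only for simplicity.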

Experimental Results

Research Questions

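RQ1: Can membership inference attacks be built from gradient-based model explanations?
RQ2: How much information do influence-based explanations leak about training-set membership?
RQ3: Can training data be reconstructed from model explanations, and how accurately?
RQ4: Are certain data subgroups, such as minorities or outliers, more vulnerable to these attacks?
RQ5: How does the privacy risk from model explanations compare with the risk from model predictions alone?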

Key Findings

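Gradient-based attributions leak significantly more information about individual training points than the model's predictions alone.
Influence-based explanations can be exploited for membership inference even more effectively than gradient-based methods.
Reconstruction attacks can use model explanations to recover significant parts of the training data.
Minorities and outliers are disproportionately vulnerable to both membership inference and reconstruction attacks.
The privacy risks of model explanations are not evenly distributed, leading to disparities in exposure across population groups.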
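One way to make the disparity finding concrete is to measure attack accuracy separately for each subgroup and report the gap between the most and least vulnerable group. The sketch below assumes the attacker's predictions are already available as (group, true membership, predicted membership) triples; the records are illustrative toy values, not results from the paper, and the function names are hypothetical.

```python
from collections import defaultdict

# (group, true_member, predicted_member); toy values for illustration only
Record = tuple[str, bool, bool]


def attack_accuracy_by_group(records: list[Record]) -> dict[str, float]:
    """Fraction of correct membership predictions within each subgroup."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for group, truth, pred in records:
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def disparity_gap(accuracy: dict[str, float]) -> float:
    """Accuracy difference between the most and least vulnerable subgroup."""
    return max(accuracy.values()) - min(accuracy.values())


records = [
    ("majority", True, True), ("majority", False, True),
    ("majority", True, False), ("majority", False, False),
    ("minority", True, True), ("minority", False, False),
    ("minority", True, True), ("minority", False, False),
]
acc = attack_accuracy_by_group(records)
print(acc)                 # {'majority': 0.5, 'minority': 1.0}
print(disparity_gap(acc))  # 0.5
```

With equal numbers of members and non-members, 0.5 accuracy is what random guessing achieves, so in this toy data the attack succeeds only against the minority group.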

This review was generated by AI and reviewed by a human editor.