QUICK REVIEW

[Paper Review] Correlated-Output Differential Privacy and Applications to Dark Pools

Chiang, James Hsin-yu, Davis Railsback | arXiv (Cornell University) | 2022-02-05

Privacy-Preserving Technologies in Data | Citations: 8

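One-line summary

This paper proposes a new MPC+DP framework that lets multiple data owners train a machine learning model while keeping their data confidential. It does so without a trusted curator, using secure multi-party computation (MPC) to simulate the trusted curator that global differential privacy (global DP) requires. The approach achieves higher model accuracy than local DP while retaining formal privacy guarantees, and it won first place in the iDASH2021 competition in the genomic analysis domain.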

ABSTRACT

In the classical setting of differential privacy, a privacy-preserving query is performed on a private database, after which the query result is released to the analyst; a differentially private query ensures that the presence of a single database entry is protected from the analyst’s view. In this work, we contribute the first definitional framework for differential privacy in the trusted curator setting (Fig. 1); clients submit private inputs to the trusted curator, which then computes individual outputs privately returned to each client. The adversary is more powerful than the standard setting; it can corrupt up to n-1 clients and subsequently decide inputs and learn outputs of corrupted parties. In this setting, the adversary also obtains leakage from the honest output that is correlated with a corrupted output. Standard differentially private mechanisms protect client inputs but do not mitigate output correlation leaking arbitrary client information, which can forfeit client privacy completely. We initiate the investigation of a novel notion of correlated-output differential privacy to bound the leakage from output correlation in the trusted curator setting. We define the satisfaction of both standard and correlated-output differential privacy as round differential privacy and highlight the relevance of this novel privacy notion to all application domains in the trusted curator model. We explore round differential privacy in traditional "dark pool" market venues, which promise privacy-preserving trade execution to mitigate front-running; privately submitted trade orders and trade execution are kept private by the trusted venue operator. We observe that dark pools satisfy neither classic nor correlated-output differential privacy; in markets with low trade activity, the adversary may trivially observe recurring, honest trading patterns, and anticipate and front-run future trades. 
In response, we present the first round differentially private market mechanisms that formally mitigate information leakage from all trading activity of a user. This is achieved with fuzzy order matching, inspired by the standard randomized response mechanism; however, this also introduces a liquidity mismatch as buy and sell orders are not guaranteed to execute pairwise, thereby weakening output correlation; this mismatch is compensated for by a round differentially private liquidity provider mechanism, which freezes a noisy amount of assets from the liquidity provider for the duration of a privacy epoch, but leaves trader balances unaffected. We propose oblivious algorithms for realizing our proposed market mechanisms with secure multi-party computation (MPC) and implement these in the Scale-Mamba Framework using Shamir Secret Sharing based MPC. We demonstrate practical, round differentially private trading with comparable throughput as prior work implementing (traditional) dark pool algorithms in MPC; our experiments demonstrate practicality for both traditional finance and decentralized finance settings.
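The abstract says the fuzzy order matching is inspired by the standard randomized response mechanism. As a point of reference only, here is a minimal Python sketch of textbook binary randomized response; it is not the paper's market mechanism, and the function names (`truth_probability`, `randomized_response`, `estimate_true_fraction`) are hypothetical names chosen for this illustration.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit: e^eps / (1 + e^eps)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability p, otherwise report its negation.

    The ratio p / (1 - p) equals e^eps, which is what gives eps-DP
    for a single binary answer.
    """
    p = truth_probability(epsilon)
    return true_bit if rng.random() < p else (not true_bit)


def estimate_true_fraction(reports, epsilon: float) -> float:
    """Debias the observed fraction of True reports.

    E[observed] = f * p + (1 - f) * (1 - p), solved here for f.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: 30% of participants hold the sensitive bit.
    truth = [i < 3000 for i in range(10000)]
    reports = [randomized_response(b, 1.0, rng) for b in truth]
    print("estimated fraction:", estimate_true_fraction(reports, 1.0))
```

Each individual report is deniable, yet the aggregate can still be estimated after debiasing. That trade between per-user noise and aggregate utility is the general idea the abstract points to.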

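Research motivation and goals

Train accurate machine learning models on horizontally or vertically distributed data while preserving privacy.
Address the accuracy loss inherent in pure differential privacy (DP) methods when training on distributed data.
Eliminate the trusted curator by using secure multi-party computation (MPC) to simulate global DP.
Enable privacy-preserving training on vertically partitioned data, common in healthcare and advertising, where previous MPC+DP methods fail.
Provide a general, extensible framework that supports a variety of linear models and DP mechanisms without requiring expert configuration.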

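Proposed method

The method uses MPC protocols to jointly train a logistic regression model across multiple data owners without revealing raw data.
Secret-sharing-based MPC protocols (e.g., additive sharing) compute the model weights confidentially across the parties.
After training, Laplace noise is added to the model coefficients inside MPC to satisfy (ϵ, δ)-differential privacy.
The noise is added in a distributed manner, simulating the role of the trusted curator in global DP and yielding an end-to-end privacy guarantee.
The method supports both horizontal and vertical data partitioning and is compatible with both passive and active adversary models.
The framework is modular: the logistic regression training protocol (πLR) can be swapped for another linear learner, and the Laplace mechanism can be replaced with Gaussian noise for (ϵ, δ)-DP.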
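The idea of perturbing a secret-shared coefficient before anything is revealed can be sketched as a toy, single-process simulation. This is an illustration under stated assumptions, not the protocol from the reviewed work: the field modulus `PRIME`, the fixed-point factor `SCALE`, the `sensitivity` parameter, and all function names are hypothetical, and a real MPC protocol would sample the noise obliviously rather than in the clear as done here.

```python
import random

PRIME = 2**61 - 1   # assumed field modulus for additive sharing
SCALE = 2**16       # assumed fixed-point scaling factor


def encode(x: float) -> int:
    """Map a real number to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a real number (upper half is negative)."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n_parties: int, rng: random.Random) -> list:
    """Split a field element into n additive shares that sum to it mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list) -> int:
    """Recombine additive shares into the underlying field element."""
    return sum(shares) % PRIME


def laplace_sample(scale: float, rng: random.Random) -> float:
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_noisy_weight(weight, sensitivity, epsilon, n_parties, rng):
    """Share a weight, add shared Laplace noise, and open only the noisy sum."""
    w_shares = share(encode(weight), n_parties, rng)
    # Stand-in for noise generated inside MPC; sampled in the clear in this toy.
    noise = laplace_sample(sensitivity / epsilon, rng)
    n_shares = share(encode(noise), n_parties, rng)
    # Addition is local: each party adds its own two shares.
    noisy_shares = [(a + b) % PRIME for a, b in zip(w_shares, n_shares)]
    return decode(reconstruct(noisy_shares)), noise


if __name__ == "__main__":
    rng = random.Random(0)
    released, _ = release_noisy_weight(0.75, 1.0, 1.0, 3, rng)
    print("released coefficient:", released)
```

Because share addition is purely local, no single party sees the clean coefficient; only the perturbed value is ever reconstructed, which mirrors what a trusted curator would output.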

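Experimental results

Research questions

RQ1: Can MPC simulate the trusted curator for global differential privacy in distributed model training without revealing the underlying data?
RQ2: Does combining MPC with global DP achieve higher model accuracy than local DP in a peer-paired learning setting?
RQ3: Can the proposed MPC+DP framework handle vertically partitioned data, where features are split across parties?
RQ4: How does the performance of the MPC+DP approach scale as the number of parties and the strength of the adversarial threat model increase?
RQ5: Does the MPC+DP framework generalize to different linear models and DP mechanisms without reconfiguration?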

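Key results

The MPC+DP approach won first place in iDASH2021 Track III, which involved predicting the risk of wild-type transthyretin amyloid cardiomyopathy from medical claims data.
In the horizontally distributed setting, an MPC protocol with an honest majority and a passive adversary model (3 parties, 32-core VM) brought training time down to within 1.3 minutes.
With an active adversary model and an honest majority, training in the 4-party setting completed within 30 minutes, demonstrating practicality.
When individual data owners hold limited data, less noise accumulates, so accuracy was higher than that of local DP-based models.
The approach explicitly permits sharing the model while maintaining a strong privacy guarantee (ϵ=1, δ=1e-5), an advantage over local DP, where model queries leak information.
The framework is extensible: it supports L2-regularized logistic regression and can also be used with a Gaussian noise mechanism for (ϵ, δ)-DP.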
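To give a feel for what swapping Laplace noise for Gaussian noise means at the reported budget, the sketch below compares the Laplace scale with the classical Gaussian-mechanism standard deviation. It rests on assumptions not found in the review: the sensitivity value of 1.0 is hypothetical, the function names are illustrative, and the textbook Gaussian bound is stated for ϵ below 1, so ϵ=1 sits at the edge of its range and the number is indicative only.

```python
import math


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of Laplace noise for eps-DP with the given L1 sensitivity."""
    return sensitivity / epsilon


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical Gaussian-mechanism calibration for (eps, delta)-DP.

    sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon,
    using the L2 sensitivity. The textbook analysis covers eps < 1;
    eps = 1 is accepted here only as a boundary illustration.
    """
    if not (0.0 < epsilon <= 1.0):
        raise ValueError("classical bound is only stated for small epsilon")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


if __name__ == "__main__":
    # Budget reported in the results; sensitivity 1.0 is a made-up placeholder.
    eps, delta, sens = 1.0, 1e-5, 1.0
    print("Laplace scale b :", laplace_scale(sens, eps))
    print("Gaussian sigma  :", gaussian_sigma(sens, eps, delta))
```

The two mechanisms measure sensitivity in different norms (L1 for Laplace, L2 for Gaussian), so the raw numbers are not directly comparable without knowing how each norm behaves for the model's coefficient vector.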


This review was generated by AI and reviewed by a human editor.