QUICK REVIEW

[Paper Review] Scalable and Differentially Private Distributed Aggregation in the Shuffled Model

Badih Ghazi, Rasmus Pagh | arXiv (Cornell University) | June 19, 2019

Privacy-Preserving Technologies in Data | References: 15 | Citations: 68

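One-line summary

This paper presents a scalable protocol for private distributed aggregation in the shuffled model, in which communication and error grow only polylogarithmically in the number of users, using an "invisibility cloak" of zero-sum noise.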

ABSTRACT

Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting where a curious adversary observing all communication to and from the server cannot learn any private information assuming the server is honest and follows the protocol. A more scalable and robust primitive for privacy-preserving protocols is shuffling of user data, so as to hide the origin of each data item. Highly scalable and secure protocols for shuffling, so-called mixnets, have been proposed as a primitive for privacy-preserving analytics in the Encode-Shuffle-Analyze framework by Bittau et al., which was later analytically studied by Erlingsson et al. and Cheu et al. The recent papers by Cheu et al. and Balle et al. have given protocols for secure aggregation that achieve differential privacy guarantees in this "shuffled model". Their protocols come at a cost, though: either the expected aggregation error or the amount of communication per user scales as a polynomial $n^{\Omega(1)}$ in the number of users $n$. In this paper we propose a simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increases only polylogarithmically in $n$. Our new technique is a conceptual "invisibility cloak" that makes users' data almost indistinguishable from random noise while introducing zero distortion on the sum.

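Research motivation and goals

Estimate sums privately over distributed data without exposing individual inputs.
Improve the scalability of shuffled-model aggregation beyond the polynomial dependence on n in prior work.
Develop a protocol with low per-user communication and low aggregation error.
Provide insight into robustness against untrusted or colluding users in the shuffled model.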

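Proposed method

Proposes an Invisibility Cloak Encoder that uses scaling and modular arithmetic to turn each input x_i into a set of m random values whose modular sum equals the scaled x_i.
The encoded outputs of all users are randomly shuffled, which is what makes differential privacy possible.
The Analyzer collects the shuffled outputs and applies a modular reduction to estimate the true sum.
Introduces a zero-sum noise technique that hides individual inputs while preserving the final sum.
Provides two privacy notions: changes to a single user's input, and sum-preserving changes (changes that keep the sum after discretization fixed).
Presents a discrete privacy analysis framework for the shuffled model and derives parameter settings that achieve (ε, δ)-DP.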
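
The encode, shuffle, and analyze steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the modulus choice N = n·k + 1, and the defaults for m and k are hypothetical; a seeded random.Random stands in for the cryptographic randomness a real deployment would need; a local list shuffle stands in for a mixnet; and no extra noise is added, so the sketch shows only the zero-distortion cloak, not the (ε, δ)-DP parameter settings.

```python
import random


def encode(x, m, N, k, rng):
    """Split a discretized input into m shares in Z_N that sum to it mod N."""
    x_bar = int(x * k)  # scale and floor (x is assumed to lie in [0, 1])
    shares = [rng.randrange(N) for _ in range(m - 1)]  # m - 1 uniform values
    shares.append((x_bar - sum(shares)) % N)  # last share fixes the modular sum
    return shares


def shuffle(messages, rng):
    """Return the messages in random order, hiding which user sent which."""
    out = list(messages)
    rng.shuffle(out)
    return out


def analyze(shuffled, N, k):
    """Sum all messages mod N and undo the scaling."""
    return (sum(shuffled) % N) / k


def run_protocol(inputs, m=8, k=1000, seed=0):
    rng = random.Random(seed)  # assumption: NOT cryptographically secure
    N = len(inputs) * k + 1    # assumption: any modulus above n*k avoids wraparound
    messages = [s for x in inputs for s in encode(x, m, N, k, rng)]
    return analyze(shuffle(messages, rng), N, k)
```

For example, run_protocol([0.25, 0.5, 0.75]) returns 1.5: each share on its own is uniform in Z_N, yet the shares cancel in the modular sum, so the only error left is the discretization error of at most 1/k per user.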

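Experimental results

Research questions

RQ1: Can aggregation in the shuffled model achieve differential privacy with per-user communication and error that are polylogarithmic in n, breaking the n^{Ω(1)} barrier of prior work?
RQ2: Can zero-sum noise hide individual inputs while the overall sum is preserved without distortion in the shuffled privacy setting?
RQ3: What privacy guarantees hold under sum-preserving changes and under single-user changes, and how do the encoder parameters affect them?
RQ4: How robust are the privacy guarantees to colluding or untrusted users within the shuffled-model framework?
RQ5: For which parameter ranges (m, N, k, ε, δ) do practical privacy and accuracy bounds result?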

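Key results

A shuffled-model protocol exists with expected error O(1/ε · sqrt(log(1/δ))) and per-user communication of O(log(n/(εδ))) messages, each of size O(log(n/δ)).
Under sum-preserving changes, there is a protocol with worst-case error 2^{-m} in which each user sends m messages of size O(m).
The invisibility cloak technique makes each user's data look almost random while preserving the overall sum, so it enables DP without distorting the final sum.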
The approach scales polylogarithmically in the number of users for both communication and error, avoiding the n^{Ω(1)} factors of prior work.
The work establishes DP guarantees under sum-preserving changes and also discusses robustness to colluding or untrusted users.
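
To get a feel for the gap between a log(n/(εδ)) message count and an n^{Ω(1)} cost, the two growth rates can be compared numerically. The sketch below is only an illustration: it sets the constant hidden by the O(·) notation to 1, uses the exponent 1/2 as a stand-in for the unspecified Ω(1) exponent, and picks ε = 1 and δ = 10^-6 arbitrarily. None of these values come from the paper, and the helper names are hypothetical.

```python
import math


def polylog_messages(n, eps, delta):
    """log(n / (eps * delta)); the O(.) constant is assumed to be 1."""
    return math.log(n / (eps * delta))


def polynomial_cost(n, exponent=0.5):
    """n ** exponent, an illustrative stand-in for an n^{Omega(1)} cost."""
    return n ** exponent


def growth_table(ns, eps=1.0, delta=1e-6):
    """One row per n: (n, logarithmic count, polynomial cost)."""
    return [(n, polylog_messages(n, eps, delta), polynomial_cost(n)) for n in ns]


for n, polylog, poly in growth_table([10**3, 10**6, 10**9]):
    print(f"n={n:>10}  log(n/(eps*delta))={polylog:6.1f}  sqrt(n)={poly:10.1f}")
```

Going from a thousand to a billion users multiplies the logarithmic count by less than two, while the square-root stand-in grows a thousandfold.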


This review was generated by AI and reviewed by a human editor.