QUICK REVIEW

[Paper Review] Differentially Private Testing of Identity and Closeness of Discrete Distributions

Jayadev Acharya, Ziteng Sun | arXiv (Cornell University) | 2018-01-01

Privacy-Preserving Technologies in Data | Citations: 30

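One-Line Summary

This paper establishes sample complexity bounds for identity testing and closeness testing of discrete distributions over $k$ elements under $(\varepsilon, \delta)$-differential privacy: optimal for identity testing in all parameter ranges, and optimal for closeness testing in the sparse regime. The upper bounds come from privatizing non-private estimators chosen to have small sensitivity, and the lower bounds come from a general framework built on couplings and Le Cam's two-point theorem.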

ABSTRACT

We study the fundamental problems of identity testing (goodness of fit), and closeness testing (two sample test) of distributions over $k$ elements, under differential privacy. While the problems have a long history in statistics, finite sample bounds for these problems have only been established recently. In this work, we derive upper and lower bounds on the sample complexity of both the problems under $(\varepsilon, \delta)$-differential privacy. We provide optimal sample complexity algorithms for identity testing problem for all parameter ranges, and the first results for closeness testing. Our closeness testing bounds are optimal in the sparse regime where the number of samples is at most $k$. Our upper bounds are obtained by privatizing non-private estimators for these problems. The non-private estimators are chosen to have small sensitivity. We propose a general framework to establish lower bounds on the sample complexity of statistical tasks under differential privacy. We show a bound on differentially private algorithms in terms of a coupling between the two hypothesis classes we aim to test. By constructing carefully chosen priors over the hypothesis classes, and using Le Cam's two point theorem we provide a general mechanism for proving lower bounds. We believe that the framework can be used to obtain strong lower bounds for other statistical tasks under privacy.

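Motivation and Objectives

Determine the optimal sample complexity of identity testing for discrete distributions over $k$ elements under $(\varepsilon, \delta)$-differential privacy.
Establish the first sample complexity bounds for closeness testing under $(\varepsilon, \delta)$-differential privacy.
Develop a general framework for proving sample complexity lower bounds for statistical testing under differential privacy.
Derive tight lower bounds for distribution testing under privacy constraints by applying Le Cam's two-point theorem to carefully constructed priors.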

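Proposed Method

The authors build differentially private algorithms for identity and closeness testing by privatizing non-private estimators chosen to have small sensitivity.
They introduce a general lower-bound framework based on a coupling between the two hypothesis classes and apply it to testing problems under privacy constraints.
The framework constructs carefully chosen priors over the hypothesis classes and uses Le Cam's two-point theorem to derive information-theoretic lower bounds.
For closeness testing, the resulting bounds are tight in the sparse regime, where the number of samples is at most $k$.
The theoretical analysis combines differential privacy constraints with statistical hypothesis testing to bound the sample complexity.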
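
The privatization step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an empirical total variation statistic (chosen here only because changing one sample moves it by at most $1/n$), the standard Laplace mechanism, and a caller-supplied `threshold`. The function names and the threshold value are hypothetical. Laplace noise gives pure $\varepsilon$-differential privacy, which implies $(\varepsilon, \delta)$-differential privacy for any $\delta \ge 0$.

```python
import numpy as np


def tv_statistic(samples, q):
    """Empirical total variation distance between the samples and q.

    samples: 1-D integer array with values in {0, ..., k-1}.
    q: length-k array of reference probabilities.
    """
    counts = np.bincount(samples, minlength=len(q))
    return 0.5 * np.abs(counts / len(samples) - q).sum()


def private_identity_test(samples, q, epsilon, threshold, rng=None):
    """Return True to reject 'the samples come from q', False otherwise.

    Assumption: `threshold` is chosen by the caller; this sketch does not
    derive a value that achieves any particular error guarantee.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(samples)
    # Replacing one sample changes two counts by 1 each, so the statistic
    # moves by at most 0.5 * (1/n + 1/n) = 1/n.
    sensitivity = 1.0 / n
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noisy = tv_statistic(samples, q) + rng.laplace(scale=sensitivity / epsilon)
    # Thresholding is post-processing and does not weaken the privacy guarantee.
    return bool(noisy > threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.full(4, 0.25)                       # uniform reference over k = 4
    far = rng.choice(4, size=2000, p=[0.7, 0.1, 0.1, 0.1])
    print(private_identity_test(far, q, epsilon=1.0, threshold=0.2, rng=rng))
```

The noise scale shrinks as $1/(n\varepsilon)$, which is why a low-sensitivity statistic matters: the smaller the sensitivity, the fewer extra samples are needed before the noise stops masking the signal.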

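Results

Research Questions

RQ1: What is the optimal sample complexity of identity testing for discrete distributions over $k$ elements under $(\varepsilon, \delta)$-differential privacy?
RQ2: What sample complexity bounds are achievable for closeness testing under $(\varepsilon, \delta)$-differential privacy, a problem with no prior results?
RQ3: How can a general framework for proving sample complexity lower bounds under differential privacy be constructed?
RQ4: Can coupling-based techniques be combined with Le Cam's two-point theorem to derive tight lower bounds for distribution testing under privacy constraints?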

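Key Results

The paper provides sample-optimal algorithms for identity testing under $(\varepsilon, \delta)$-differential privacy for all parameter ranges.
It establishes the first sample complexity bounds for differentially private closeness testing; these bounds are optimal in the sparse regime, where the number of samples is at most $k$.
The proposed lower-bound framework derives tight lower bounds by constructing priors over the hypothesis classes and applying Le Cam's two-point theorem.
The framework bounds what any differentially private algorithm can achieve in terms of a coupling between the two hypothesis classes being tested.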
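
How a coupling limits a private tester can be made concrete with a short derivation. This is a sketch for pure $\varepsilon$-differential privacy ($\delta = 0$) with illustrative constants, not the paper's exact statement or proof. It assumes a coupling $(X, Y)$ of the two sample distributions with $\mathbb{E}[d_H(X, Y)] \le D$ for some $D > 0$, where $d_H$ is the Hamming distance, and it writes $f(x)$ for the probability that the tester outputs the first hypothesis on input $x$. A correct tester is taken to satisfy $\mathbb{E}[f(X)] \ge 2/3$ and $\mathbb{E}[f(Y)] \le 1/3$.

$$
f(x) \le e^{\varepsilon\, d_H(x, y)}\, f(y) \quad \text{(group privacy)}, \qquad
\Pr\big[d_H(X, Y) > 6D\big] \le \tfrac{1}{6} \quad \text{(Markov)}
$$

$$
\tfrac{2}{3} \;\le\; \mathbb{E}[f(X)] \;\le\; e^{6 \varepsilon D}\, \mathbb{E}[f(Y)] + \tfrac{1}{6} \;\le\; \tfrac{1}{3}\, e^{6 \varepsilon D} + \tfrac{1}{6}
\quad\Longrightarrow\quad
\varepsilon \;\ge\; \frac{\ln(3/2)}{6D}
$$

So if the two hypotheses can be coupled with small expected Hamming distance, no tester with a small $\varepsilon$ can separate them; for a fixed $\varepsilon$, the sample size has to grow until $D$ is of order $1/\varepsilon$.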

This review was generated by AI and reviewed by a human editor.