QUICK REVIEW

[Paper Review] Communication Efficient, Sample Optimal, Linear Time Locally Private Discrete Distribution Estimation.

Jayadev Acharya, Ziteng Sun | arXiv (Cornell University) | Feb 13, 2018
Privacy-Preserving Technologies in Data | 12 citations
TL;DR

This paper proposes Hadamard Response (HR), a non-interactive mechanism for discrete distribution estimation under $\varepsilon$-local differential privacy that is communication-efficient, sample-optimal, and nearly linear-time. Using Hadamard matrices and the Fast Walsh-Hadamard transform, HR needs only $\log k + 2$ bits of communication per user and runs in nearly linear time, whereas previously known sample-optimal algorithms have a running time that grows as $n \cdot k$. For $k = 10000$, HR runs about 100x faster than RAPPOR and subset selection. Its sample complexity is order-optimal in all privacy regimes.

ABSTRACT

We consider discrete distribution estimation over $k$ elements under $\varepsilon$-local differential privacy from $n$ samples. The samples are distributed across users who send privatized versions of their sample to the server. All previously known sample optimal algorithms require linear (in $k$) communication complexity in the high privacy regime $(\varepsilon<1)$, and have a running time that grows as $n\cdot k$, which can be prohibitive for large domain size $k$. We study the task simultaneously under four resource constraints, privacy, sample complexity, computational complexity, and communication complexity. We propose Hadamard Response (HR), a local non-interactive privatization mechanism with order optimal sample complexity (for all privacy regimes), a communication complexity of $\log k+2$ bits, and runs in nearly linear time. Our encoding and decoding mechanisms are based on Hadamard matrices, and are simple to implement. The gain in sample complexity comes from the large Hamming distance between rows of Hadamard matrices, and the gain in time complexity is achieved by using the Fast Walsh-Hadamard transform. We compare our approach with Randomized Response (RR), RAPPOR, and subset-selection mechanisms (SS), theoretically, and experimentally. For $k=10000$, our algorithm runs about 100x faster than SS, and RAPPOR.

Motivation & Objective

  • To address the high communication and computational costs of existing local differential privacy mechanisms in the high-privacy regime ($\varepsilon < 1$).
  • To design a locally private mechanism that achieves optimal sample complexity across all privacy regimes ($\varepsilon$-LDP).
  • To reduce communication complexity to $\log k + 2$ bits per user while maintaining accuracy.
  • To achieve nearly linear running time, improving on prior sample-optimal algorithms whose running time grows as $n \cdot k$.
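
For reference, the $\varepsilon$-LDP constraint named in these objectives is usually stated as follows. This is the standard definition, given here as background rather than quoted from the paper; the symbols $Q$, $\mathcal{X}$, and $\mathcal{Y}$ are notation assumed for this note.

$$
Q(y \mid x) \le e^{\varepsilon}\, Q(y \mid x') \quad \text{for all } x, x' \in \mathcal{X},\ y \in \mathcal{Y},
$$

where $Q(y \mid x)$ is the probability that a user holding sample $x$ sends message $y$. In the high-privacy regime ($\varepsilon < 1$), $e^{\varepsilon}$ is close to 1, so the message distributions for any two inputs must be nearly indistinguishable.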

Proposed method

  • The proposed Hadamard Response (HR) mechanism uses a non-interactive, local privatization scheme based on Hadamard matrices.
  • Each user encodes their sample using a row of a Hadamard matrix, ensuring large Hamming distance between codewords for robust estimation.
  • The server applies the Fast Walsh-Hadamard Transform (FWHT) to efficiently decode the privatized reports and estimate the underlying distribution.
  • The method leverages the orthogonality and high-distance properties of Hadamard matrices to minimize estimation error with minimal communication.
  • The encoding and decoding processes are designed to be computationally lightweight, enabling near-linear time complexity.
  • Theoretical analysis proves that HR achieves order-optimal sample complexity for all $\varepsilon$-LDP regimes.
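
The encode and decode steps listed above can be sketched roughly as follows. This is a simplified illustration of a Hadamard-based scheme for the high-privacy setting, not the authors' reference implementation: the function names (`fwht`, `hr_encode`, `hr_decode`, `padded_size`), the Sylvester construction of the Hadamard matrix, the choice to skip the all-ones row, and the rejection-sampling step are all assumptions made for this sketch.

```python
import math
import random


def fwht(values):
    """Fast Walsh-Hadamard transform (natural order); length must be a power of two."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


def hadamard_entry(row, col):
    """Entry of the Sylvester Hadamard matrix: (-1)^popcount(row & col)."""
    return 1 if bin(row & col).count("1") % 2 == 0 else -1


def padded_size(k):
    """Smallest power of two strictly greater than k (row 0 stays unused)."""
    return 1 << k.bit_length()


def hr_encode(x, k, eps, rng):
    """Privatize symbol x in {0..k-1} into a single column index in {0..K-1}."""
    K = padded_size(k)
    row = x + 1  # assumption: skip the all-ones row 0
    keep = rng.random() < math.exp(eps) / (1.0 + math.exp(eps))
    target = 1 if keep else -1
    while True:  # rejection sampling; each sign covers exactly half the columns
        y = rng.randrange(K)
        if hadamard_entry(row, y) == target:
            return y


def hr_decode(reports, k, eps):
    """Estimate the distribution from privatized reports using one FWHT."""
    K = padded_size(k)
    counts = [0] * K
    for y in reports:
        counts[y] += 1
    transformed = fwht(counts)  # transformed[r] = sum_y H[r][y] * counts[y]
    scale = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
    n = len(reports)
    return [scale * transformed[x + 1] / n for x in range(k)]


if __name__ == "__main__":
    rng = random.Random(1)
    true_p = [0.5, 0.3, 0.2, 0.0]  # hypothetical distribution for the demo
    samples = rng.choices(range(4), weights=true_p, k=20000)
    reports = [hr_encode(x, 4, 1.0, rng) for x in samples]
    print(hr_decode(reports, 4, 1.0))
```

Each report is one index into a domain of size $K < 2k$, so it fits in about $\log k$ bits, and decoding costs one pass over the reports plus a single transform of length $K$. The raw estimates can be slightly negative and need not sum to one; projecting them onto the probability simplex is omitted from this sketch.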

Research questions

  • RQ1: Can a locally private distribution estimation mechanism achieve both optimal sample complexity and sublinear communication in the high-privacy regime ($\varepsilon < 1$)?
  • RQ2: Is it possible to reduce the computational complexity of locally private estimation from $O(nk)$ to nearly linear time while preserving accuracy?
  • RQ3: How does the use of Hadamard matrices improve communication efficiency and estimation accuracy in local differential privacy?
  • RQ4: What is the performance gain of HR over existing mechanisms like Randomized Response, RAPPOR, and subset selection in terms of speed and communication?
  • RQ5: Can the Fast Walsh-Hadamard Transform be effectively used to accelerate decoding in large-domain discrete distribution estimation?

Key findings

  • For $k = 10000$, the proposed Hadamard Response algorithm runs about 100 times faster than the subset-selection (SS) mechanism and RAPPOR.
  • The communication complexity of HR is reduced to $\log k + 2$ bits per user, significantly lower than prior linear-in-$k$ approaches.
  • HR achieves order-optimal sample complexity across all privacy regimes, including the high-privacy regime ($\varepsilon < 1$).
  • The use of the Fast Walsh-Hadamard Transform enables nearly linear running time, in contrast to the $n \cdot k$ running time of previously known sample-optimal algorithms.
  • Theoretical and experimental results confirm that HR maintains high estimation accuracy with minimal communication and computational overhead.
  • Hadamard matrices' large Hamming distance between rows contributes directly to improved sample efficiency and robustness in privatized estimation.
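
A few lines of arithmetic make these findings more concrete. The sketch below rests on several assumptions that are not taken from the paper: the logarithm is read as base 2 and rounded up, a linear-in-$k$ baseline is modeled as sending one bit per domain element, $n = 1{,}000{,}000$ is a hypothetical number of users, and the cost formulas are rough operation counts that ignore constants. It also checks the Hamming-distance property on a small Sylvester Hadamard matrix, where any two distinct rows of order $K$ differ in exactly $K/2$ positions.

```python
import math


def hadamard_row(r, size):
    """Row r of the Sylvester Hadamard matrix of the given power-of-two order."""
    return [1 if bin(r & c).count("1") % 2 == 0 else -1 for c in range(size)]


def hamming_distance(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def hr_bits(k):
    """Per-user message size ceil(log2 k) + 2 (base-2 reading is an assumption)."""
    return math.ceil(math.log2(k)) + 2


def rough_costs(n, k):
    """Rough decoding costs: n*k versus n + K*log2(K), constants ignored."""
    K = 1 << k.bit_length()      # power of two strictly greater than k
    log_K = K.bit_length() - 1   # exact log2 of a power of two
    return n * k, n + K * log_K


if __name__ == "__main__":
    k = 10_000
    print("HR bits per user:", hr_bits(k))
    print("one-bit-per-element baseline:", k, "bits")
    print("rough costs (n*k, n + K log K):", rough_costs(1_000_000, k))
    rows = [hadamard_row(r, 16) for r in range(16)]
    print("distance between rows 3 and 5:", hamming_distance(rows[3], rows[5]))
```

These counts only indicate orders of magnitude; the 100x figure reported for $k = 10000$ is a measured result from the paper's experiments, not something this arithmetic reproduces.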

This review was created by AI and reviewed by human editors.