QUICK REVIEW

[Paper Review] Model-based Differentially Private Data Synthesis

Fang Liu|arXiv (Cornell University)|Jun 26, 2016
Privacy-Preserving Technologies in Data | 47 references | 21 citations
TL;DR

This paper proposes Model-based Differentially Private Data Synthesis (ModiPS), a Bayesian framework that generates synthetic individual-level data with strong privacy guarantees by integrating differential privacy (DP) into microdata synthesis. It ensures privacy under a defined budget while preserving data utility through multiple synthetic datasets and a variance combination rule, with theoretical consistency for estimators derived from the released data.

ABSTRACT

We propose model-based differentially private data synthesis (modips) in the Bayesian framework for releasing individual-level surrogate data sets for the original with a strong privacy guarantee. The modips technique integrates differential privacy (DP) -- a concept discussed largely in the theoretical computer science community -- into microdata synthesis in statistical disclosure limitation. The modips technique guarantees individual privacy protection at a given privacy budget without making assumptions about data intruders' behaviors and knowledge. The privacy budget can be used as a tuning parameter in the trade-off between privacy protection and original information preservation in the synthesized surrogate data. The uncertainty from the sanitization and synthesis process in modips can be accounted for by releasing multiple synthetic data sets and by applying the proposed variance combination rule. We also characterize the conditions for the consistency of estimators based on the released synthetic data. The modips method provides a viable alternative to the currently limited choice set of microdata synthesis approaches in statistical disclosure limitation.

Motivation & Objective

  • To address the lack of robust, privacy-preserving microdata synthesis methods in statistical disclosure limitation.
  • To provide a method that guarantees individual privacy without assumptions about data intruders' knowledge or behavior.
  • To enable a tunable trade-off between privacy protection and information preservation via a privacy budget.
  • To account for uncertainty in the sanitization and synthesis process through multiple synthetic datasets.
  • To establish theoretical consistency for estimators derived from the released synthetic data.

Proposed method

  • Integrates differential privacy (DP) into the Bayesian framework for microdata synthesis to ensure individual-level privacy.
  • Applies a privacy budget as a tuning parameter to control the trade-off between privacy and data utility.
  • Generates multiple synthetic datasets to capture uncertainty from the sanitization and synthesis process.
  • Employs a variance combination rule to properly aggregate inference results across multiple synthetic datasets.
  • Uses a Bayesian hierarchical model to represent data uncertainty and support posterior inference.
  • Derives conditions under which estimators based on synthetic data are consistent with those from original data.
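
The steps above can be illustrated with a minimal Python sketch for a single binary variable. This is not the paper's implementation: the Beta-Bernoulli model, the Laplace mechanism applied to the count of ones, the even split of the budget across the m releases, and the function name `modips_binary_sketch` are all assumptions chosen to keep the example small. The sample size n is treated as public.

```python
import numpy as np

def modips_binary_sketch(x, epsilon, m, rng, a0=1.0, b0=1.0):
    """Return m synthetic copies of a binary data set x (hypothetical helper).

    Assumptions: Beta(a0, b0) prior, Bernoulli likelihood, Laplace noise on
    the sufficient statistic, total budget epsilon split evenly over m sets.
    """
    x = np.asarray(x)
    n = x.size
    eps_each = epsilon / m  # sequential composition: each release spends epsilon / m
    synthetic_sets = []
    for _ in range(m):
        # Sanitize the sufficient statistic. Changing one record moves the
        # count of ones by at most 1, so the Laplace scale is 1 / eps_each.
        noisy_count = x.sum() + rng.laplace(loc=0.0, scale=1.0 / eps_each)
        # Clamping is post-processing and does not weaken the DP guarantee.
        noisy_count = float(np.clip(noisy_count, 0.0, n))
        # Draw the parameter from the posterior implied by the sanitized count.
        theta = rng.beta(a0 + noisy_count, b0 + n - noisy_count)
        # Draw a synthetic data set of the same size from the model.
        synthetic_sets.append(rng.binomial(1, theta, size=n))
    return synthetic_sets

rng = np.random.default_rng(0)
original = rng.integers(0, 2, size=200)  # toy stand-in for confidential data
released = modips_binary_sketch(original, epsilon=1.0, m=5, rng=rng)
```

A smaller `epsilon` widens the Laplace noise, which strengthens privacy and pushes the synthetic proportions further from the original one. That is the trade-off the privacy budget tunes.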

Experimental results

Research questions

  • RQ1: How can differential privacy be effectively integrated into microdata synthesis to ensure strong individual privacy guarantees?
  • RQ2: What is the impact of the privacy budget on the utility of synthesized data in preserving original statistical properties?
  • RQ3: How can uncertainty from the DP-synthetic process be properly quantified and combined across multiple synthetic datasets?
  • RQ4: Under what conditions are estimators based on synthetic data consistent with those from the original data?
  • RQ5: Can the proposed method serve as a viable alternative to existing microdata synthesis techniques in statistical disclosure limitation?

Key findings

  • The ModiPS method provides a strong privacy guarantee by ensuring differential privacy at a specified privacy budget without relying on assumptions about intruder knowledge.
  • The privacy budget enables a flexible trade-off between privacy protection and preservation of original data information.
  • Multiple synthetic datasets effectively capture the uncertainty introduced by the DP mechanism and the synthesis process.
  • The proposed variance combination rule allows for valid statistical inference by properly aggregating results across synthetic datasets.
  • Theoretical conditions are established under which estimators derived from the synthetic data are consistent with those from the original data.
  • ModiPS offers a practical and theoretically grounded alternative to existing microdata synthesis methods in the context of statistical disclosure limitation.
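
This review does not reproduce the formula of the variance combination rule. As a hedged illustration of what such a rule does, the sketch below assumes the form T = W + B/m, where W is the average within-set variance and B is the between-set variance of the m point estimates. This is the form commonly used for partially synthetic data; the paper's exact rule and its degrees of freedom should be checked against the source. The function name `combine_synthetic_estimates` is hypothetical.

```python
import numpy as np

def combine_synthetic_estimates(estimates, variances):
    """Combine per-data-set results under the ASSUMED rule T = W + B / m."""
    q = np.asarray(estimates, dtype=float)   # point estimate from each synthetic set
    u = np.asarray(variances, dtype=float)   # its estimated variance in each set
    m = q.size
    if m < 2:
        raise ValueError("need at least two synthetic data sets to estimate B")
    q_bar = q.mean()       # combined point estimate
    w = u.mean()           # average within-set variance W
    b = q.var(ddof=1)      # between-set variance B (sample variance of estimates)
    total = w + b / m      # assumed total variance of q_bar
    return q_bar, total

q_bar, total = combine_synthetic_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-set term B reflects the extra variability that sanitization and synthesis add on top of ordinary sampling error, which is why analyzing a single synthetic data set on its own would understate uncertainty.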


This review was created by AI and reviewed by human editors.