QUICK REVIEW

[Paper Review] Worst-Case Background Knowledge for Privacy-Preserving Data Publishing

David Martín, Daniel Kifer|arXiv (Cornell University)|May 19, 2007
Privacy-Preserving Technologies in Data | 29 references | 18 citations
TL;DR

This paper introduces a formal framework for worst-case background knowledge in privacy-preserving data publishing, modeling attacker knowledge using basic implications in a logical language. It proposes a polynomial-time algorithm to compute maximum disclosure under up to k such pieces of knowledge and provides a method to sanitize data to ensure worst-case disclosure remains below a threshold, offering stronger privacy guarantees than k-anonymity or ℓ-diversity.

ABSTRACT

Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst-case. In this paper, we initiate a formal study of worst-case background knowledge. We propose a language that can express any background knowledge about the data. We provide a polynomial time algorithm to measure the amount of disclosure of sensitive information in the worst case, given that the attacker has at most a specified number of pieces of information in this language. We also provide a method to efficiently sanitize the data so that the amount of disclosure in the worst case is less than a specified threshold.

Motivation & Objective

  • To formally model the worst-case background knowledge an attacker might use to infer sensitive information about individuals in published data.
  • To quantify the maximum disclosure of sensitive information when the attacker's knowledge is bounded by k basic implications.
  • To develop an efficient algorithm that computes worst-case disclosure without prior knowledge of the attacker’s specific background knowledge.
  • To provide a sanitization method that ensures disclosure remains below a user-specified threshold, even under the worst-case attacker knowledge.
  • To extend beyond k-anonymity and ℓ-diversity by accounting for richer classes of background knowledge through a logical language of basic implications.

Proposed method

  • The paper defines a logical language of basic implications to express any possible background knowledge about the data, such as functional dependencies or attribute-value constraints.
  • It introduces the concept of (c,k)-safety, where c is a disclosure threshold and k is the maximum number of basic implications an attacker may possess.
  • A polynomial-time algorithm is developed to compute the worst-case disclosure over all possible sets of k basic implications, enabling efficient privacy analysis.
  • The framework integrates with existing lattice-search algorithms to find a bucketization (data anonymization) that is robust against all possible k-piece background knowledge sets.
  • The approach uses bucketization—grouping tuples and permuting sensitive values—while generalizing non-sensitive attributes to prevent re-identification.
  • The method ensures that, against any attacker holding at most k basic implications, the worst-case probability of inferring an individual's sensitive value stays below the threshold c.
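
To make the notion of worst-case disclosure concrete, the following is a minimal brute-force sketch for a single bucket. It is not the paper's polynomial-time algorithm: it simply enumerates every assignment of sensitive values to people and every set of at most k implications. The restricted language used here (one atom implies one atom) and the names `simple_implications`, `holds`, and `max_disclosure` are assumptions made for illustration, as is the uniform weighting of consistent assignments.

```python
from itertools import combinations, permutations, product

def consistent_worlds(people, values):
    """All distinct assignments person -> sensitive value within one bucket."""
    return [dict(zip(people, perm)) for perm in set(permutations(values))]

def simple_implications(people, domain):
    """Hypothetical restricted language: (p has a) -> (q has b)."""
    atoms = list(product(people, domain))
    return [(ante, cons) for ante in atoms for cons in atoms if ante != cons]

def holds(world, implication):
    """An implication is violated only when its antecedent is true and its consequent false."""
    (p, a), (q, b) = implication
    return world[p] != a or world[q] == b

def max_disclosure(people, values, k):
    """Exhaustive search over all knowledge sets of size <= k (exponential, tiny inputs only)."""
    worlds = consistent_worlds(people, values)
    domain = sorted(set(values))
    language = simple_implications(people, domain)
    best = 0.0
    for size in range(k + 1):
        for knowledge in combinations(language, size):
            remaining = [w for w in worlds if all(holds(w, i) for i in knowledge)]
            if not remaining:
                continue  # inconsistent knowledge rules out every world
            for p, s in product(people, domain):
                hits = sum(1 for w in remaining if w[p] == s)
                best = max(best, hits / len(remaining))
    return best

people = ["ann", "bob", "cat", "dan"]
values = ["flu", "flu", "cancer", "hiv"]
print(max_disclosure(people, values, 0))
print(max_disclosure(people, values, 1))
```

In this toy bucket, one implication such as "if ann has flu then bob has flu" raises the attacker's best guess above what no knowledge allows, which is the kind of effect a (c,k)-safety check has to bound. The exhaustive search grows quickly with bucket size and k, which is why an efficient algorithm matters.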

Experimental results

Research questions

  • RQ1: What is the maximum amount of disclosure of sensitive information that can occur when an attacker possesses up to k pieces of background knowledge?
  • RQ2: How can we efficiently compute this worst-case disclosure without knowing the attacker’s exact knowledge?
  • RQ3: Can we sanitize a dataset so that disclosure remains below a specified threshold, even under the worst-case background knowledge?
  • RQ4: How does this approach compare to existing privacy models like k-anonymity and ℓ-diversity in terms of robustness to background knowledge?
  • RQ5: What is the impact of the choice of logical language for expressing background knowledge on the conservativeness of the resulting sanitization?

Key findings

  • The paper presents a polynomial-time algorithm to compute the worst-case disclosure of sensitive information when an attacker holds up to k basic implications, enabling efficient privacy analysis.
  • The framework ensures that data sanitization can be performed to achieve (c,k)-safety, meaning disclosure never exceeds threshold c under any k-piece background knowledge set.
  • The proposed method outperforms ℓ-diversity in terms of privacy robustness, as it guards against a broader class of background knowledge, including complex logical combinations.
  • The approach is computationally efficient and integrates with existing lattice-based search algorithms for finding optimal bucketizations.
  • Empirical results show that the maximum disclosure measured under ℓ-diversity is similar to that measured under the (c,k)-safety model, even though (c,k)-safety formally guards against a broader class of background knowledge.
  • The framework is extensible: adding more expressive atoms to the language of basic implications can reduce over-conservatism in sanitization.
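
The sanitization idea can be sketched as a loop that coarsens the bucketization until every bucket's worst-case disclosure falls below c. The sketch below is a simplification under stated assumptions, not the paper's procedure: it uses greedy merging of adjacent buckets instead of a lattice search, and its default measure `negation_disclosure` is a hypothetical stand-in that only considers k negated statements about the target individual, not the full language of basic implications. Any other disclosure function can be passed in through the `disclosure` parameter.

```python
from collections import Counter

def negation_disclosure(bucket, k):
    """Stand-in measure (assumption): the attacker rules out the k most common
    other values, so the top value's count is divided by what remains."""
    counts = sorted(Counter(bucket).values(), reverse=True)
    return counts[0] / (len(bucket) - sum(counts[1:k + 1]))

def sanitize(buckets, c, k, disclosure=negation_disclosure):
    """Greedily merge the worst bucket with a neighbour until all scores are below c.
    Returns the merged bucketization, or None if even a single bucket is unsafe."""
    buckets = [list(b) for b in buckets]
    while True:
        scores = [disclosure(b, k) for b in buckets]
        worst = max(range(len(buckets)), key=scores.__getitem__)
        if scores[worst] < c:
            return buckets
        if len(buckets) == 1:
            return None  # nothing left to merge
        neighbour = worst - 1 if worst > 0 else 1
        lo, hi = sorted((worst, neighbour))
        buckets[lo:hi + 1] = [buckets[lo] + buckets[hi]]

buckets = [["flu", "flu", "cold"], ["hiv", "cancer", "cold"], ["flu", "hiv", "cancer"]]
print(sanitize(buckets, c=0.6, k=1))
```

Each merge reduces the number of buckets by one, so the loop always terminates. A real implementation would replace the stand-in measure with a worst-case computation over basic implications and would search the space of bucketizations more carefully to limit information loss.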

This review was created by AI and reviewed by human editors.