QUICK REVIEW

[Paper Review] Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data

Fengli Xu, Zhen Tu | arXiv (Cornell University) | Feb 21, 2017
Human Mobility and Location-Based Analysis | 34 references | 97 citations
TL;DR

The paper shows that aggregated mobility data can still reveal individual trajectories; an unsupervised attack recovers trajectories with 73%–91% accuracy on real datasets containing tens of thousands to hundreds of thousands of users.

ABSTRACT

Human mobility data has been ubiquitously collected through cellular networks and mobile applications, and publicly released for academic research and commercial purposes for the last decade. Since releasing individual's mobility records usually gives rise to privacy issues, datasets owners tend to only publish aggregated mobility data, such as the number of users covered by a cellular tower at a specific timestamp, which is believed to be sufficient for preserving users' privacy. However, in this paper, we argue and prove that even publishing aggregated mobility data could lead to privacy breach in individuals' trajectories. We develop an attack system that is able to exploit the uniqueness and regularity of human mobility to recover individual's trajectories from the aggregated mobility data without any prior knowledge. By conducting experiments on two real-world datasets collected from both mobile application and cellular network, we reveal that the attack system is able to recover users' trajectories with accuracy about 73%~91% at the scale of tens of thousands to hundreds of thousands users, which indicates severe privacy leakage in such datasets. Through the investigation on aggregated mobility data, our work recognizes a novel privacy problem in publishing statistic data, which appeals for immediate attentions from both academy and industry.

Motivation & Objective

  • Demonstrate that publishing only aggregated mobility data still leaks individual users' privacy.
  • Quantify the ability to recover individual trajectories from aggregated data using real-world datasets.
  • Investigate how data granularity and scale affect privacy leakage.

Proposed method

  • Propose an unsupervised attack framework leveraging regularity and uniqueness of human mobility to match records across time slots.
  • Model trajectory recovery as a Linear Sum Assignment Problem solved by the Hungarian algorithm.
  • Construct three cost matrices for nighttime, daytime, and cross-day recovery using mobility characteristics.
  • Use ground-truth trajectories from two real datasets to evaluate recovery accuracy, recovery error, and uniqueness.
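
To make the assignment formulation concrete, here is a minimal sketch of one recovery step on toy data. It is not the authors' implementation: the tower coordinates, the counts, the predicted locations, and the helper names `expand_counts` and `assign_next_points` are all hypothetical. The paper solves the problem with the Hungarian algorithm; this sketch instead finds the same minimum-cost matching by exhaustive search, which only works at toy sizes. At realistic scale, a solver such as `scipy.optimize.linear_sum_assignment` would take its place.

```python
from itertools import permutations
from math import dist


def expand_counts(tower_xy, counts):
    """Turn per-tower user counts into one location point per (anonymous) user."""
    points = []
    for tower, n_users in counts.items():
        points.extend([tower_xy[tower]] * n_users)
    return points


def assign_next_points(predicted, candidates):
    """Solve the linear sum assignment problem by brute force (toy sizes only).

    cost[i][j] is the distance between the location predicted for partial
    trajectory i and candidate point j in the next time slot. The result
    lists, for each trajectory, the index of the candidate assigned to it.
    """
    cost = [[dist(p, c) for c in candidates] for p in predicted]
    n = len(predicted)
    best = min(
        permutations(range(len(candidates)), n),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical toy input: three towers, one user observed at each in the next slot.
tower_xy = {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (0.0, 5.0)}
counts_next_slot = {"A": 1, "B": 1, "C": 1}

# Hypothetical predicted next locations for three partially recovered trajectories.
predicted = [(4.5, 0.5), (0.2, 4.0), (0.5, 0.1)]

candidates = expand_counts(tower_xy, counts_next_slot)
match = assign_next_points(predicted, candidates)
print(match)  # trajectory i continues at candidates[match[i]]
```

Repeating this step slot by slot extends each partial trajectory by one point. Only the cost matrix changes between the nighttime, daytime, and cross-day stages.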

Experimental results

Research questions

  • RQ1: Can aggregated mobility data reveal individual trajectories without prior information?
  • RQ2: What is the accuracy of recovering full trajectories from aggregated records in real-world datasets?
  • RQ3: How do spatial/temporal resolution and dataset size influence privacy leakage?

Key findings

  • Recovered trajectories with 73%–91% accuracy across datasets containing tens of thousands to hundreds of thousands of users.
  • Only 8%–21% of recovered points, depending on the dataset, have errors greater than 1,000 meters, so most points are recovered with low error.
  • Over 95% of recovered trajectories are uniquely distinguishable when provided with the two most frequent locations (Top-2).
  • Privacy leakage persists across a range of spatial and temporal resolutions and scales, showing robustness of the attack.
  • Nighttime recovery exploits low mobility; daytime recovery uses velocity-based prediction to estimate the next location; cross-day matching uses information gain to link sub-trajectories.
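
The three cost ideas in the last finding can be sketched as small functions. This is an illustration under stated assumptions, not the paper's exact formulas: the review does not give the definitions, so the linear extrapolation in `day_cost` and the entropy-based `information_gain` below are plausible readings, and all function names and sample data are hypothetical.

```python
from collections import Counter
from math import dist, log2


def night_cost(last_loc, candidate):
    """Low mobility at night: cost is the distance from the last known location."""
    return dist(last_loc, candidate)


def day_cost(prev_loc, last_loc, candidate):
    """Velocity-based prediction (assumed form): extrapolate the last
    displacement one step ahead and measure distance to the candidate."""
    predicted = (2 * last_loc[0] - prev_loc[0], 2 * last_loc[1] - prev_loc[1])
    return dist(predicted, candidate)


def entropy(visits):
    """Shannon entropy (bits) of the visit-frequency distribution."""
    counts = Counter(visits)
    total = len(visits)
    return -sum(c / total * log2(c / total) for c in counts.values())


def information_gain(day1, day2):
    """Assumed form: entropy of the merged sub-trajectories minus the mean of
    their separate entropies. A small gain means both days revisit the same
    places, which suggests they belong to the same user."""
    return entropy(day1 + day2) - (entropy(day1) + entropy(day2)) / 2


# Hypothetical sub-trajectories given as sequences of visited place labels.
monday = ["home", "home", "work", "work"]
tuesday_same_user = ["home", "work", "work", "home"]
tuesday_other_user = ["gym", "gym", "mall", "mall"]

gain_same = information_gain(monday, tuesday_same_user)
gain_other = information_gain(monday, tuesday_other_user)
print(gain_same, gain_other)  # the lower gain marks the better cross-day link
```

Each function would fill one entry of the corresponding cost matrix, after which the same assignment solver links records or sub-trajectories.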


This review was created by AI and reviewed by human editors.