QUICK REVIEW

[Paper Review] Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data

Fengli Xu, Zhen Tu|arXiv (Cornell University)|Feb 21, 2017
Human Mobility and Location-Based Analysis|References: 34|Cited by: 97
One-Sentence Summary

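The paper shows that aggregated mobility data can still reveal individual trajectories: on real-world datasets covering tens of thousands to hundreds of thousands of users, an unsupervised attack recovers trajectories with 73%–91% accuracy.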

ABSTRACT

Human mobility data has been ubiquitously collected through cellular networks and mobile applications, and publicly released for academic research and commercial purposes for the last decade. Since releasing individual's mobility records usually gives rise to privacy issues, datasets owners tend to only publish aggregated mobility data, such as the number of users covered by a cellular tower at a specific timestamp, which is believed to be sufficient for preserving users' privacy. However, in this paper, we argue and prove that even publishing aggregated mobility data could lead to privacy breach in individuals' trajectories. We develop an attack system that is able to exploit the uniqueness and regularity of human mobility to recover individual's trajectories from the aggregated mobility data without any prior knowledge. By conducting experiments on two real-world datasets collected from both mobile application and cellular network, we reveal that the attack system is able to recover users' trajectories with accuracy about 73%~91% at the scale of tens of thousands to hundreds of thousands users, which indicates severe privacy leakage in such datasets. Through the investigation on aggregated mobility data, our work recognizes a novel privacy problem in publishing statistic data, which appeals for immediate attentions from both academy and industry.

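Motivation and Objectives

  • Demonstrate that mobility data still leaks privacy after aggregation.
  • Quantify, on real-world datasets, how well individual trajectories can be recovered from aggregated data.
  • Investigate how data granularity and scale affect privacy leakage.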

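Proposed Method

  • Propose an unsupervised attack framework that exploits the regularity and uniqueness of human mobility to match records across time slots.
  • Formulate trajectory recovery as a linear sum assignment problem solved with the Hungarian algorithm.
  • Build three cost matrices from mobility features, one each for night-time, daytime, and cross-day recovery.
  • Evaluate recovery accuracy, recovery error, and uniqueness against ground-truth trajectories from two real-world datasets.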
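
To make the assignment formulation concrete, here is a minimal sketch in Python of a single recovery step. It is an illustration under stated assumptions, not the authors' implementation: locations are hypothetical `(x, y)` tower coordinates, the night-time cost is assumed to be the distance from a trajectory's last location, and the daytime cost is assumed to be the distance from a constant-velocity prediction. The helper names (`expand_counts`, `night_cost`, `day_cost`, `recover_step`) are invented for this example, and the cross-day cost is omitted.

```python
import math

def hungarian(cost):
    """Hungarian algorithm for a square cost matrix (list of lists).
    Returns col_of_row: row i is matched to column col_of_row[i]."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)      # p[j] = row currently matched to column j
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:              # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_of_row = [0] * n
    for j in range(1, n + 1):
        col_of_row[p[j] - 1] = j - 1
    return col_of_row

def expand_counts(counts):
    """Turn an aggregate {location: user_count} into one slot per user."""
    slots = []
    for loc in sorted(counts):
        slots.extend([loc] * counts[loc])
    return slots

def night_cost(traj, loc):
    """Assumed night-time cost: users barely move, so stay near the last point."""
    return math.dist(traj[-1], loc)

def day_cost(traj, loc):
    """Assumed daytime cost: distance from a constant-velocity prediction."""
    if len(traj) < 2:
        return night_cost(traj, loc)
    (x1, y1), (x2, y2) = traj[-2], traj[-1]
    return math.dist((2 * x2 - x1, 2 * y2 - y1), loc)

def recover_step(trajs, counts, cost_fn):
    """Extend every partial trajectory by one time slot via min-cost matching."""
    slots = expand_counts(counts)
    if len(slots) != len(trajs):
        raise ValueError("this sketch assumes a constant number of users")
    cost = [[cost_fn(t, s) for s in slots] for t in trajs]
    match = hungarian(cost)
    return [t + [slots[j]] for t, j in zip(trajs, match)]

# Toy aggregates: per-slot user counts at each (hypothetical) tower location.
aggregates = [
    {(0, 0): 2, (5, 5): 1},
    {(0, 0): 1, (1, 0): 1, (5, 5): 1},
    {(0, 0): 1, (2, 0): 1, (5, 6): 1},
]
trajs = [[loc] for loc in expand_counts(aggregates[0])]
trajs = recover_step(trajs, aggregates[1], night_cost)
trajs = recover_step(trajs, aggregates[2], day_cost)
```

In this toy input, two users start at the same tower, so the first step has two equally cheap matchings; either one yields the same set of recovered trajectories. The sketch only shows how per-tower counts alone can be turned into an assignment problem; the cost matrices actually used in the paper may be defined differently.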

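Experimental Results

Research Questions

  • RQ1: Can aggregated mobility data reveal individual trajectories without any prior information?
  • RQ2: How accurately can complete trajectories be recovered from aggregated records in real-world datasets?
  • RQ3: How do spatial/temporal resolution and dataset scale affect privacy leakage?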

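Key Findings

  • On datasets with tens of thousands to hundreds of thousands of users, recovered trajectories reach 73%–91% accuracy.
  • Only between 8% and 21% of recovered points have an error above 1,000 m, so most points are recovered with small error.
  • Given the two most frequent locations (Top-2), more than 95% of recovered trajectories can be uniquely distinguished.
  • Privacy leakage persists across a wide range of spatial and temporal resolutions and dataset scales, which shows that the attack is robust.
  • Night-time recovery exploits low mobility; daytime recovery uses velocity-based prediction to improve next-location estimates; cross-day matching uses information gain to link sub-trajectories.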
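
The Top-2 uniqueness figure can be read as follows: take each trajectory's two most frequent locations and count how many trajectories have a pair that no other trajectory shares. The Python sketch below illustrates that reading on made-up data. It assumes the pair is ordered by visit frequency (ties broken by location name); the paper may define the signature differently, and the function names are hypothetical.

```python
from collections import Counter

def top_k_locations(traj, k=2):
    """Return the k most visited locations, most frequent first."""
    counts = Counter(traj)
    # Ties are broken by location name so the result is deterministic.
    ranked = sorted(counts, key=lambda loc: (-counts[loc], loc))
    return tuple(ranked[:k])

def uniqueness(trajs, k=2):
    """Fraction of trajectories whose top-k signature is shared by no other."""
    signatures = [top_k_locations(t, k) for t in trajs]
    freq = Counter(signatures)
    return sum(1 for s in signatures if freq[s] == 1) / len(trajs)

# Made-up trajectories over tower labels "A".."D".
sample = [
    ["A", "A", "B", "C"],   # signature ("A", "B")
    ["A", "B", "A", "D"],   # signature ("A", "B"), shared with the first
    ["C", "C", "D", "D"],   # signature ("C", "D")
    ["B", "B", "A", "B"],   # signature ("B", "A")
]
score = uniqueness(sample)   # 0.5: two of the four signatures are unique
```

A high uniqueness score matters for the attack because a recovered trajectory that is unique can be linked to a specific person by anyone who knows that person's most frequent locations, such as home and workplace.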


This review was generated by AI and reviewed by human editors.