QUICK REVIEW

[Paper Review] Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data

Fengli Xu, Zhen Tu | arXiv (Cornell University) | Feb 21, 2017

Human Mobility and Location-Based Analysis | References: 34 | Citations: 97

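One-Sentence Summary

This paper shows that individual trajectories can be recovered even from aggregated mobility data: an unsupervised attack recovers trajectories with 73%–91% accuracy on real-world datasets containing tens of thousands to hundreds of thousands of users.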

ABSTRACT

Human mobility data has been ubiquitously collected through cellular networks and mobile applications, and publicly released for academic research and commercial purposes for the last decade. Since releasing individual's mobility records usually gives rise to privacy issues, datasets owners tend to only publish aggregated mobility data, such as the number of users covered by a cellular tower at a specific timestamp, which is believed to be sufficient for preserving users' privacy. However, in this paper, we argue and prove that even publishing aggregated mobility data could lead to privacy breach in individuals' trajectories. We develop an attack system that is able to exploit the uniqueness and regularity of human mobility to recover individual's trajectories from the aggregated mobility data without any prior knowledge. By conducting experiments on two real-world datasets collected from both mobile application and cellular network, we reveal that the attack system is able to recover users' trajectories with accuracy about 73%~91% at the scale of tens of thousands to hundreds of thousands users, which indicates severe privacy leakage in such datasets. Through the investigation on aggregated mobility data, our work recognizes a novel privacy problem in publishing statistic data, which appeals for immediate attentions from both academy and industry.

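Motivation and Objectives

Demonstrate that aggregated mobility data leaks privacy despite the aggregation.
Quantify, on real-world datasets, how well individual trajectories can be recovered from aggregated data.
Investigate how data granularity and scale affect privacy leakage.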

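Proposed Method

Propose an unsupervised attack framework that exploits the regularity and uniqueness of human mobility to match records across time slots.
Model the recovery as a linear sum assignment problem and solve it with the Hungarian algorithm.
Build three cost matrices from mobility characteristics: one each for nighttime, daytime, and cross-day recovery.
Evaluate recovery accuracy, recovery error, and uniqueness against the ground-truth trajectories of two real-world datasets.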
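The assignment step can be illustrated with a small sketch. It assumes a toy setting that is not taken from the paper: three hypothetical cells with made-up coordinates, per-cell user counts for two consecutive time slots, and plain Euclidean distance as the cost. The helper names `hungarian`, `expand`, and `link_slots` are illustrative, and the solver is a generic textbook Hungarian implementation, not the authors' code.

```python
from math import hypot, inf

def hungarian(cost):
    """Solve a square linear sum assignment problem (minimisation).

    Returns a list where entry i is the column assigned to row i.
    Standard O(n^3) potentials-based Hungarian algorithm.
    """
    n = len(cost)
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)      # p[j] = row currently matched to column j
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:              # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def expand(counts, coords):
    """Turn per-cell user counts into one location entry per anonymous user."""
    return [coords[cell] for cell, c in sorted(counts.items()) for _ in range(c)]

def link_slots(prev_points, next_points):
    """Link anonymous records of two time slots by minimum total distance."""
    cost = [[hypot(a[0] - b[0], a[1] - b[1]) for b in next_points]
            for a in prev_points]
    match = hungarian(cost)
    total = sum(cost[i][j] for i, j in enumerate(match))
    return match, total

# Hypothetical toy data: cell coordinates and aggregated counts per time slot.
coords = {"A": (0, 0), "B": (10, 0), "C": (0, 10)}
slot_t = expand({"A": 2, "B": 1}, coords)
slot_t1 = expand({"A": 1, "B": 1, "C": 1}, coords)
match, total = link_slots(slot_t, slot_t1)
print(match, total)
```

Users in the same cell are interchangeable under a distance-only cost, so ties are broken arbitrarily here; the paper's cost matrices use richer mobility characteristics for this reason.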

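Experimental Results

Research Questions

RQ1: Can aggregated mobility data reveal individual trajectories without any prior knowledge?
RQ2: How accurately can complete trajectories be recovered from aggregated records in real-world datasets?
RQ3: How do spatial and temporal resolution and dataset size affect privacy leakage?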

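Key Findings

Trajectories are recovered with 73%–91% accuracy on datasets containing tens of thousands to hundreds of thousands of users.
Only 8%–21% of recovered points have an error above 1,000 meters, so the recovery error is small for most points.
More than 95% of recovered trajectories can be uniquely identified from their two most frequent locations (Top-2).
Privacy leakage persists across a range of spatial and temporal resolutions and dataset scales, which indicates that the attack is robust.
Nighttime recovery exploits low mobility; daytime recovery uses velocity-based prediction to improve next-location estimation; cross-day matching uses information gain to link sub-trajectories.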
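The three cost ideas in the last finding can be sketched roughly as below. This is an interpretation under stated assumptions, not the paper's exact formulation: the nighttime cost is taken to be the distance from the last known location, the daytime cost the distance from a constant-velocity extrapolation, and the cross-day cost the entropy of the merged location sequence minus the mean entropy of the two parts. All function names are hypothetical.

```python
from collections import Counter
from math import hypot, log2

def night_cost(last_loc, candidate):
    """Assumed nighttime cost: users barely move, so nearer is cheaper."""
    return hypot(last_loc[0] - candidate[0], last_loc[1] - candidate[1])

def day_cost(prev_loc, last_loc, candidate):
    """Assumed daytime cost: distance from a constant-velocity prediction."""
    predicted = (2 * last_loc[0] - prev_loc[0], 2 * last_loc[1] - prev_loc[1])
    return hypot(predicted[0] - candidate[0], predicted[1] - candidate[1])

def entropy(locations):
    """Shannon entropy (in bits) of the location frequency distribution."""
    n = len(locations)
    return -sum(c / n * log2(c / n) for c in Counter(locations).values())

def cross_day_cost(day_a, day_b):
    """Assumed cross-day cost: information gain from merging two sub-trajectories.

    Sub-trajectories of the same user tend to visit the same places, so
    merging them adds little entropy and the cost stays low.
    """
    merged = list(day_a) + list(day_b)
    return entropy(merged) - (entropy(day_a) + entropy(day_b)) / 2

# Hypothetical sub-trajectories as sequences of cell labels.
same_user = cross_day_cost(["H", "H", "W"], ["H", "H", "W"])
other_user = cross_day_cost(["H", "H", "W"], ["X", "X", "Y"])
print(same_user, other_user)
```

Each function yields one entry of a cost matrix; filling a full matrix over all record pairs and passing it to an assignment solver would give the corresponding linking step.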


This review was generated by AI and checked by a human editor.