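[Paper Digest] U-MASK: User-adaptive Spatio-Temporal Masking for Personalized Mobile AI Applications
Summary: U-MASK proposes a user- and task-adaptive spatio-temporal masking framework for personalized mobile AI. It uses U-SCOPE to build semantic user representations and a diffusion-based backbone to perform mask-based conditional completion across short-term, long-horizon, and cold-start tasks.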
Personalized mobile artificial intelligence applications are widely deployed, yet they are expected to infer user behavior from sparse and irregular histories under a continuously evolving spatio-temporal context. This setting induces a fundamental tension among three requirements, i.e., immediacy to adapt to recent behavior, stability to resist transient noise, and generalization to support long-horizon prediction and cold-start users. Most existing approaches satisfy at most two of these requirements, resulting in an inherent impossibility triangle in data-scarce, non-stationary personalization. To address this challenge, we model mobile behavior as a partially observed spatio-temporal tensor and unify short-term adaptation, long-horizon forecasting, and cold-start recommendation as a conditional completion problem, where a user- and task-specific mask specifies which coordinates are treated as evidence. We propose U-MASK, a user-adaptive spatio-temporal masking method that allocates evidence budgets based on user reliability and task sensitivity. To enable mask generation under sparse observations, U-MASK learns a compact, task-agnostic user representation from app and location histories via U-SCOPE, which serves as the sole semantic conditioning signal. A shared diffusion transformer then performs mask-guided generative completion while preserving observed evidence, so personalization and task differentiation are governed entirely by the mask and the user representation. Experiments on real-world mobile datasets demonstrate consistent improvements over state-of-the-art methods across short-term prediction, long-horizon forecasting, and cold-start settings, with the largest gains under severe data sparsity. The code and dataset will be available at https://github.com/NICE-HKU/U-MASK.
Research Motivation and Objectives
- Resolve the impossibility triangle of personalized mobile AI: balancing immediacy, stability, and generalization under sparse histories.
- Formulate mobile user behavior as a conditional completion problem guided by a learnable mask.
- Develop a compact, task-agnostic user representation to condition masking.
- Integrate a shared diffusion-based backbone to perform mask-guided generative completion.
- Demonstrate improvements over state-of-the-art methods on real-world mobile datasets with sparse data.
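To make the "conditional completion" framing concrete, the sketch below shows how the three tasks can differ only in which coordinates of a time-by-location grid count as evidence. It is a minimal illustration, not the paper's method: U-MASK learns its masks per user and task, whereas the function `evidence_mask`, the grid size, the hand-picked `observed_steps` values, and the choice to model cold start as an all-zero mask are all assumptions made here.

```python
from typing import List

Mask = List[List[int]]  # mask[t][s] == 1: observed evidence; 0: to be completed


def evidence_mask(num_steps: int, num_cells: int, observed_steps: int) -> Mask:
    """Mark the first `observed_steps` time steps as evidence, the rest as targets."""
    if not 0 <= observed_steps <= num_steps:
        raise ValueError("observed_steps must lie in [0, num_steps]")
    return [
        [1 if t < observed_steps else 0 for _ in range(num_cells)]
        for t in range(num_steps)
    ]


# Hypothetical task presets on a 6-step, 3-cell grid (illustrative values only).
short_term = evidence_mask(6, 3, observed_steps=5)    # complete only the last step
long_horizon = evidence_mask(6, 3, observed_steps=2)  # complete four future steps
cold_start = evidence_mask(6, 3, observed_steps=0)    # no behavioral evidence
```

Under this view a single completion model can serve all three tasks, because the task is encoded in the mask rather than in a task-specific output head.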
Proposed Method
- Introduce U-MASK, a masking mechanism that defines inference via evidence regions conditioned on user and task.
- Develop U-SCOPE to produce a compact, task-agnostic user embedding from sparse app-location histories.
- Instantiate a conditional diffusion transformer backbone (DiT) that reconstructs missing regions while preserving observed evidence.
- Compute a task- and user-specific mask by balancing observation budgets, temporal emphasis, and spatial affinity using a hierarchical latent representation.
- Train end-to-end with a reconstruction loss and an InfoNCE-based consistency term to align short-term and long-term representations.
- Allow task differentiation and personalization to be governed entirely by the mask and user representation rather than task-specific heads.
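The InfoNCE-based consistency term mentioned above can be sketched in its standard form: each short-term embedding should be closer to its own long-term counterpart than to the long-term embeddings of other users in the batch. The digest does not give the paper's exact formulation, so the cosine similarity, the temperature of 0.1, the use of in-batch negatives, and the name `info_nce` are assumptions.

```python
import math
from typing import Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(
    short_views: Sequence[Sequence[float]],
    long_views: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Mean InfoNCE loss; row i of each input belongs to the same user."""
    n = len(short_views)
    total = 0.0
    for i in range(n):
        logits = [
            _cosine(short_views[i], long_views[j]) / temperature for j in range(n)
        ]
        # Numerically stable log-sum-exp over the positive and all negatives.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denominator - logits[i]  # equals -log softmax(logits)[i]
    return total / n
```

The loss approaches zero when matching pairs are aligned and grows when a user's short-term embedding looks more like another user's long-term embedding.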
Experimental Results
Research Questions
- RQ1: How can we unify short-term prediction, long-horizon forecasting, and cold-start recommendation under sparse mobile histories?
- RQ2: Can a learnable, user-adaptive masking mechanism define inference regions that adapt to task goals and user behavior?
- RQ3: Does a diffusion-based generative backbone effectively complete partially observed spatio-temporal mobile data conditioned on masks and user semantics?
- RQ4: What is the impact of the semantic user representation (U-SCOPE) on masking quality and personalization under data sparsity?
Key Findings
- U-MASK achieves consistent improvements over state-of-the-art methods across short-term, long-horizon, and cold-start scenarios.
- Performance gains are largest under severe data sparsity and dynamic contexts.
- A diffusion transformer backbone preserves observed evidence while generating missing regions conditioned on the mask and user semantics.
- U-SCOPE provides robust, task-agnostic user representations that enable effective masking in sparse telemetry.
- The approach is evaluated on real-world mobile datasets, and the authors plan to release the code and data.
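The idea of generating missing regions while preserving observed evidence can be illustrated with a simple clamping loop: after every refinement step, coordinates marked as evidence are reset to their observed values, so only the masked-out coordinates ever change. This is a generic inpainting-style sketch, not the paper's DiT backbone. The names `blend` and `complete`, the `denoise_step` callable, and the toy denoiser in the usage lines are hypothetical placeholders.

```python
from typing import Callable, List

Grid = List[List[float]]
Mask = List[List[int]]  # 1: observed evidence, 0: coordinate to be generated


def blend(observed: Grid, proposal: Grid, mask: Mask) -> Grid:
    """Keep observed values where mask == 1, take the proposal elsewhere."""
    return [
        [o if m else p for o, p, m in zip(o_row, p_row, m_row)]
        for o_row, p_row, m_row in zip(observed, proposal, mask)
    ]


def complete(
    observed: Grid,
    mask: Mask,
    denoise_step: Callable[[Grid, int], Grid],  # stand-in for a learned model
    num_steps: int = 4,
    init: float = 0.0,
) -> Grid:
    """Iteratively refine unobserved cells, re-imposing evidence after each step."""
    x = blend(observed, [[init] * len(row) for row in observed], mask)
    for step in range(num_steps):
        x = blend(observed, denoise_step(x, step), mask)
    return x


# Toy usage: a fake "denoiser" that adds 1.0 to every cell.
observed = [[5.0, 0.0], [7.0, 0.0]]
mask = [[1, 0], [1, 0]]
result = complete(observed, mask, lambda x, step: [[v + 1.0 for v in r] for r in x])
```

In the toy run the first column (evidence) is left untouched while the second column is updated at every step, which is the behavior the finding above describes.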
This digest was generated by AI and reviewed by a human editor.