QUICK REVIEW

[Paper Review] LASER: An Efficient Target-Aware Segmented Attention Framework for End-to-End Long Sequence Modeling

Tianhe Lin, Ziwei Xiong|arXiv (Cornell University)|Feb 12, 2026
Advanced Data Storage Technologies | Cited by 0
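One-Sentence Summary

LASER introduces a compress-then-refine attention framework for long sequences and pairs it with the SeqVault serving infrastructure, enabling end-to-end modeling of ultra-long sequences in industrial recommender systems with production-grade efficiency and improved CTR metrics.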

ABSTRACT

Modeling ultra-long user behavior sequences is pivotal for capturing evolving and lifelong interests in modern recommendation systems. However, deploying such models in real-time industrial environments faces a strict "Latency Wall", constrained by two distinct bottlenecks: the high I/O latency of retrieving massive user histories and the quadratic computational complexity of standard attention mechanisms. To break these bottlenecks, we present LASER, a full-stack optimization framework developed and deployed at Xiaohongshu (RedNote). Our approach tackles the challenges through two complementary innovations: (1) System efficiency: We introduce SeqVault, a unified schema-aware serving infrastructure for long user histories. By implementing a hybrid DRAM-SSD indexing strategy, SeqVault reduces retrieval latency by 50% and CPU usage by 75%, ensuring millisecond-level access to full real-time and life-cycle user histories. (2) Algorithmic efficiency: We propose a Segmented Target Attention (STA) mechanism to address the computational overhead. Motivated by the inherent sparsity of user interests, STA employs a sigmoid-based gating strategy that acts as a silence mechanism to filter out noisy items. Subsequently, a lightweight Global Stacked Target Attention (GSTA) module refines these compressed segments to capture cross-segment dependencies without incurring high computational costs. This design performs effective sequence compression, reducing the complexity of long-sequence modeling while preserving critical signals. Extensive offline evaluations demonstrate that LASER consistently outperforms state-of-the-art baselines. In large-scale online A/B testing serving over 100 million daily active users, LASER achieved a 2.36% lift in ADVV and a 2.08% lift in revenue, demonstrating its scalability and significant commercial impact.
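
To make the "quadratic" bottleneck in the abstract concrete, the following back-of-the-envelope sketch compares score-computation costs. The sequence length, segment count, and embedding width are hypothetical values chosen for illustration, and the formulas count only query-key multiply-adds (a common simplification). They are not the paper's FLOPs accounting, and the two-stage cost is an assumed reading of the compress-then-refine design.

```python
def self_attention_score_cost(seq_len: int, dim: int) -> int:
    """Multiply-adds for the QK^T score matrix of full self-attention."""
    return seq_len * seq_len * dim


def target_attention_score_cost(seq_len: int, dim: int) -> int:
    """Multiply-adds when a single target query scores every item once."""
    return seq_len * dim


def compress_then_refine_cost(seq_len: int, num_segments: int, dim: int) -> int:
    """Assumed two-stage cost: score every item once inside its segment,
    then score the num_segments compressed summaries once more."""
    compress = target_attention_score_cost(seq_len, dim)
    refine = target_attention_score_cost(num_segments, dim)
    return compress + refine


# Hypothetical sizes, not taken from the paper.
SEQ_LEN, NUM_SEGMENTS, DIM = 10_000, 100, 64

full = self_attention_score_cost(SEQ_LEN, DIM)
two_stage = compress_then_refine_cost(SEQ_LEN, NUM_SEGMENTS, DIM)
print(f"full self-attention:  {full:,} multiply-adds")
print(f"compress-then-refine: {two_stage:,} multiply-adds")
print(f"ratio: {full / two_stage:.0f}x")
```

Under these assumptions the full self-attention cost grows with the square of the sequence length, while the two-stage cost grows linearly, which is the gap the abstract attributes to STA and GSTA.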

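Motivation and Goals

  • Enable efficient end-to-end modeling of ultra-long user behavior sequences in industrial recommender systems.
  • Address the latency and compute bottlenecks caused by I/O and quadratic attention with a combined system-level and algorithmic solution.
  • Deliver production-ready components on a real-world platform that improve ranking quality and business metrics.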

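Proposed Method

  • Introduce SeqVault, which uses hybrid DRAM-SSD indexing to provide real-time access to full life-cycle user histories.
  • Propose Segmented Target Attention (STA), which compresses and denoises long sequences through sigmoid-based gating.
  • Develop Global Stacked Target Attention (GSTA), which refines the compressed segments and models cross-segment dependencies.
  • Apply multi-resolution feature fusion that combines global context, salient signals, and recency.
  • Provide deployment-oriented global attention aggregation and communication optimization (ZSTD compression).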
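
The compress-then-refine idea behind STA and GSTA can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the scaled dot-product scoring, the weighted-sum pooling, and the single softmax refine layer are all hypothetical choices (the name GSTA suggests several stacked layers, and this review does not give the exact formulation).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def segment_target_attention(segment, target):
    """Compress one segment into a single vector (assumed STA-style step).

    Each item gets an independent sigmoid gate from its scaled dot product
    with the target, so irrelevant items can all be pushed toward zero.
    A softmax would instead force the weights to sum to one.
    """
    scale = math.sqrt(len(target))
    gates = [sigmoid(dot(item, target) / scale) for item in segment]
    return [sum(g * item[d] for g, item in zip(gates, segment))
            for d in range(len(target))]


def global_target_attention(summaries, target):
    """Refine step (assumed): softmax target attention over segment summaries."""
    scale = math.sqrt(len(target))
    scores = [dot(s, target) / scale for s in summaries]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * s[d] for w, s in zip(weights, summaries))
            for d in range(len(target))]


def compress_then_refine(sequence, target, segment_len):
    """Split the sequence into fixed-length segments, compress each, then refine."""
    segments = [sequence[i:i + segment_len]
                for i in range(0, len(sequence), segment_len)]
    summaries = [segment_target_attention(seg, target) for seg in segments]
    return global_target_attention(summaries, target), summaries


# Toy data: four 2-d behavior embeddings and one target (candidate) embedding.
target = [1.0, 0.0]
sequence = [[1.0, 0.0], [-8.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
interest, summaries = compress_then_refine(sequence, target, segment_len=2)
print(len(summaries), interest)
```

In the toy data the item `[-8.0, 0.0]` points away from the target, so its gate is close to zero and it contributes almost nothing to its segment summary. This is the "silence" behavior that the abstract attributes to sigmoid gating.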

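Experimental Results

Research Questions

  • RQ1: How can ultra-long user histories be accessed and used in real-time CTR prediction?
  • RQ2: Can the compress-then-refine attention pipeline maintain performance while reducing computation?
  • RQ3: What production gains does deploying LASER bring in a large-scale industrial recommender system?
  • RQ4: How do ablations of STA, GSTA, and the fusion components affect offline AUC and online metrics?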

Key Findings

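| Method      | AUC    | AUC Gain | FLOPs      |
|-------------|--------|----------|------------|
| Base        | 0.7802 | -        | 1.3 × 10^7 |
| DIN         | 0.7814 | +0.12%   | 3.3 × 10^7 |
| TWIN        | 0.7810 | +0.08%   | -          |
| HSTU        | 0.7822 | +0.20%   | 3.7 × 10^8 |
| Transformer | 0.7824 | +0.22%   | 3.6 × 10^8 |
| LASER       | 0.7826 | +0.24%   | 4.0 × 10^7 |
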
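  • LASER achieves the highest offline AUC (0.7826) on Xiaohongshu Ads, outperforming baselines including DIN, HSTU, and Transformer.
  • Compared with the RocksDB-based LastN service, SeqVault cuts CPU usage by about 75% and P99 latency by more than 50%, and it also significantly reduces disk usage.
  • LASER's sigmoid-based segmentation outperforms softmax; removing the segmentation in the ablation study lowers AUC by 0.03 percentage points.
  • The ablation study shows that the recency embedding is a key signal: removing it causes the largest single-component AUC drop (0.09 percentage points).
  • In online A/B testing against the production baseline, LASER lifts both ADVV and revenue (by 2.36% and 2.08% respectively, as reported in the abstract).
  • Offline FLOPs are substantially lower than those of full self-attention baselines while deep modeling capacity is retained (roughly 0.4 × 10^8 versus 3.2 × 10^8 to 3.6 × 10^8).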
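
As a quick check on how to read the results table, the short script below recomputes the AUC Gain column and the FLOPs ratios from the table values. The helper names are illustrative. The check suggests that the reported gains are absolute AUC differences from Base expressed in percentage points (0.7826 - 0.7802 = 0.0024, i.e. 0.24 points), not relative improvements.

```python
# Rows copied from the results table: (method, AUC, FLOPs or None if unreported).
RESULTS = [
    ("Base", 0.7802, 1.3e7),
    ("DIN", 0.7814, 3.3e7),
    ("TWIN", 0.7810, None),
    ("HSTU", 0.7822, 3.7e8),
    ("Transformer", 0.7824, 3.6e8),
    ("LASER", 0.7826, 4.0e7),
]

base_auc = RESULTS[0][1]
laser_flops = RESULTS[-1][2]


def auc_gain_points(auc: float) -> float:
    """Absolute AUC difference from Base, in percentage points."""
    return round((auc - base_auc) * 100, 2)


def flops_vs_laser(flops: float) -> float:
    """How many times more FLOPs a method needs than LASER."""
    return flops / laser_flops


for method, auc, flops in RESULTS[1:]:
    ratio = "n/a" if flops is None else f"{flops_vs_laser(flops):.2f}x LASER"
    print(f"{method:<12} gain={auc_gain_points(auc):+.2f} pp  FLOPs={ratio}")
```

With the table's numbers, Transformer and HSTU need about 9 times the FLOPs of LASER, while LASER needs about 3 times the FLOPs of Base.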


This review was generated by AI and reviewed by human editors.