QUICK REVIEW

[Paper Review] Streaming Stochastic Submodular Maximization with On-Demand User Requests

Honglian Wang, Sijing Tu | arXiv (Cornell University) | Jan 15, 2026

Optimization and Search Problems | Cited by 0

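One-Sentence Summary

This paper introduces S3MOR, a streaming model for maximizing expected topic coverage under on-demand user visits, and proposes efficient online algorithms with provable competitive ratios under different memory assumptions. It also offers a low-memory option and a multi-guess variant, Storm++, for handling an unknown number of visits.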

ABSTRACT

We explore a novel problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to $k$ news items. User interactions are inherently stochastic: each news item presented to the user is consumed with a certain acceptance probability by the user, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage. To address this problem, we establish a connection to submodular maximization subject to a matroid constraint. We show that we can effectively adapt previous methods to address our problem when the number of user visits is known in advance or linear-size memory in the stream length is available. However, in more realistic scenarios where only an upper bound on the visits and sublinear memory is available, the algorithms fail to guarantee any bounded performance. To overcome these limitations, we introduce a new online streaming algorithm that achieves a competitive ratio of $1/(8δ)$, where $δ$ controls the approximation quality. Moreover, it requires only a single pass over the stream, and uses memory independent of the stream length. Empirically, our algorithms consistently outperform the baselines.
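To make the objective concrete, here is a minimal Python sketch of expected topic coverage. It assumes (our assumption, not stated in the abstract) that each displayed item is consumed independently with its acceptance probability and that a topic counts once if any consumed item covers it. The function name `expected_coverage` and the dictionary layout are hypothetical. Under these assumptions the function is monotone submodular, which is what makes the matroid reduction useful.

```python
from typing import Dict, Iterable, Set


def expected_coverage(shown: Iterable[str],
                      accept_prob: Dict[str, float],
                      topics: Dict[str, Set[str]]) -> float:
    """Expected number of distinct topics covered by the items a user consumes.

    Assumption (hypothetical model): every occurrence in `shown` is an
    independent display consumed with probability accept_prob[item];
    a topic is counted once if at least one consumed item covers it.
    """
    miss: Dict[str, float] = {}  # topic -> probability that no consumed item covers it
    for item in shown:
        p = accept_prob[item]
        for t in topics[item]:
            miss[t] = miss.get(t, 1.0) * (1.0 - p)
    # Linearity of expectation: sum the per-topic coverage probabilities.
    return sum(1.0 - q for q in miss.values())
```

For example, items a (p = 0.5, topics {x, y}) and b (p = 0.5, topics {y}) give 0.5 for x plus 0.75 for y, i.e. 1.25 expected topics. Adding b to an empty display gains 0.5, but adding it after a gains only 0.25, which is the diminishing-returns property.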

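Motivation and Objectives

Motivate and formalize the S3MOR problem, inspired by news recommendation with on-demand user visits.
Model the problem as submodular maximization under a partition matroid constraint.
Develop memory-efficient online algorithms with competitive guarantees under different memory limits.
Provide an empirical evaluation demonstrating practical performance and scalability.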
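One natural way to read the partition-matroid model is sketched below, stated as an assumption rather than the paper's exact construction. The ground set consists of (item, visit) pairs, and each visit j forms one block. A selection is independent when no block holds more than k pairs, which matches the rule that each visit displays at most k items. The name `is_independent` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical ground-set element: (news item id, index j of the visit it is shown at).
Element = Tuple[str, int]


def is_independent(selection: Iterable[Element], k: int, num_visits: int) -> bool:
    """Partition-matroid independence test: at most k items per visit block."""
    chosen = set(selection)  # an independent set is a set, so duplicates count once
    if any(not 0 <= j < num_visits for _, j in chosen):
        return False  # element lies outside the ground set
    per_visit = Counter(j for _, j in chosen)
    return all(count <= k for count in per_visit.values())
```

Because every block has the same capacity k, checking independence reduces to counting pairs per visit.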

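Proposed Methods

Reduce S3MOR to submodular maximization under a partition matroid constraint so that online streaming algorithms can be applied.
Propose LMGreedy for the case where memory for the full stream is available; it achieves a competitive ratio of 1/2.
For the case where the number of visits T is unknown but bounded above by T', propose Storm, with a competitive ratio of 1/(4(T'−T+1)).
Introduce Storm++, which handles an unknown number of visits by making multiple guesses of T and aggregating the results. It achieves a competitive ratio of 1/(8δ), with memory and time trading off on the order of δT'.
Analyze response time as well as time and space complexity, and compare against baselines such as Sieve++, Preemption, and LMGreedy.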
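The following is a minimal sketch of a locally greedy rule in the spirit of LMGreedy. It assumes linear memory (every arrived item is kept) and the independent-acceptance coverage model. The names `arrivals` and `locally_greedy`, and the batching of the stream by visit, are hypothetical. The paper's exact procedure and its 1/2 analysis may differ. At each visit, the sketch fills the k slots one at a time with the item whose marginal gain over everything shown so far is largest.

```python
from typing import Dict, List, Set


def expected_coverage(shown: List[str], accept_prob: Dict[str, float],
                      topics: Dict[str, Set[str]]) -> float:
    # Assumed model: independent consumption, each topic counted once.
    miss: Dict[str, float] = {}
    for item in shown:
        for t in topics[item]:
            miss[t] = miss.get(t, 1.0) * (1.0 - accept_prob[item])
    return sum(1.0 - q for q in miss.values())


def locally_greedy(arrivals: List[List[str]], k: int,
                   accept_prob: Dict[str, float],
                   topics: Dict[str, Set[str]]) -> List[List[str]]:
    """arrivals[j] holds the items that arrive after visit j-1 and before visit j (hypothetical layout)."""
    pool: List[str] = []      # every item seen so far (linear memory)
    shown: List[str] = []     # all items displayed at earlier visits
    answers: List[List[str]] = []
    for batch in arrivals:
        pool.extend(batch)
        current: List[str] = []
        for _ in range(k):
            base = expected_coverage(shown + current, accept_prob, topics)
            best, best_gain = None, 0.0
            for item in pool:
                if item in current:
                    continue  # do not fill two slots of one visit with the same item
                gain = expected_coverage(shown + current + [item], accept_prob, topics) - base
                if gain > best_gain:
                    best, best_gain = item, gain
            if best is None:
                break  # no remaining item adds expected coverage
            current.append(best)
        shown.extend(current)
        answers.append(current)
    return answers
```

Storing the whole pool is what gives this approach its linear memory footprint. Storm and Storm++ are designed to avoid that cost.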

Figure 1: Schematic view of our algorithm Storm. We first initialize $T^{\prime}$ empty active candidate sets. For each incoming news item in the stream $\mathbf{S}$, we decide whether it can be added to or swapped into each of the active candidate sets. When a user submits a request, i.e., $\tau_{j}$ …
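The per-item decision in Figure 1 is to add the new item if there is room, and otherwise consider swapping it in. It can be sketched for a single candidate set as follows. This is a generic swap rule, not Storm's published rule. The relative-improvement threshold `min_rel_gain`, the function name `offer`, and the coverage model are hypothetical. The sketch does not show how Storm differentiates its $T^{\prime}$ candidate sets or which set answers a request.

```python
from typing import Dict, List, Set


def coverage(items: List[str], accept_prob: Dict[str, float],
             topics: Dict[str, Set[str]]) -> float:
    # Assumed model: independent consumption, each topic counted once.
    miss: Dict[str, float] = {}
    for item in items:
        for t in topics[item]:
            miss[t] = miss.get(t, 1.0) * (1.0 - accept_prob[item])
    return sum(1.0 - q for q in miss.values())


def offer(candidate: List[str], item: str, k: int,
          accept_prob: Dict[str, float], topics: Dict[str, Set[str]],
          min_rel_gain: float = 0.1) -> None:
    """Add `item` to one candidate set in place, or swap it in if that helps enough."""
    value = coverage(candidate, accept_prob, topics)
    if len(candidate) < k:
        # Room left: keep the item only if it adds expected coverage.
        if coverage(candidate + [item], accept_prob, topics) > value:
            candidate.append(item)
        return
    # Set is full: find the member whose replacement by `item` gives the best value.
    best_i, best_val = None, value
    for i in range(len(candidate)):
        trial = candidate[:i] + candidate[i + 1:] + [item]
        v = coverage(trial, accept_prob, topics)
        if v > best_val:
            best_i, best_val = i, v
    # Swap only for a sufficiently large relative improvement (hypothetical threshold).
    if best_i is not None and best_val >= (1.0 + min_rel_gain) * value:
        candidate[best_i] = item
```

A threshold on the improvement keeps the set from churning on tiny gains. Swap-based streaming methods such as the Preemption baseline rely on a similar idea, though their exact conditions differ.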

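Experimental Results

Research Questions

RQ1: How can expected topic coverage be maximized across multiple on-demand user visits in a streaming setting?
RQ2: When the number of visits is unknown or only an upper bound is known, can bounded competitive guarantees be achieved under memory constraints?
RQ3: What are the trade-offs among memory usage, time complexity, and responsiveness for online S3MOR algorithms?
RQ4: How do the proposed algorithms perform empirically against the baselines on real-world datasets in terms of coverage and efficiency?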

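Key Findings

When memory for the full stream is available, LMGreedy achieves the best competitive ratio, 1/2.
When the number of visits is unknown but bounded above by T', Storm provides a competitive ratio of 1/(4(T'−T+1)).
By making multiple guesses of T and aggregating the results, Storm++ achieves a competitive ratio of 1/(8δ), at the cost of increasing memory and time overhead by a factor of δT'.
Storm and Storm++ outperform LMGreedy in response time and per-user memory, although their competitive ratios are somewhat weaker in some settings.
Empirically, across multiple datasets (Yahoo, RCV1, Amazon, etc.), the proposed algorithms outperform the baselines (Sieve++, Preemption) in coverage and scalability.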
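The two guarantees stated above can be compared numerically. The inputs below (T' = 100, T = 10, δ = 2) are illustrative choices, not settings from the paper, and the function names are hypothetical. Storm's ratio shrinks linearly as the bound T' loosens relative to T, whereas Storm++'s 1/(8δ) does not depend on T' − T.

```python
from fractions import Fraction


def storm_ratio(t_upper: int, t: int) -> Fraction:
    """Storm's stated guarantee 1/(4(T' - T + 1)) for T actual visits, upper bound T'."""
    if not 1 <= t <= t_upper:
        raise ValueError("need 1 <= T <= T'")
    return Fraction(1, 4 * (t_upper - t + 1))


def storm_pp_ratio(delta) -> Fraction:
    """Storm++'s stated guarantee 1/(8 * delta)."""
    return Fraction(1) / (8 * Fraction(delta))


loose = storm_ratio(100, 10)   # loose upper bound on the number of visits
tight = storm_ratio(10, 10)    # bound equals the true number of visits
multi = storm_pp_ratio(2)      # Storm++ with delta = 2
```

Here `loose` is 1/364 and `tight` is 1/4, while `multi` is 1/16. With a loose bound, Storm++ therefore has the stronger worst-case guarantee, and it pays for this with the extra memory and time noted above.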

This review was generated by AI and reviewed by human editors.