QUICK REVIEW

[Paper Review] Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

Hung T. Nguyen, My T. Thai|arXiv (Cornell University)|May 25, 2016

Complex Network Analysis Techniques | References: 10 | Cited by: 57

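One-Sentence Summary

This paper proposes SSA and D-SSA, two sampling algorithms for influence maximization in billion-scale networks. Both follow a Stop-and-Stare strategy: they pause at exponentially spaced checkpoints to verify whether there is enough statistical evidence on solution quality. They keep the same (1−1/e−ε) approximation guarantee as state-of-the-art methods while running up to 1200 times faster, and they are proven to use an asymptotically minimum number of Reverse Influence Sampling (RIS) samples.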

ABSTRACT

Influence Maximization (IM), that seeks a small set of key users who spread the influence widely into the network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within complex systems. Despite the huge amount of effort, IM in billion-scale networks such as Facebook, Twitter, and World Wide Web has not been satisfactorily solved. Even the state-of-the-art methods such as TIM+ and IMM may take days on those networks. In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. SSA and D-SSA are up to 1200 times faster than the SIGMOD'15 best method, IMM, while providing the same $(1-1/e-ε)$ approximation guarantee. Underlying our frameworks is an innovative Stop-and-Stare strategy in which they stop at exponential check points to verify (stare) if there is adequate statistical evidence on the solution quality. Theoretically, we prove that SSA and D-SSA are the first approximation algorithms that use (asymptotically) minimum numbers of samples, meeting strict theoretical thresholds characterized for IM. The absolute superiority of SSA and D-SSA are confirmed through extensive experiments on real network data for IM and another topic-aware viral marketing problem, named TVM. The source code is available at https://github.com/hungnt55/Stop-and-Stare

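Motivation and Objectives

Address the scalability limits of existing influence maximization (IM) algorithms on billion-scale networks such as Facebook and Twitter.
Overcome two key shortcomings of prior methods: an unbounded number of samples and sampling thresholds that are not minimal.
Build a unified RIS framework that characterizes the necessary and sufficient conditions for achieving a (1−1/e−ε) approximation for IM.
Design algorithms that provably use an asymptotically minimum number of RIS samples, so that sampling is as efficient as possible.
Extend the framework to the Targeted Viral Marketing (TVM) problem, which uses weighted influence spread, while keeping the approximation guarantee.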
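The RIS framework rests on a standard identity from the reverse influence sampling literature, written out here as a sketch. The symbols $n$ (number of nodes), $R_j$ (a random reverse reachable set), $T$ (number of samples), $\mathbb{I}(S)$ (expected influence of seed set $S$), $\hat{S}_k$ (the returned seed set), $\mathrm{OPT}_k$ (the best achievable influence with $k$ seeds) and the failure probability $\delta$ are notation assumed for illustration; this review does not define them. The value $\epsilon = 0.1$ in the last line is an arbitrary example.

```latex
\begin{align*}
% Influence of S equals n times the chance that S intersects a random RR set.
\mathbb{I}(S) &= n \cdot \Pr\big[\, S \cap R_j \neq \emptyset \,\big] \\
% Monte Carlo estimate from T independent RR sets.
\hat{\mathbb{I}}(S) &= \frac{n}{T} \sum_{j=1}^{T} \mathbf{1}\big[\, S \cap R_j \neq \emptyset \,\big] \\
% The approximation guarantee targeted by IM sampling algorithms.
\Pr\big[\, \mathbb{I}(\hat{S}_k) \ge (1 - 1/e - \epsilon)\,\mathrm{OPT}_k \,\big] &\ge 1 - \delta \\
% Worked example with epsilon = 0.1.
1 - 1/e \approx 0.632 \quad &\Rightarrow \quad 1 - 1/e - 0.1 \approx 0.532
\end{align*}
```

The open question the objectives point at is how large $T$ must be for the third line to hold, and whether an algorithm can stop close to that minimum.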

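Proposed Method

Proposes a generalized RIS framework that defines the necessary conditions for a (1−1/e−ε) approximation in IM, together with classes of RIS thresholds.
Defines two kinds of minimum thresholds: type-1 (the minimum within each threshold class) and type-2 (the global minimum across all classes).
Proposes the Stop-and-Stare Algorithm (SSA), which generates RIS samples and checks solution quality at exponentially spaced checkpoints to decide when to terminate.
Designs D-SSA, a dynamic variant of SSA that tunes its parameters automatically for better performance and sampling efficiency.
Integrates weighted RIS (WRIS) into SSA and D-SSA to solve the TVM problem, where influence is targeted at topic-relevant groups of users.
Proves that SSA and D-SSA both use a number of RIS samples within a constant factor of the theoretical minimum, without computing that minimum explicitly.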
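The stop-and-stare loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their C++ code is at the repository linked in the abstract), and it makes several assumptions: the independent cascade model, a graph given as a hypothetical `in_edges` dict mapping each node to `(in_neighbor, probability)` pairs, and a simplified acceptance test that compares two estimates within a relative tolerance `eps`. The paper's actual stopping conditions and parameter choices are more involved and are not reproduced here; all function and parameter names are illustrative.

```python
import random
from collections import deque


def sample_rr_set(in_edges, nodes, rng):
    """Sample one reverse reachable (RR) set under the independent cascade model."""
    root = rng.choice(nodes)
    rr_set = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, prob in in_edges.get(v, ()):
            # Each incoming edge is live independently with probability prob.
            if u not in rr_set and rng.random() < prob:
                rr_set.add(u)
                queue.append(u)
    return rr_set


def greedy_max_coverage(rr_sets, k):
    """Greedily pick up to k nodes covering the most RR sets."""
    node_to_sets = {}
    for idx, rr in enumerate(rr_sets):
        for node in rr:
            node_to_sets.setdefault(node, set()).add(idx)
    covered, seeds = set(), []
    for _ in range(k):
        best, best_gain = None, 0
        for node, sets in node_to_sets.items():
            gain = len(sets - covered)
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # nothing left to cover
            break
        seeds.append(best)
        covered |= node_to_sets[best]
    return seeds, len(covered)


def stop_and_stare_sketch(in_edges, nodes, k, eps=0.1, initial=64,
                          max_samples=1 << 16, seed=0):
    """Simplified stop-and-stare loop (assumed acceptance rule, see lead-in).

    Returns (seeds, verified influence estimate, number of RR sets used
    to find the seeds).
    """
    rng = random.Random(seed)
    n = len(nodes)
    rr_sets = [sample_rr_set(in_edges, nodes, rng) for _ in range(initial)]
    while True:
        # Stop: find a candidate seed set on the samples collected so far.
        seeds, covered = greedy_max_coverage(rr_sets, k)
        est_find = n * covered / len(rr_sets)
        # Stare: re-estimate the candidate's influence on independent samples.
        seed_set = set(seeds)
        batch = len(rr_sets)
        fresh = [sample_rr_set(in_edges, nodes, rng) for _ in range(batch)]
        hits = sum(1 for rr in fresh if not seed_set.isdisjoint(rr))
        est_verify = n * hits / batch
        # Accept when the optimistic estimate is confirmed within tolerance,
        # or give up growing once the sample budget is exhausted.
        accepted = est_verify > 0 and est_find <= (1 + eps) * est_verify
        if accepted or batch >= max_samples:
            return seeds, est_verify, batch
        # Otherwise double the sample pool: checkpoints grow exponentially.
        rr_sets.extend(sample_rr_set(in_edges, nodes, rng) for _ in range(batch))
```

For example, on a 10-node star where node 0 influences every other node with probability 1, `stop_and_stare_sketch({v: [(0, 1.0)] for v in range(1, 10)}, list(range(10)), k=1)` selects node 0 at the first checkpoint. The doubling schedule is what keeps the number of checkpoints logarithmic in the final sample count.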

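Experimental Results

Research Questions

RQ1: Can we define a unified framework that characterizes the minimum number of RIS samples needed for a (1−1/e−ε) approximate solution to IM?
RQ2: Can we design sampling algorithms that provably reach the theoretical minimum number of RIS samples and so avoid oversampling?
RQ3: Can the Stop-and-Stare strategy be applied effectively to IM to verify solution quality dynamically and reduce sampling overhead?
RQ4: Can the proposed algorithms scale to billion-scale networks while keeping strong theoretical guarantees?
RQ5: Can the framework be extended to the Targeted Viral Marketing (TVM) problem with topic-aware influence spread?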

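Key Findings

On the Friendster network (k=500), SSA and D-SSA provide the same (1−1/e−ε) approximation guarantee as IMM and TIM+ while running up to 1200 times faster.
On the Twitter network (k=1000), D-SSA is about 2×10^9 times faster than CELF++, a fast greedy algorithm with approximation guarantees.
In the most extreme case, Friendster with 3.6 billion edges, IMM needs 172 GB of memory, whereas D-SSA and SSA use only 69 GB and 72 GB, respectively.
For the TVM problem on Twitter, D-SSA and SSA cut running time relative to KB-TIM by at least two orders of magnitude (up to 500 times).
SSA and D-SSA generate significantly fewer reverse reachable (RR) sets than IMM, even when selecting a single seed node, which confirms their sampling efficiency.
Dynamic parameter selection lets D-SSA outperform the static SSA and come closer to the type-2 minimum threshold.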
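To put the reported Friendster memory figures in relative terms, the ratios can be worked out directly from the numbers above. This is plain arithmetic on the values quoted in this review, not an additional result from the paper.

```latex
\begin{align*}
% IMM memory relative to D-SSA on Friendster.
\frac{172\ \text{GB}}{69\ \text{GB}} &\approx 2.49 \\
% IMM memory relative to SSA on Friendster.
\frac{172\ \text{GB}}{72\ \text{GB}} &\approx 2.39
\end{align*}
```

So the memory saving is roughly a factor of 2.4 to 2.5, far smaller than the reported speedup.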

This review was generated by AI and checked by human editors.