QUICK REVIEW

[Paper Review] Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency

Youze Tang, Xiaokui Xiao|arXiv (Cornell University)|Apr 3, 2014
Complex Network Analysis Techniques | References: 11 | Cited by: 127
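One-Sentence Summary

The paper presents TIM, an influence maximization algorithm that achieves a near-optimal expected time complexity of $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ while remaining efficient in practice. Under the triggering model (which includes the IC and LT models), TIM returns a $(1-1/e-\varepsilon)$-approximate solution with probability at least $1-n^{-\ell}$, and it can process a graph with over a billion edges in under an hour on a commodity machine, up to four orders of magnitude faster than prior methods with approximation guarantees.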

ABSTRACT

Given a social network $G$ and a constant $k$, the influence maximization problem asks for $k$ nodes in $G$ that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ expected time and returns a $(1-1/e-\varepsilon)$-approximate solution with at least $1-n^{-\ell}$ probability. The time complexity of TIM is near-optimal under the IC model, as it is only a $\log n$ factor larger than the $\Omega(m+n)$ lower-bound established in previous work (for fixed $k$, $\ell$, and $\varepsilon$). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when $k = 50$, $\varepsilon = 0.2$, and $\ell = 1$, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.
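
To illustrate how the triggering model can cover both IC and LT, here is a minimal Python sketch of sampling a node's triggering set under each model. It follows the standard textbook definitions rather than anything quoted from the paper. The function names and the `in_neighbors` representation (a list of `(neighbor, weight)` pairs) are assumptions made for this example.

```python
import random

def sample_triggering_set_ic(in_neighbors, rng):
    """IC as a triggering model (illustrative sketch).

    in_neighbors: list of (u, p) pairs, where p is the propagation
    probability of edge (u, v). Each in-neighbor joins the triggering
    set independently with probability p.
    """
    return {u for u, p in in_neighbors if rng.random() < p}

def sample_triggering_set_lt(in_neighbors, rng):
    """LT as a triggering model (illustrative sketch).

    in_neighbors: list of (u, w) pairs, where w is the weight of edge
    (u, v) and the weights sum to at most 1. At most one in-neighbor is
    chosen: u with probability w, nobody with the remaining probability.
    """
    r = rng.random()
    cumulative = 0.0
    for u, w in in_neighbors:
        cumulative += w
        if r < cumulative:
            return {u}
    return set()
```

In both cases a node becomes active once any member of its sampled triggering set is active, which is why an algorithm written against triggering sets can handle either model.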

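Motivation and Objectives

  • Bridge the gap between theoretical approximation guarantees and practical scalability in influence maximization.
  • Develop an algorithm that supports the general triggering model, which covers both the IC and LT models.
  • Achieve near-optimal time complexity while retaining high empirical efficiency on large-scale networks.
  • Make influence maximization with non-trivial approximation guarantees feasible on million-node graphs.
  • Outperform existing state-of-the-art algorithms in both running time and solution quality.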

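Proposed Method

  • TIM uses the random reverse reachable (RR) set framework to estimate influence spread efficiently.
  • It generates $\lambda / KPT^{+}$ random RR sets, where $\lambda$ is proportional to $1/\varepsilon^{2}$ and $KPT^{+}$ is a lower bound on the optimal influence spread.
  • It applies greedy selection over the RR sets to identify high-influence nodes.
  • It introduces heuristic optimizations that reduce constant factors without affecting asymptotic performance.
  • TIM supports the triggering model, a general diffusion model of which IC and LT are special cases.
  • Theoretical analysis shows an expected time complexity of $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ and a $(1-1/e-\varepsilon)$-approximation with probability at least $1-n^{-\ell}$.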
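
The RR-set sampling and greedy selection steps can be sketched in Python as follows. This is a simplified illustration under the IC model, not the authors' implementation. It omits the estimation of $KPT^{+}$ and the paper's heuristics, takes the number of RR sets as a caller-chosen parameter, and uses a plain (not optimized) greedy maximum-coverage loop. The function names and the `in_edges` layout are assumptions made for this example.

```python
import random
from collections import defaultdict

def sample_rr_set_ic(nodes, in_edges, rng):
    """Sample one random RR set under IC (illustrative sketch).

    nodes: list of node ids.
    in_edges: dict mapping v to a list of (u, p) pairs, one per edge (u, v).
    Starts from a uniformly random root and walks edges backwards,
    keeping each incoming edge independently with probability p.
    """
    root = rng.choice(nodes)
    rr = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr

def greedy_max_coverage(rr_sets, k):
    """Pick up to k nodes that greedily cover the most RR sets.

    Returns (seeds, fraction of RR sets covered). Multiplying the
    fraction by the number of nodes gives an estimate of the seeds'
    expected influence spread.
    """
    if not rr_sets:
        return [], 0.0
    covers = defaultdict(set)  # node -> indices of RR sets containing it
    for i, rr in enumerate(rr_sets):
        for v in rr:
            covers[v].add(i)
    covered = set()
    seeds = []
    for _ in range(k):
        if not covers:
            break
        # Node with the largest number of not-yet-covered RR sets.
        best = max(covers, key=lambda v: len(covers[v] - covered))
        covered |= covers.pop(best)
        seeds.append(best)
    return seeds, len(covered) / len(rr_sets)
```

A caller would draw some number of RR sets with `sample_rr_set_ic`, pass them to `greedy_max_coverage`, and scale the returned fraction by the node count to estimate spread. In TIM that number of sets is derived from $\lambda / KPT^{+}$ rather than chosen by hand.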

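Experimental Results

Research Questions

  • RQ1: Is there an influence maximization algorithm that achieves both near-optimal theoretical time complexity and practical efficiency on large-scale networks?
  • RQ2: Can the triggering model be supported efficiently while retaining strong approximation guarantees?
  • RQ3: How does the proposed algorithm scale on graphs with up to 41.6 million nodes and 1.4 billion edges?
  • RQ4: How large is the empirical performance gap between TIM and existing state-of-the-art algorithms with approximation guarantees?
  • RQ5: Do the heuristic optimizations significantly improve practical efficiency without sacrificing the theoretical bounds?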

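Key Findings

  • With $k=50$, $\varepsilon=0.2$, and $\ell=1$, TIM processes a graph with 41.6 million nodes and 1.4 billion edges in under one hour.
  • TIM is up to four orders of magnitude faster than state-of-the-art solutions with approximation guarantees.
  • On the LiveJournal dataset, TIM+ (the optimized version) runs more than 20 times faster than IRIE and more than 1000 times faster than SIMPATH (at $k=50$).
  • On the DBLP and LiveJournal datasets, TIM+ achieves a noticeably higher expected influence spread than IRIE, and under the LT model it matches or exceeds SIMPATH.
  • Memory consumption is higher under the IC model than under the LT model because $KPT^{+}$ is smaller (so more RR sets are needed), but adaptive control of the RR set size keeps it manageable.
  • The algorithm retains strong theoretical guarantees: a $(1-1/e-\varepsilon)$-approximation with probability at least $1-n^{-\ell}$, and a time complexity within a $\log n$ factor of the theoretical lower bound.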
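
To make the quoted guarantee concrete, the short Python sketch below evaluates the approximation factor, the failure probability, and the asymptotic cost expression for given parameters. The helper names are made up for this example, and `complexity_scale` drops all hidden constants, so it is only useful for comparing parameter settings, not for predicting running time.

```python
import math

def approximation_factor(eps):
    """Guaranteed fraction of the optimal spread: 1 - 1/e - eps."""
    return 1.0 - 1.0 / math.e - eps

def failure_probability(n, ell):
    """Upper bound on the probability that the guarantee fails: n^(-ell)."""
    return float(n) ** (-ell)

def complexity_scale(n, m, k, ell, eps):
    """(k + ell)(n + m) log n / eps^2 with constants dropped (relative scale only)."""
    return (k + ell) * (n + m) * math.log(n) / eps ** 2

# Parameters of the largest experiment reported in the abstract.
n, m, k, ell, eps = 41_600_000, 1_400_000_000, 50, 1, 0.2
print(approximation_factor(eps))    # about 0.432
print(failure_probability(n, ell))  # about 2.4e-08
# Halving eps quadruples the cost expression because of the 1/eps^2 term.
print(complexity_scale(n, m, k, ell, eps / 2) / complexity_scale(n, m, k, ell, eps))
```

With $\varepsilon=0.2$ the guaranteed factor is roughly 0.43 of the optimum, which shows the trade-off involved: a smaller $\varepsilon$ tightens the guarantee but raises the cost quadratically.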


This review was generated by AI and reviewed by human editors.