QUICK REVIEW

[Paper Review] Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency

Youze Tang, Xiaokui Xiao | arXiv (Cornell University) | Apr 3, 2014

Complex Network Analysis Techniques | 11 references | 127 citations

TL;DR

This paper proposes TIM, an influence maximization algorithm that achieves a near-optimal expected time complexity of $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ while maintaining practical efficiency through novel heuristics. It returns a $(1-1/e-\varepsilon)$-approximate solution with probability at least $1-n^{-\ell}$ under the triggering model, which includes IC and LT, and processes a billion-edge graph in under an hour on a commodity machine, outperforming prior methods with approximation guarantees by up to four orders of magnitude.

ABSTRACT

Given a social network $G$ and a constant $k$, the influence maximization problem asks for $k$ nodes in $G$ that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ expected time and returns a $(1-1/e-\varepsilon)$-approximate solution with at least $1-n^{-\ell}$ probability. The time complexity of TIM is near-optimal under the IC model, as it is only a $\log n$ factor larger than the $\Omega(m+n)$ lower-bound established in previous work (for fixed $k$, $\ell$, and $\varepsilon$). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when $k=50$, $\varepsilon=0.2$, and $\ell=1$, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.

Motivation & Objective

To bridge the gap between theoretical approximation guarantees and practical scalability in influence maximization.
To develop an algorithm that supports the general triggering model, encompassing IC and LT models.
To achieve near-optimal time complexity while maintaining high empirical efficiency on large-scale networks.
To enable influence maximization on million-node graphs with non-trivial approximation guarantees.
To outperform existing state-of-the-art algorithms in both running time and solution quality.

Proposed method

TIM uses a randomized reverse reachable (RR) set framework to estimate influence spread efficiently.
It generates $\lambda / KPT^{+}$ random RR sets, where $\lambda$ is proportional to $1/\varepsilon^{2}$ and $KPT^{+}$ is a lower bound on the optimal influence spread.
The algorithm employs a greedy selection strategy over RR sets to identify high-influence nodes.
It incorporates heuristic optimizations that reduce constant factors without affecting asymptotic performance.
TIM supports the triggering model, a general diffusion model that includes IC and LT as special cases.
Theoretical analysis proves $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ expected time complexity and $(1-1/e-\varepsilon)$-approximation with probability at least $1-n^{-\ell}$.
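The RR-set idea summarized above can be illustrated with a minimal Python sketch under the IC model. This is not the authors' implementation: the function names (`random_rr_set`, `greedy_max_coverage`, `select_seeds`), the edge-list input format, and the fixed sample count `theta` are assumptions made for illustration. In particular, the sketch takes `theta` as a given parameter rather than deriving it from $\lambda / KPT^{+}$, and it omits the paper's heuristics.

```python
import random
from collections import defaultdict


def random_rr_set(n, in_edges, rng):
    """Sample one RR set under IC: start at a uniformly random node and walk
    edges backwards, keeping each incoming edge (u -> v) with probability p."""
    root = rng.randrange(n)
    rr = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for u, p in in_edges.get(v, ()):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr


def greedy_max_coverage(rr_sets, k):
    """Greedily pick up to k nodes, each time taking the node that appears
    in the largest number of RR sets not yet covered."""
    node_to_sets = defaultdict(set)
    for i, rr in enumerate(rr_sets):
        for u in rr:
            node_to_sets[u].add(i)
    covered = set()
    seeds = []
    for _ in range(k):
        if not node_to_sets:
            break
        best = max(node_to_sets, key=lambda u: len(node_to_sets[u] - covered))
        seeds.append(best)
        covered |= node_to_sets.pop(best)  # pop so a node is never picked twice
    return seeds, covered


def select_seeds(n, edges, k, theta, seed=0):
    """edges is a list of (u, v, p) triples with nodes numbered 0..n-1.
    theta (number of RR sets) is a hypothetical, caller-supplied parameter."""
    rng = random.Random(seed)
    in_edges = defaultdict(list)
    for u, v, p in edges:
        in_edges[v].append((u, p))
    rr_sets = [random_rr_set(n, in_edges, rng) for _ in range(theta)]
    seeds, covered = greedy_max_coverage(rr_sets, k)
    # The fraction of RR sets covered, scaled by n, estimates the expected spread.
    estimated_spread = n * len(covered) / theta
    return seeds, estimated_spread
```

As a sanity check, on a star graph where node 0 points to every other node with probability 1, node 0 appears in every RR set, so `select_seeds(6, [(0, v, 1.0) for v in range(1, 6)], k=1, theta=200)` picks node 0 and the spread estimate equals the number of nodes.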

Experimental results

Research questions

RQ1: Can an influence maximization algorithm achieve both near-optimal theoretical time complexity and practical efficiency on large-scale networks?
RQ2: Can the triggering model be efficiently supported while maintaining strong approximation guarantees?
RQ3: How does the proposed algorithm scale to graphs with up to 41.6 million nodes and 1.4 billion edges?
RQ4: What is the empirical performance gap between TIM and existing state-of-the-art algorithms with approximation guarantees?
RQ5: Can heuristic optimizations significantly improve empirical efficiency without sacrificing theoretical bounds?

Key findings

TIM processes a 41.6 million-node, 1.4 billion-edge graph in less than one hour when $k=50$, $\varepsilon=0.2$, and $\ell=1$.
TIM outperforms the state-of-the-art solutions with approximation guarantees by up to four orders of magnitude in running time.
TIM+ (the optimized version) runs over 20 times faster than IRIE and 1000 times faster than SIMPATH on LiveJournal when $k=50$.
TIM+ achieves significantly higher expected influence spreads than IRIE on DBLP and LiveJournal, and matches or exceeds SIMPATH’s performance on all datasets under the LT model.
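Memory consumption is higher under the IC model than under LT because $KPT^{+}$ is smaller under IC, so more RR sets ($\lambda / KPT^{+}$ of them) must be generated and stored; it nevertheless remains manageable because the number of RR sets adapts to $KPT^{+}$.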
The algorithm maintains strong theoretical guarantees: $(1-1/e-\varepsilon)$-approximation with probability at least $1-n^{-\ell}$, and near-optimal time complexity within a $\log n$ factor of the theoretical lower bound.
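To make the guarantee concrete, the short calculation below plugs the reported experimental setting ($\varepsilon=0.2$, $\ell=1$, $n$ = 41.6 million) into the two expressions $1-1/e-\varepsilon$ and $n^{-\ell}$. The helper names are illustrative only and do not come from the paper.

```python
import math


def approximation_ratio(eps):
    """Worst-case approximation ratio 1 - 1/e - eps."""
    return 1.0 - 1.0 / math.e - eps


def failure_probability_bound(n, ell):
    """Upper bound n^(-ell) on the probability that the guarantee fails."""
    return float(n) ** (-ell)


ratio = approximation_ratio(0.2)
fail = failure_probability_bound(41_600_000, 1)
print(f"guaranteed ratio >= {ratio:.3f}")
print(f"failure probability <= {fail:.2e}")
```

With these settings the worst-case ratio is roughly 0.43 of the optimal spread, and the guarantee fails with probability at most about $2.4 \times 10^{-8}$. This is a worst-case bound and says nothing about the spreads observed in the experiments.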

This review was created by AI and reviewed by human editors.