QUICK REVIEW

[Paper Review] Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency

Youze Tang, Xiaokui Xiao|arXiv (Cornell University)|Apr 3, 2014

Complex Network Analysis Techniques | References: 11 | Citations: 127

One-line summary

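This paper proposes TIM, an influence maximization algorithm that achieves a near-optimal expected time complexity of $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ while staying efficient in practice through novel heuristics. Under the triggering model (which includes IC and LT), it returns a $(1-1/e-\varepsilon)$-approximate solution with probability at least $1-n^{-\ell}$, processes a graph with 1.4 billion edges in under one hour on a commodity machine, and runs up to four orders of magnitude faster than prior methods that offer approximation guarantees.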

ABSTRACT

Given a social network G and a constant k, the influence maximization problem asks for k nodes in G that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in O((k+\ell) (n+m) \log n / ε^2) expected time and returns a (1-1/e-ε)-approximate solution with at least 1 - n^{-\ell} probability. The time complexity of TIM is near-optimal under the IC model, as it is only a \log n factor larger than the Ω(m + n) lower-bound established in previous work (for fixed k, \ell, and ε). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when k = 50, ε= 0.2, and \ell = 1, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.

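Motivation and objectives

Bridge the gap between theoretical approximation guarantees and practical scalability in influence maximization.
Develop an algorithm that supports the general triggering model, which includes the IC and LT models.
Achieve high empirical efficiency on large networks while keeping a near-optimal time complexity.
Make influence maximization with a non-trivial approximation guarantee feasible on million-node graphs.
Outperform existing state-of-the-art algorithms in both running time and solution quality.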

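Proposed method

Adopts the random reverse reachable (RR) set framework to estimate influence spread efficiently.
Generates $\lambda / KPT^{+}$ random RR sets, where the parameter $\lambda$ is proportional to $1/\varepsilon^{2}$ and $KPT^{+}$ is a lower bound on the optimal influence spread.
Identifies high-influence nodes with a greedy selection strategy over the RR sets.
Incorporates heuristic optimizations that reduce constant factors without affecting asymptotic performance.
Supports the triggering model, a general diffusion model that includes IC and LT as special cases.
The theoretical analysis shows an expected time complexity of $O((k+\ell)(n+m)\log n/\varepsilon^{2})$ and a $(1-1/e-\varepsilon)$-approximation guarantee with probability at least $1-n^{-\ell}$.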
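
To make the RR-set idea concrete, here is a minimal Python sketch under the IC model. It is an illustration of the general framework, not the authors' implementation: the function names `random_rr_set` and `greedy_max_coverage`, the `in_edges` adjacency format, and the fixed sample count `theta = 200` are all assumptions made for this example (TIM itself derives the number of samples from $\lambda / KPT^{+}$, and its heuristics are omitted). The sketch relies on the standard property that, for a seed set $S$, $n$ times the fraction of random RR sets that $S$ intersects is an unbiased estimate of the expected spread of $S$.

```python
import random
from collections import defaultdict


def random_rr_set(n, in_edges, rng):
    """Sample one RR set under IC (hypothetical helper).

    in_edges maps a node v to a list of (u, p) pairs, meaning the edge
    u -> v is live with probability p. We pick a uniform random root and
    walk edges backwards, keeping each incoming edge with probability p.
    """
    root = rng.randrange(n)
    rr = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for u, p in in_edges.get(v, ()):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr


def greedy_max_coverage(rr_sets, k):
    """Greedily pick up to k nodes that cover the most RR sets."""
    node_to_sets = defaultdict(set)
    for i, rr in enumerate(rr_sets):
        for u in rr:
            node_to_sets[u].add(i)

    covered = set()
    seeds = []
    for _ in range(k):
        if not node_to_sets:
            break
        # Marginal gain = number of not-yet-covered RR sets containing u.
        best = max(node_to_sets, key=lambda u: len(node_to_sets[u] - covered))
        seeds.append(best)
        covered |= node_to_sets.pop(best)
    return seeds, len(covered)


# Toy star graph: node 0 points to nodes 1..4, every edge has probability 1.0.
n = 5
in_edges = {v: [(0, 1.0)] for v in range(1, n)}

rng = random.Random(0)
theta = 200  # assumed sample count for the demo, not lambda / KPT+
rr_sets = [random_rr_set(n, in_edges, rng) for _ in range(theta)]

seeds, num_covered = greedy_max_coverage(rr_sets, k=1)
estimated_spread = n * num_covered / theta
print(seeds, estimated_spread)
```

In this toy graph every RR set contains node 0, so node 0 covers all samples and is selected first. On real graphs the quality of the estimate depends on how many RR sets are drawn, which is the quantity the $\lambda / KPT^{+}$ rule controls.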

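Experimental results

Research questions

RQ1: Can an influence maximization algorithm combine near-optimal theoretical time complexity with practical efficiency on large networks?
RQ2: Can the triggering model be supported efficiently while keeping a strong approximation guarantee?
RQ3: How does the method scale to graphs with up to 41.6 million nodes and 1.4 billion edges?
RQ4: How large is the empirical performance gap between TIM and existing state-of-the-art algorithms?
RQ5: Can heuristic optimizations significantly improve empirical efficiency without weakening the theoretical bounds?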

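Key findings

With $k=50$, $\varepsilon=0.2$, and $\ell=1$, TIM processes a graph with 41.6 million nodes and 1.4 billion edges in under one hour.
TIM runs up to four orders of magnitude faster than state-of-the-art methods that offer approximation guarantees.
TIM+ (the optimized variant) ran more than 20 times faster than IRIE and more than 1000 times faster than SIMPATH at $k=50$ on the LiveJournal dataset.
TIM+ achieved a noticeably higher expected influence spread than IRIE on the DBLP and LiveJournal datasets, and matched or exceeded SIMPATH on all datasets under the LT model.
Memory consumption is higher under the IC model than under the LT model because $KPT^{+}$ tends to be smaller (so $\lambda / KPT^{+}$ calls for more RR sets), but it remains manageable thanks to adaptive control of the RR-set collection size.
The theoretical guarantees hold throughout: a $(1-1/e-\varepsilon)$-approximation with probability at least $1-n^{-\ell}$, and a near-optimal time complexity within a $\log n$ factor of the theoretical lower bound.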
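
As a short worked step, the $\log n$ gap in the last finding can be read off by dividing the upper bound by the $\Omega(m+n)$ lower bound quoted in the abstract. This is an informal comparison of the two bound expressions, treating $k$, $\ell$, and $\varepsilon$ as fixed constants as the abstract does:

$$
\frac{(k+\ell)(n+m)\log n/\varepsilon^{2}}{n+m}
= \frac{(k+\ell)\log n}{\varepsilon^{2}}
= O(\log n) \quad \text{for fixed } k,\ \ell,\ \varepsilon .
$$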


This review was generated by AI and checked by a human editor.