QUICK REVIEW

[Paper Review] Inferring Networks of Diffusion and Influence

Manuel Gomez-Rodriguez, Jure Leskovec|arXiv (Cornell University)|Jun 1, 2010

Complex Network Analysis Techniques|Cited by 236

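One-Sentence Summary

The paper proposes NetInf, a scalable algorithm that uses submodular optimization to infer influence and diffusion networks from observed infection or adoption times, yielding provably near-optimal network structures. Experiments show that real-world news diffusion has a core-periphery structure in which a small number of influential media sites connect topic-specific clusters.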

ABSTRACT

Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or adopt the information, observing individual transmissions (i.e., who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.

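Motivation and Objectives

Infer the latent network over which information, influence, or viruses spread when only infection times are observed.
Address the challenge of reconstructing unobserved propagation networks in large-scale systems such as online media.
Develop a scalable, provably near-optimal network inference algorithm that works under partial observation (infection times are known, but individual transmissions are not).
Reveal structural properties of real-world diffusion networks, such as core-periphery organization and clusters of influence.
Enable large-scale analysis of information flow in social and media networks using only the times at which nodes adopt information.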

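Proposed Method

Formulate network inference as maximum likelihood estimation under a generative model of cascades.
Model diffusion as a stochastic process over an unknown directed network, in which each node's infection time depends on its neighbors.
Recast the problem as selecting the k directed edges that maximize the likelihood, a problem shown to be NP-hard.
Exploit the submodularity of the likelihood function to design a greedy approximation algorithm with a performance guarantee.
Use localized updates and lazy evaluation to scale efficiently to large datasets with millions of nodes and cascades.
Compare against a heuristic baseline to show that NetInf is more accurate and more scalable.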
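
The greedy selection with lazy evaluation described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes an exponential transmission-delay model with a hypothetical rate `alpha`, a hypothetical fallback log-weight `log_eps` for nodes that no selected edge explains, and it scores each cascade by letting every infected node pick its single best earlier-infected parent. The names `edge_log_weight` and `infer_edges` are invented for this example.

```python
import heapq
import math
from collections import defaultdict


def edge_log_weight(t_u, t_v, alpha=1.0):
    """Log-likelihood that u infected v, under an assumed exponential delay model."""
    return math.log(alpha) - alpha * (t_v - t_u)


def infer_edges(cascades, k, alpha=1.0, log_eps=-20.0):
    """Greedily select up to k directed edges, re-evaluating gains lazily.

    cascades: list of dicts mapping node -> infection time (nodes must be orderable).
    log_eps:  assumed log-weight for a node with no selected parent in a cascade.
    """
    # Candidate edge (u, v) is supported by every cascade where u precedes v.
    support = defaultdict(list)  # (u, v) -> [(cascade index, log-weight), ...]
    for c, times in enumerate(cascades):
        for u, t_u in times.items():
            for v, t_v in times.items():
                if t_u < t_v:
                    support[(u, v)].append((c, edge_log_weight(t_u, t_v, alpha)))

    # best[(c, v)]: log-weight of the best parent selected so far for v in cascade c.
    best = defaultdict(lambda: log_eps)

    def gain(edge):
        """Marginal increase in total log-likelihood from adding this edge."""
        v = edge[1]
        return sum(max(0.0, w - best[(c, v)]) for c, w in support[edge])

    # Max-heap via negated gains; the stamp records the round the gain was computed in.
    heap = [(-gain(e), e, 0) for e in support]
    heapq.heapify(heap)

    chosen = []
    while heap and len(chosen) < k:
        neg_gain, edge, stamp = heapq.heappop(heap)
        if stamp != len(chosen):
            # Stale gain: by submodularity it is only an upper bound, so recompute.
            heapq.heappush(heap, (-gain(edge), edge, len(chosen)))
            continue
        if -neg_gain <= 0.0:
            break  # no remaining edge improves the objective
        chosen.append(edge)
        v = edge[1]
        for c, w in support[edge]:
            best[(c, v)] = max(best[(c, v)], w)
    return chosen
```

Because gains can only shrink as edges are added, a freshly recomputed gain that still sits at the top of the heap is the true maximum, so most candidates are never re-scored. For monotone submodular objectives under a cardinality constraint, plain greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the kind of guarantee the summary refers to.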

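Experimental Results

Research Questions

RQ1: What is the structure of the latent network over which information spreads in online media, given only infection times?
RQ2: How can the most likely influence network be inferred from partial observations of adoption times?
RQ3: Which global structural properties (e.g., core-periphery structure, clustering) emerge in real-world diffusion networks?
RQ4: How does NetInf compare with a heuristic baseline at reconstructing the true network from limited data?
RQ5: Can the inferred network reveal the roles of individual media sites, such as core influencers or connectors between communities?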

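Key Findings

On synthetic data, NetInf accurately recovers the true underlying network from only a small number of cascades.
On a real dataset of 170 million blogs and news articles, the diffusion network inferred by NetInf has a clear core-periphery structure.
Among the top 1,000 media sites and blogs, a small set of core media sites diffuses information to the rest of the Web.
The core sites tend to have stable circles of influence, while more general news media sites act as connectors between them.
The inferred network reveals distinct topical clusters (e.g., politics, technology, gossip) connected by a few influential central sites.
On both synthetic and real data, NetInf substantially outperforms the maximum-weight heuristic baseline in accuracy and scalability.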
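
The summary does not say how accuracy is measured. One common way to score a recovered network against a known ground truth, assumed here purely for illustration, is precision and recall over the sets of directed edges. The function name `edge_precision_recall` is hypothetical.

```python
def edge_precision_recall(inferred, truth):
    """Precision and recall of inferred directed edges against ground-truth edges."""
    inferred, truth = set(inferred), set(truth)
    hits = len(inferred & truth)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

Direction matters in this comparison: an inferred edge (b, a) does not count as a hit for a true edge (a, b).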

This review was generated by AI and reviewed by human editors.