QUICK REVIEW

[Paper Review] The Computer Science and Physics of Community Detection: Landscapes, Phase Transitions, and Hardness

Cristopher Moore | arXiv (Cornell University) | Feb 1, 2017

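Complex Network Analysis Techniques | References: 15 | Cited by: 59

One-Sentence Summary

A survey and analysis of phase transitions in the stochastic block model and community detection that identifies the information-theoretic and computational thresholds and introduces belief propagation and related spectral methods.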

ABSTRACT

Community detection in graphs is the problem of finding groups of vertices which are more densely connected than they are to the rest of the graph. This problem has a long history, but it is undergoing a resurgence of interest due to the need to analyze social and biological networks. While there are many ways to formalize it, one of the most popular is as an inference problem, where there is a "ground truth" community structure built into the graph somehow. The task is then to recover the ground truth knowing only the graph. Recently it was discovered, first heuristically in physics and then rigorously in probability and computer science, that this problem has a phase transition at which it suddenly becomes impossible. Namely, if the graph is too sparse, or the probabilistic process that generates it is too noisy, then no algorithm can find a partition that is correlated with the planted one---or even tell if there are communities, i.e., distinguish the graph from a purely random one with high probability. Above this information-theoretic threshold, there is a second threshold beyond which polynomial-time algorithms are known to succeed; in between, there is a regime in which community detection is possible, but conjectured to require exponential time. For computer scientists, this field offers a wealth of new ideas and open questions, with connections to probability and combinatorics, message-passing algorithms, and random matrix theory. Perhaps more importantly, it provides a window into the cultures of statistical physics and statistical inference, and how those cultures think about distributions of instances, landscapes of solutions, and hardness.
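The planted "ground truth" model the abstract refers to can be made concrete with a short sampler. This is a minimal sketch, assuming the symmetric sparse stochastic block model: each vertex joins one of q groups uniformly at random, and edges appear independently with probability p_in = c_in/n inside a group and p_out = c_out/n between groups, so the average degree stays constant as n grows. The function name `sample_sbm` is hypothetical, and the O(n²) loop over pairs favors clarity over speed.

```python
import numpy as np


def sample_sbm(n, q, c_in, c_out, seed=None):
    """Sample a sparse symmetric SBM with p_in = c_in/n and p_out = c_out/n.

    Returns (labels, edges): labels[i] in {0, ..., q-1} is the planted group
    of vertex i, and edges is a list of pairs (u, v) with u < v.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, q, size=n)  # uniform prior over the q groups
    p_in, p_out = c_in / n, c_out / n
    edges = []
    for u in range(n):
        # Flip one biased coin for every pair (u, v) with v > u.
        same = labels[u + 1:] == labels[u]
        probs = np.where(same, p_in, p_out)
        hits = np.nonzero(rng.random(n - u - 1) < probs)[0]
        edges.extend((u, u + 1 + int(v)) for v in hits)
    return labels, edges
```

For example, `sample_sbm(2000, 2, 5.0, 1.0, seed=0)` should produce an average degree close to (c_in + (q - 1) c_out)/q = 3; setting c_in = c_out gives an Erdős-Rényi graph with no community signal at all.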

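Motivation and Goals

Set up a probabilistic model with a planted community structure in order to study when that structure can be recovered.
Explore the phase transitions for detection, weak reconstruction, and exact reconstruction in sparse graphs.
Connect the inference problem to statistical physics through the posterior distribution and a Hamiltonian.
Discuss algorithmic approaches such as belief propagation, together with their theoretical limits.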
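Whether a given parameter setting lies above the efficient-detection threshold can be checked directly from the model parameters. A minimal sketch, assuming the symmetric SBM with q groups and average degree c = (c_in + (q - 1) c_out)/q, in which the Kesten-Stigum condition reads |c_in - c_out| > q√c. For q = 2 this coincides with the information-theoretic threshold; for larger q the information-theoretic threshold can lie below it, which opens the "possible but hard" regime. Both function names are hypothetical.

```python
import math


def average_degree(q, c_in, c_out):
    """Average degree c = (c_in + (q - 1) * c_out) / q of the symmetric SBM."""
    return (c_in + (q - 1) * c_out) / q


def above_kesten_stigum(q, c_in, c_out):
    """True if |c_in - c_out| > q * sqrt(c), the Kesten-Stigum condition.

    The absolute value makes the test cover disassortative models
    (c_out > c_in) as well as assortative ones.
    """
    c = average_degree(q, c_in, c_out)
    return abs(c_in - c_out) > q * math.sqrt(c)
```

For instance, `above_kesten_stigum(2, 5.0, 1.0)` holds because 4 > 2√3 ≈ 3.46, while `above_kesten_stigum(2, 4.0, 2.0)` does not, even though the second graph still has denser connections inside groups than between them.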

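Proposed Approach

Formalize the stochastic block model with q groups (clusters) and within-group/between-group edge probabilities p_in and p_out.
Rewrite the posterior P(σ|G) as a Boltzmann distribution and relate it to an Ising/Potts energy H(σ).
Define detection, weak reconstruction, and exact reconstruction, focusing on graphs of constant average degree, and determine the corresponding thresholds.
Use the cavity method and belief propagation to compute marginals and locate the phase transitions.
Derive the linear stability threshold of BP (the Kesten-Stigum threshold) and relate it to spectral methods based on the non-backtracking matrix.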
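The non-backtracking spectral method in the last item can be sketched with dense linear algebra. Assumptions: a simple undirected graph given as an edge list, the convention B[(u→v),(v→w)] = 1 whenever w ≠ u, and a two-group split read off from the eigenvector of the second-largest real eigenvalue, summed over the directed edges pointing into each vertex. The function names are hypothetical, and the dense `numpy.linalg.eig` call only suits small graphs; larger ones would need a sparse eigensolver.

```python
import numpy as np


def non_backtracking_matrix(edges):
    """Return (B, directed) for a simple undirected graph given as an edge list.

    directed[k] is the k-th directed edge (u, v); B[k, l] = 1 when
    directed[k] = (u, v), directed[l] = (v, w) and w != u.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: k for k, e in enumerate(directed)}
    neighbours = {}
    for a, b in directed:
        neighbours.setdefault(a, []).append(b)
    B = np.zeros((len(directed), len(directed)))
    for (u, v), k in index.items():
        for w in neighbours.get(v, []):
            if w != u:  # continue the walk without stepping straight back
                B[k, index[(v, w)]] = 1.0
    return B, directed


def nb_two_group_partition(n, edges):
    """Split vertices into two groups using the second real eigenvector of B."""
    B, directed = non_backtracking_matrix(edges)
    vals, vecs = np.linalg.eig(B)
    real = np.nonzero(np.abs(vals.imag) < 1e-9)[0]  # keep real eigenvalues
    ranked = real[np.argsort(-vals.real[real])]     # largest first
    x = vecs[:, ranked[1]].real                     # second eigenvector
    score = np.zeros(n)
    for (_, v), value in zip(directed, x):
        score[v] += value  # aggregate entries on edges pointing into v
    return (score > 0).astype(int)
```

As a sanity check, every row of B for a d-regular graph contains d - 1 ones, so its leading eigenvalue is d - 1; on two 5-cliques joined by a single edge, the sign of the aggregated second eigenvector separates the cliques.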

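Results

Research Questions

RQ1: Can a planted community structure in a sparse graph be detected, i.e., can the graph be distinguished from an Erdős-Rényi graph?
RQ2: What are the exact thresholds for detection, weak reconstruction, and exact reconstruction in the stochastic block model?
RQ3: Above the information-theoretic threshold, is belief propagation optimal for achieving detectable recovery?
RQ4: In sparse graphs, how do phase transitions arise from the posterior distribution and its marginals?
RQ5: What is the relationship between BP fixed points, their stability, and spectral algorithms for community detection?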

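Key Findings

Information-theoretic and computational thresholds divide detection and reconstruction into three regimes: impossible, possible but conjectured to be hard, and efficiently solvable.
Above the Kesten-Stigum threshold weak reconstruction is achievable, and BP-based methods have been proven to work in several settings.
The posterior distribution can be mapped to a Boltzmann distribution, linking community detection to the Ising/Potts models and their phase transitions.
In the sparse regime, belief propagation computes the marginals whose most likely labels maximize the expected fraction of correctly labeled vertices, and the stability of its uninformative fixed point predicts detectability.
Spectral methods based on the non-backtracking matrix match the detectability threshold and provide efficient algorithms above it.
In sparse, locally tree-like graphs BP is asymptotically correct; short loops in real-world networks can reduce its effectiveness, though usually not severely.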
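Belief propagation as described in these findings fits in a few dozen lines. This is a sketch assuming the symmetric SBM with known q, c_in, and c_out, a uniform group prior, and the standard sparse-case update ψ^{i→j}_s ∝ e^{-h_s} ∏_{k∈∂i\j} Σ_r c_{rs} ψ^{k→i}_r, where non-edges enter only through the mean-field term h_s = (1/n) Σ_i Σ_r c_{rs} ψ^i_r. It uses synchronous updates with damping, a simplification chosen here rather than taken from the paper, and `bp_sbm` is a hypothetical name.

```python
import numpy as np


def bp_sbm(n, edges, q, c_in, c_out, iters=200, damping=0.5, tol=1e-6, seed=0):
    """Belief propagation sketch for the sparse symmetric SBM.

    Returns (marg, labels): marg[i, s] approximates P(sigma_i = s | G) and
    labels[i] = argmax_s marg[i, s]. Assumes a simple graph, a uniform prior
    over groups, and known parameters q, c_in, c_out.
    """
    rng = np.random.default_rng(seed)
    C = np.full((q, q), float(c_out))
    np.fill_diagonal(C, float(c_in))
    m = len(edges)
    # Directed edge k is tails[k] -> heads[k]; its reverse has index (k + m) mod 2m.
    tails = np.array([u for u, _ in edges] + [v for _, v in edges], dtype=int)
    heads = np.array([v for _, v in edges] + [u for u, _ in edges], dtype=int)
    reverse = (np.arange(2 * m) + m) % max(2 * m, 1)

    def normalize(logits):
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    msg = rng.random((2 * m, q)) + 0.5  # random start breaks label symmetry
    msg /= msg.sum(axis=1, keepdims=True)
    marg = np.full((n, q), 1.0 / q)
    for _ in range(iters):
        h = marg.sum(axis=0) @ C / n  # mean-field term for non-edges
        # lf[k, s] = log sum_r C[r, s] * msg[k, r]: what tails[k] tells heads[k]
        lf = np.log(np.maximum(msg @ C, 1e-300))
        total = np.zeros((n, q))
        np.add.at(total, heads, lf)  # product over all neighbours, in logs
        marg = normalize(total - h)
        # Message u -> v combines every neighbour of u except v itself.
        new = normalize(total[tails] - lf[reverse] - h)
        delta = np.abs(new - msg).max() if m else 0.0
        msg = damping * msg + (1.0 - damping) * new
        if delta < tol:
            break
    return marg, marg.argmax(axis=1)
```

When c_in = c_out every message becomes uniform after one step, which is the trivial fixed point; well above the Kesten-Stigum threshold a random start typically moves away from it and the argmax labels correlate with the planted groups, up to a relabeling of the groups.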

This review was generated by AI and checked by human editors.