QUICK REVIEW

[Paper Review] Change Point Estimation in a Dynamic Stochastic Block Model

Monika Bhattacharjee, Moulinath Banerjee|arXiv (Cornell University)|Dec 7, 2018

Statistical Methods and Inference | References: 41 | Cited by: 28

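One-Sentence Summary

This paper proposes two methods for estimating a single change point in a dynamic stochastic block model (DSBM), where the network's community structure changes at an unknown time. The first method uses a least squares criterion and fully estimates the community structure at every time point; the second decouples change point detection from community detection. Both methods achieve consistent estimation under their respective identifiability conditions, and the paper establishes rates of convergence and asymptotic distributions for the change point estimators.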

ABSTRACT

We consider the problem of estimating the location of a single change point in a dynamic stochastic block model. We propose two methods of estimating the change point, together with the model parameters. The first employs a least squares criterion function and takes into consideration the full structure of the stochastic block model and is evaluated at each point in time. Hence, as an intermediate step, it requires estimating the community structure based on a clustering algorithm at every time point. The second method comprises of the following two steps: in the first one, a least squares function is used and evaluated at each time point, but ignores the community structures and just considers a random graph generating mechanism exhibiting a change point. Once the change point is identified, in the second step, all network data before and after it are used together with a clustering algorithm to obtain the corresponding community structures and subsequently estimate the generating stochastic block model parameters. A comparison between these two methods is illustrated. Further, for both methods under their respective identifiability and certain additional regularity conditions, we establish rates of convergence and derive the asymptotic distributions of the change point estimators. The results are illustrated on synthetic data.

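Research Motivation and Objectives

Address the problem of detecting a single change point in a sequence of networks generated by a dynamic stochastic block model (DSBM), where the community structure changes at an unknown time.
Develop computationally efficient and statistically consistent methods for estimating the change point, the community structures before and after it, and the SBM parameters.
Establish theoretical guarantees (rates of convergence and asymptotic distributions) for the change point estimators under different regimes of network sparsity and edge-probability change.
Study how node misclassification by the clustering algorithm affects the consistency of estimation, particularly in realistic network settings.
Compare the two estimation strategies in terms of the trade-off between computational cost and statistical identifiability.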

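Proposed Methods

Propose a full-structure method that evaluates a least squares criterion function at every time point, incorporates the complete stochastic block model structure, and therefore requires community detection by clustering at every time point.
Introduce a two-step method: first, detect the change point with a least squares function that ignores community structure; second, apply clustering to the data before and after the change point to estimate the community structures and model parameters.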
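Derive rates of convergence and asymptotic distributions of the change point estimators in three regimes: dense networks, sparse networks with a global change in edge probabilities, and sparse networks with a local change in edge probabilities.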
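Establish identifiability conditions for consistent estimation, including a misclassification-rate condition for the full-structure method and a weaker condition for the two-step method.
Use spectral clustering as the underlying community detection algorithm, and analyze theoretically how misclassification affects change point estimation.
Analyze the asymptotic distribution of the change point estimator in three regimes: (I) dense networks, (II) a global change in edge probabilities, and (III) a local change in which only finitely many edges have different probabilities before and after the change point.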
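
The first step of the two-step method can be illustrated with a small sketch. It is not the authors' code: it assumes independent, symmetric adjacency matrices without self-loops, and the names `simulate_dsbm` and `ls_change_point` are hypothetical helpers introduced here. The criterion fits one entrywise mean matrix before a candidate change point and one after it, with no community structure, and picks the candidate with the smallest residual sum of squares.

```python
import numpy as np

def simulate_dsbm(n_time, tau, z_before, z_after, P_before, P_after, rng):
    """Draw n_time independent symmetric adjacency matrices (hypothetical helper).

    The first tau graphs use labels z_before and block matrix P_before;
    the remaining graphs use z_after and P_after.
    """
    graphs = []
    for t in range(n_time):
        z, P = (z_before, P_before) if t < tau else (z_after, P_after)
        probs = P[np.ix_(z, z)]                      # edge probability for each node pair
        upper = np.triu(rng.random(probs.shape) < probs, k=1)  # no self-loops
        graphs.append((upper | upper.T).astype(float))
    return np.stack(graphs)

def ls_change_point(graphs, min_seg=1):
    """Least-squares change point that ignores community structure.

    Returns b, the estimated number of graphs observed before the change.
    """
    n_time = graphs.shape[0]
    cumsum = np.cumsum(graphs, axis=0)
    total = cumsum[-1]
    total_sq = np.sum(graphs ** 2)
    best_b, best_loss = None, np.inf
    for b in range(min_seg, n_time - min_seg + 1):
        left_sum = cumsum[b - 1]
        right_sum = total - left_sum
        # Residual sum of squares after fitting one entrywise mean per segment.
        loss = (total_sq
                - np.sum(left_sum ** 2) / b
                - np.sum(right_sum ** 2) / (n_time - b))
        if loss < best_loss:
            best_b, best_loss = b, loss
    return best_b

# Example: 40 nodes, 30 graphs, nodes reassigned between two communities after graph 12.
rng = np.random.default_rng(0)
P = np.array([[0.6, 0.1], [0.1, 0.6]])
z_before = np.repeat([0, 1], 20)
z_after = np.arange(40) % 2
graphs = simulate_dsbm(30, 12, z_before, z_after, P, P, rng)
tau_hat = ls_change_point(graphs)
```

Using cumulative sums keeps the scan linear in the number of time points, and no clustering is run during the search, which is the source of the computational saving attributed to the two-step approach.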

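Experimental Results

Research Questions

RQ1: Under what conditions can the change point in a DSBM be estimated consistently when the community structure changes at a single unknown time?
RQ2: How do the full-structure least squares method and the two-step method compare in terms of computational cost and statistical identifiability?
RQ3: What are the rates of convergence and asymptotic distributions of the change point estimators under different network sparsity regimes?
RQ4: How does node misclassification by the clustering algorithm affect the consistency of the change point and SBM parameter estimates?
RQ5: Can the two-step method achieve consistent estimation under weaker identifiability conditions than the full-structure method, and in which realistic network scenarios do those conditions hold?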

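Key Findings

The full-structure method places a strict requirement on the misclassification rate of the clustering algorithm. This condition may fail in realistic settings, but when it holds the method offers stronger identifiability for consistent estimation.
The two-step method estimates the change point consistently under a substantially weaker identifiability condition, one that holds in practical scenarios such as communities merging, splitting, or nodes being reassigned.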
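For both methods, under their respective regularity and identifiability conditions, the change point estimator converges at rate $ O_p(n^{-1}) $, and its asymptotic distribution is established in all three regimes.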
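In Regime II (global edge-probability change), the asymptotic variance of the change point estimator is proportional to $ \tilde{\rho}^2 $, where $ \tilde{\rho}^2 $ is defined as a normalized sum of squared differences in edge probabilities.
In Regime III (local change), the asymptotic distribution depends on a finite set of edges $ \tilde{\rho}_0 $ whose probabilities differ before and after the change, and the stated conditions ensure that the limiting distribution is stable and that convergence holds.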
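The theoretical results cover both dense and sparse networks, although the asymptotic distribution in the sparse regime still needs further study, particularly for adaptive inference.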
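
The second step of the two-step method (clustering each segment once the change point is fixed, then estimating the block probabilities) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: spectral clustering is applied to the average adjacency matrix of one segment, a simple k-means with a deterministic start stands in for whatever clustering routine is used in practice, and the function names `kmeans`, `spectral_communities`, and `block_probabilities` are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_communities(mean_adj, k):
    """Cluster nodes using the k eigenvectors with largest absolute eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(mean_adj)
    top = np.argsort(np.abs(eigvals))[-k:]
    return kmeans(eigvecs[:, top], k)

def block_probabilities(mean_adj, labels, k):
    """Average the off-diagonal entries of mean_adj within each pair of blocks."""
    P_hat = np.full((k, k), np.nan)
    off_diag = ~np.eye(len(labels), dtype=bool)
    for a in range(k):
        for b in range(k):
            mask = np.outer(labels == a, labels == b) & off_diag
            if mask.any():
                P_hat[a, b] = mean_adj[mask].mean()
    return P_hat
```

Given the graphs on one side of the estimated change point as an array `segment` of shape (time, nodes, nodes), one would call `spectral_communities(segment.mean(axis=0), k)` and pass the resulting labels to `block_probabilities`. Community labels are only identified up to permutation, so comparisons with a ground truth need to account for relabeling.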

This review was generated by AI and reviewed by a human editor.