QUICK REVIEW

[Paper Digest] Modified Lomax Model: A heavy-tailed distribution for fitting large-scale real-world complex networks

Swarup Chattopadhyay, Tanujit Chakraborty|arXiv (Cornell University)|Nov 28, 2019

Complex Network Analysis Techniques|References: 61|Cited by: 7

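One-Sentence Summary

This paper proposes the Modified Lomax (MLM) distribution, derived from a hierarchical family of Lomax distributions, to model the entire degree distribution of real-world complex networks without removing lower-degree nodes. By introducing a nonlinear shape parameter, the MLM distribution captures the heavy-tailed, nonlinear behavior seen in log-log plots more effectively than the classical power law or other heavy-tailed distributions, yielding lower fitting error across 50 real-world networks.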

ABSTRACT

Real-world networks are generally claimed to be scale-free, meaning that the degree distributions follow the classical power-law, at least asymptotically. Yet, closer observation shows that the classical power-law distribution is often inadequate to meet the data characteristics due to the existence of a clearly identifiable non-linearity in the entire degree distribution in the log-log scale. The present paper proposes a new variant of the popular heavy-tailed Lomax distribution which we named as the Modified Lomax (MLM) distribution that can efficiently capture the crucial aspect of heavy-tailed behavior of the entire degree distribution of real-world complex networks. The proposed MLM model, derived from a hierarchical family of Lomax distributions, can efficiently fit the entire degree distribution of real-world networks without removing lower degree nodes as opposed to the classical power-law based fitting. The MLM distribution belongs to the maximum domain of attraction of the Frechet distribution and is right tail equivalent to Pareto distribution. Various statistical properties including characteristics of the maximum likelihood estimates and asymptotic distributions have also been derived for the proposed MLM model. Finally, the effectiveness of the proposed MLM model is demonstrated through rigorous experiments over fifty real-world complex networks from diverse applied domains.
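
The two tail properties named in the abstract can be illustrated on the ordinary Lomax distribution. The block below is a sketch for the standard Lomax only, not for the MLM distribution, whose exact form is not given in this digest; the symbols \alpha (shape) and \sigma (scale) are notation assumed here for illustration.

```latex
% Standard Lomax (Pareto type II) survival function, shape \alpha > 0, scale \sigma > 0
\[
  \bar F(x) = \Pr(X > x) = \left(1 + \frac{x}{\sigma}\right)^{-\alpha}, \qquad x \ge 0.
\]
% Pareto (type I) survival function with the same shape and scale
\[
  \bar G(x) = \left(\frac{x}{\sigma}\right)^{-\alpha}, \qquad x \ge \sigma.
\]
% Right-tail equivalence: the ratio of survival functions has a finite positive limit
\[
  \lim_{x \to \infty} \frac{\bar F(x)}{\bar G(x)}
  = \lim_{x \to \infty} \left(1 + \frac{\sigma}{x}\right)^{-\alpha} = 1.
\]
% Regular variation of the tail with index -\alpha, for every fixed x > 0
\[
  \lim_{t \to \infty} \frac{\bar F(tx)}{\bar F(t)}
  = \lim_{t \to \infty} \left(\frac{\sigma + t x}{\sigma + t}\right)^{-\alpha} = x^{-\alpha}.
\]
```

A survival function that is regularly varying with a negative index is the classical condition for membership in the maximum domain of attraction of the Fréchet distribution, which is the sense in which the abstract states the corresponding result for the MLM model.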

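Research Motivation and Objectives

Address the inadequacy of the classical power-law distribution in fitting the entire degree distribution of real-world complex networks, which stems from nonlinearity in the log-log plot.
Develop a flexible heavy-tailed distribution that captures the full range of node degrees without excluding lower-degree nodes.
Propose a modified Lomax distribution with a nonlinear shape parameter to improve the accuracy of degree-distribution modeling for complex networks.
Demonstrate, across diverse real-world networks, that the MLM model fits better than power-law, Lomax, log-normal, and other heavy-tailed distributions.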

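Proposed Method

The MLM distribution is derived from a hierarchical family of Lomax distributions, with its shape parameter expressed as a nonlinear function of the data.
The model is shown theoretically to belong to the maximum domain of attraction of the Fréchet distribution and to be right-tail equivalent to the Pareto distribution.
Parameters are estimated by maximum likelihood estimation (MLE); existence of the estimates is guaranteed when the coefficient of variation (CV) exceeds 1.
Statistical properties, including asymptotic distributions and regular variation of the tail, are derived analytically.
Model fit is evaluated with three metrics, root mean square error (RMSE), KL divergence, and mean absolute error (MAE), and a bootstrap-based chi-square test is used to assess statistical significance.
The model is applied to 50 real-world networks from different domains, including social, biological, citation, and web networks.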
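
The fit-then-score workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits the ordinary Lomax distribution (a baseline, not the MLM model, whose density is not given in this digest), uses a crude grid search for the MLE, discretises the continuous fit as P(k) = S(k) - S(k + 1), and computes RMSE, MAE, and KL divergence over the observed degree values only. All function names and the sample degree sequence are hypothetical.

```python
import math
from collections import Counter


def lomax_survival(x, alpha, sigma):
    """Survival function S(x) = (1 + x / sigma) ** (-alpha) of the standard Lomax."""
    return (1.0 + x / sigma) ** (-alpha)


def profile_alpha(xs, sigma):
    """Closed-form MLE of the shape alpha when the scale sigma is held fixed."""
    return len(xs) / sum(math.log1p(x / sigma) for x in xs)


def profile_loglik(xs, sigma):
    """Lomax log-likelihood evaluated at (profile_alpha(xs, sigma), sigma)."""
    n = len(xs)
    log_sum = sum(math.log1p(x / sigma) for x in xs)
    alpha = n / log_sum
    return n * math.log(alpha) - n * math.log(sigma) - (alpha + 1.0) * log_sum


def fit_lomax(xs, log_lo=-5.0, log_hi=10.0, steps=2000):
    """Crude MLE: grid search over log(sigma) with alpha profiled out."""
    grid = [math.exp(log_lo + i * (log_hi - log_lo) / steps) for i in range(steps + 1)]
    sigma = max(grid, key=lambda s: profile_loglik(xs, s))
    return profile_alpha(xs, sigma), sigma


def fit_errors(degrees, alpha, sigma):
    """RMSE, MAE and KL divergence between empirical and fitted degree probabilities."""
    n = len(degrees)
    counts = Counter(degrees)
    sq = ab = kl = 0.0
    for k, c in counts.items():
        p_emp = c / n
        # Assumption: the continuous fit is discretised as P(k) = S(k) - S(k + 1).
        p_fit = lomax_survival(k, alpha, sigma) - lomax_survival(k + 1, alpha, sigma)
        sq += (p_emp - p_fit) ** 2
        ab += abs(p_emp - p_fit)
        kl += p_emp * math.log(p_emp / p_fit)
    m = len(counts)  # errors are averaged over the distinct observed degrees
    return {"rmse": math.sqrt(sq / m), "mae": ab / m, "kl": kl}


# Hypothetical degree sequence, invented for illustration only.
degrees = [1] * 8 + [2] * 5 + [3] * 3 + [4, 4, 5, 6, 8, 12, 20, 45]
alpha_hat, sigma_hat = fit_lomax(degrees)
errors = fit_errors(degrees, alpha_hat, sigma_hat)
print(alpha_hat, sigma_hat, errors)
```

Comparing candidate distributions would amount to repeating `fit_errors` with each fitted model's probabilities and ranking the resulting scores; the exact metric definitions used in the paper may differ from the ones assumed here.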

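Experimental Results

Research Questions

RQ1: Does the Modified Lomax distribution, with its nonlinear shape parameter, capture the entire degree distribution of real-world complex networks more effectively than classical power-law fitting?
RQ2: Does the proposed MLM model outperform other heavy-tailed distributions (such as Lomax, log-normal, and power law with cutoff) in fitting accuracy across multiple network types?
RQ3: What are the theoretical properties of the MLM distribution, in particular its tail behavior, the existence of the MLE, and its maximum domain of attraction?
RQ4: How well does the MLM model capture the nonlinear curvature observed in log-log degree-distribution plots of real networks?
RQ5: Can the MLM model provide a statistically significant alternative with lower fitting error, without removing lower-degree nodes?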

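Key Findings

Across 50 real-world networks, the Modified Lomax (MLM) distribution significantly reduces fitting error, outperforming power-law, Lomax, log-normal, and power-law-with-cutoff distributions on RMSE, KL divergence, and MAE.
The MLM model achieves its better fit by capturing the nonlinear curvature in the log-log degree distribution without removing lower-degree nodes; having to remove them is a key limitation of classical power-law fitting.
Theoretical analysis confirms that the MLM distribution is heavy-tailed, right-tail equivalent to the Pareto distribution, and in the maximum domain of attraction of the Fréchet distribution.
The maximum likelihood estimates (MLE) of the MLM parameters exist when the coefficient of variation (CV) of the data is greater than 1, which supports the model's practical applicability.
A bootstrap-based chi-square test confirms the statistical significance of the estimated MLM distribution, supporting its reliability across all tested networks.
Through parametric simulation, the MLM model can characterize network evolution dynamics more accurately, offering a flexible alternative to step-by-step empirical analysis.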
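
The CV > 1 condition for MLE existence is easy to check before attempting a fit. The sketch below is illustrative only: it assumes the CV is the population standard deviation divided by the mean (the digest does not say which variant the paper uses), and the function names and both degree sequences are hypothetical.

```python
import statistics


def coefficient_of_variation(degrees):
    """CV = standard deviation / mean (population standard deviation assumed)."""
    mean = statistics.fmean(degrees)
    if mean <= 0:
        raise ValueError("CV is undefined for a non-positive mean")
    return statistics.pstdev(degrees) / mean


def mle_condition_holds(degrees):
    """True when CV > 1, the condition the digest cites for MLE existence."""
    return coefficient_of_variation(degrees) > 1.0


# Hypothetical degree sequences, invented for illustration only.
heavy_tailed = [1] * 8 + [2] * 5 + [3] * 3 + [4, 4, 5, 6, 8, 12, 20, 45]
narrow = [3, 4, 4, 5, 5, 5, 6, 6, 7]

for name, seq in [("heavy_tailed", heavy_tailed), ("narrow", narrow)]:
    print(name, round(coefficient_of_variation(seq), 3), mle_condition_holds(seq))
```

A sequence dominated by a few high-degree hubs tends to satisfy the condition, while a tightly concentrated one does not, which is consistent with the model targeting heavy-tailed degree data.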


This digest was generated by AI and reviewed by a human editor.