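[Paper Summary] Understanding Regularized Spectral Clustering via Graph Conductance
This paper connects graph conductance to spectral clustering in order to explain why Vanilla-SC fails on sparse graphs, and shows that regularization, understood through CoreCut, improves partition balance and robustness while also speeding up computation.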
This paper uses the relationship between graph conductance and spectral clustering to study (i) the failures of spectral clustering and (ii) the benefits of regularization. The explanation is simple. Sparse and stochastic graphs create a lot of small trees that are connected to the core of the graph by only one edge. Graph conductance is sensitive to these noisy 'dangling sets'. Spectral clustering inherits this sensitivity. The second part of the paper starts from a previously proposed form of regularized spectral clustering and shows that it is related to the graph conductance on a 'regularized graph'. We call the conductance on the regularized graph CoreCut. Based upon previous arguments that relate graph conductance to spectral clustering (e.g. Cheeger inequality), minimizing CoreCut relaxes to regularized spectral clustering. Simple inspection of CoreCut reveals why it is less sensitive to small cuts in the graph. Together, these results show that unbalanced partitions from spectral clustering can be understood as overfitting to noise in the periphery of a sparse and stochastic graph. Regularization fixes this overfitting. In addition to this statistical benefit, these results also demonstrate how regularization can improve the computational speed of spectral clustering. We provide simulations and data examples to illustrate these results.
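The sensitivity to dangling sets can be seen in a short worked calculation. It assumes the standard definition of conductance and takes a dangling set S to be a tree on g nodes joined to the rest of the graph by a single edge, as described above; the symbols S, g, cut, vol and d_i are notation chosen here for illustration.

```latex
\phi(S) = \frac{\mathrm{cut}(S, S^c)}{\min\{\mathrm{vol}(S),\, \mathrm{vol}(S^c)\}},
\qquad
\mathrm{vol}(S) = \sum_{i \in S} d_i .

% A tree on g nodes has g-1 internal edges, each counted twice in vol(S),
% plus the single edge that leaves S.
\mathrm{cut}(S, S^c) = 1,
\qquad
\mathrm{vol}(S) = 2(g-1) + 1 = 2g - 1
\quad\Longrightarrow\quad
\phi(S) = \frac{1}{2g-1}
\quad \text{whenever } \mathrm{vol}(S) \le \mathrm{vol}(S^c).
```

For g = 3 this gives 1/5, a value that does not depend on the size or structure of the core, which is why a few such trees in the periphery can compete with, or beat, any cut through the core.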
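Motivation and Objectives
- Explain why Vanilla-SC fails on sparse, stochastic graphs because of dangling sets in the periphery.
- Introduce CoreCut, the graph conductance of a regularized graph, and relate it to Regularized-SC.
- Show how Regularized-SC mitigates overfitting and yields more balanced partitions.
- Demonstrate experimentally the computational advantage of regularization in spectral clustering.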
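Proposed Method
- Link graph conductance to spectral clustering through the Cheeger inequality, which motivates regularization.
- Define g-dangling sets and show that they produce small conductance values in sparse graphs.
- Show that real-world sparse graphs contain many g-dangling sets, which leads to many small eigenvalues and slow convergence.
- Introduce CoreCut, the conductance on the regularized graph G_tau, and relate it to Regularized-SC.
- Derive bounds showing when, for suitable choices of tau, CoreCut favors partitions of the core over cuts in the periphery.
- Provide simulations and real-data experiments comparing Vanilla-SC with Regularized-SC.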
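The contrast between conductance and CoreCut can be sketched numerically. The example below is a minimal illustration, not the paper's code: it assumes the regularized graph G_tau is obtained by adding tau/N to every entry of the adjacency matrix, so that CoreCut is simply the conductance computed on that matrix. The function names (`toy_graph`, `conductance`, `core_cut`), the toy graph itself (two 4-cliques joined by four edges, plus a 3-node path hanging off one node), and the choice of tau as the average degree are all assumptions made for illustration.

```python
import numpy as np

def toy_graph():
    """Two 4-cliques joined by four edges (the core) plus a 3-node dangling path."""
    edges = []
    for block in (range(0, 4), range(4, 8)):
        edges += [(i, j) for i in block for j in block if i < j]
    edges += [(i, i + 4) for i in range(4)]    # edges between the two cliques
    edges += [(3, 8), (8, 9), (9, 10)]         # path attached to node 3 by one edge
    A = np.zeros((11, 11))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def conductance(A, S):
    """cut(S, S^c) / min(vol(S), vol(S^c)) for a symmetric weight matrix A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()
    deg = A.sum(axis=1)
    return cut / min(deg[mask].sum(), deg[~mask].sum())

def core_cut(A, S, tau):
    """Conductance of S on the regularized graph A + tau/N (assumed form of G_tau)."""
    n = A.shape[0]
    return conductance(A + tau / n, S)   # broadcasting adds tau/n to every entry

A = toy_graph()
tau = A.sum() / A.shape[0]               # average degree, an illustrative choice
dangling, core = [8, 9, 10], [4, 5, 6, 7]
for name, S in (("dangling set", dangling), ("core split", core)):
    print(name, "conductance:", round(conductance(A, S), 3),
          "CoreCut:", round(core_cut(A, S, tau), 3))
```

On this toy graph the dangling set has the smaller conductance (0.2 against 0.25 for the core split), while with tau set to the average degree the ordering flips (roughly 0.56 against 0.43), so the regularized objective prefers the cut through the core.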
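Experimental Results
Research Questions
- RQ1: How does graph conductance relate to the failure of Vanilla-SC on sparse, stochastic graphs?
- RQ2: How does regularization change the conductance landscape so that it favors partitions of the core of the graph?
- RQ3: What is CoreCut, and how does it relate Regularized-SC to conductance on the regularized graph?
- RQ4: Do Regularized-SC solutions improve partition balance and reduce overfitting in practice?
- RQ5: What are the computational implications of using regularized spectral clustering?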
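Key Findings
- Sparse, stochastic graphs contain many g-dangling sets, which produce small conductance values and many small eigenvalues that signal noise rather than structure.
- CoreCut's regularization biases conductance away from small peripheral cuts and toward the structure of the core, consistent with Regularized-SC.
- Regularized-SC yields more balanced partitions than Vanilla-SC, which tends to produce unbalanced, noise-driven cuts.
- In the reported experiments, Regularized-SC computes the second eigenvalue faster than Vanilla-SC.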
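One way to try the Vanilla-SC versus Regularized-SC comparison on a small example is sketched below. It is an illustration under stated assumptions rather than the paper's experimental setup: regularization is taken to add tau/N to every entry of the adjacency matrix, tau is set to the average degree, and the two clusters are read off the sign of the second eigenvector of the normalized Laplacian instead of running k-means. The names `toy_graph` and `spectral_bipartition` are hypothetical.

```python
import numpy as np

def toy_graph():
    """Two 4-cliques joined by four edges (the core) plus a 3-node dangling path."""
    edges = []
    for block in (range(0, 4), range(4, 8)):
        edges += [(i, j) for i in block for j in block if i < j]
    edges += [(i, i + 4) for i in range(4)]    # edges between the two cliques
    edges += [(3, 8), (8, 9), (9, 10)]         # path attached to node 3 by one edge
    A = np.zeros((11, 11))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def spectral_bipartition(A, tau=0.0):
    """Two-way spectral split; tau = 0 is Vanilla-SC, tau > 0 the assumed regularized form.

    Returns the 0/1 labels and the second-smallest eigenvalue of the
    normalized Laplacian. Assumes A is symmetric with no isolated nodes.
    """
    n = A.shape[0]
    A_tau = A + tau / n                        # adds tau/n to every entry
    d_inv_sqrt = 1.0 / np.sqrt(A_tau.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * A_tau * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]       # rescale by D^{-1/2}; signs are unchanged
    labels = (fiedler > 0).astype(int)         # sign split in place of k-means
    return labels, eigvals[1]

A = toy_graph()
tau = A.sum() / A.shape[0]                     # average degree, an illustrative choice
for name, t in (("Vanilla-SC", 0.0), ("Regularized-SC", tau)):
    labels, lam2 = spectral_bipartition(A, t)
    print(name, "cluster sizes:", np.bincount(labels, minlength=2),
          "lambda_2:", round(float(lam2), 4))
```

Printing the cluster sizes and the second eigenvalue side by side makes it easy to check, on a graph of one's own, whether the regularized split is more balanced. By the easy direction of the Cheeger inequality, the second eigenvalue is at most twice the smallest conductance, so a single dangling set is enough to pull it toward zero in the unregularized case.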
This summary was generated by AI and reviewed by a human editor.