QUICK REVIEW

[Paper Review] Identifying the number of clusters for K-Means: A hypersphere density based approach

Sukavanan Nanjundan, Shreeviknesh Sankaran|arXiv (Cornell University)|Dec 2, 2019

Data Mining and Machine Learning Applications|References: 4|Cited by: 41

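One-Sentence Summary

This paper proposes a hypersphere-density-based method that determines the number of clusters for K-Means by evaluating the density around cluster centroids at different cluster counts and selecting the elbow point.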

ABSTRACT

Application of K-Means algorithm is restricted by the fact that the number of clusters should be known beforehand. Previously suggested methods to solve this problem are either ad hoc or require parametric assumptions and complicated calculations. The proposed method aims to solve this conundrum by considering cluster hypersphere density as the factor to determine the number of clusters in the given dataset. The density is calculated by assuming a hypersphere around the cluster centroid for n-different number of clusters. The calculated values are plotted against their corresponding number of clusters and then the optimum number of clusters is obtained after assaying the elbow region of the graph. The method is simple, easy to comprehend, and provides robust and reliable results.

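Research Motivation and Goals

Address the challenge that the number of clusters in K-Means is often unknown and hard to determine in advance.
Propose a simple, interpretable density-based criterion that uses hyperspheres around cluster centroids to estimate the number of clusters.
Provide a robust method that avoids strong parametric assumptions and complicated calculations.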

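Proposed Method

For a given dataset, run K-Means for each candidate number of clusters (n from 1 up to a chosen maximum) and construct a hypersphere around each cluster centroid.
Estimate a density value for each centroid-centered hypersphere to reflect the compactness of its cluster.
Plot the computed densities against the corresponding number of clusters and identify the elbow region to select the optimal number of clusters.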
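
The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not give the exact density formula, so the sketch assumes that a cluster's density is its member count divided by the volume of the hypersphere whose radius reaches the farthest member, and that the per-cluster densities are averaged for each cluster count. The function names (`kmeans`, `hypersphere_volume`, `mean_hypersphere_density`, `density_curve`) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points as initial centroids
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster: keep old centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def hypersphere_volume(radius, dim):
    """Volume of a dim-dimensional ball: pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def mean_hypersphere_density(points, centroids, labels):
    """ASSUMPTION: density = member count / volume of the ball reaching the
    farthest member, averaged over clusters. The paper may define it differently."""
    dim = len(points[0])
    densities = []
    for j, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        radius = max((math.dist(p, centroid) for p in members), default=0.0)
        if radius > 0:  # skip empty or single-point clusters (zero volume)
            densities.append(len(members) / hypersphere_volume(radius, dim))
    return sum(densities) / len(densities) if densities else 0.0


def density_curve(points, k_max, seed=0):
    """Return [(k, mean density)] for k = 1..k_max, ready to plot."""
    curve = []
    for k in range(1, k_max + 1):
        centroids, labels = kmeans(points, k, seed=seed)
        curve.append((k, mean_hypersphere_density(points, centroids, labels)))
    return curve
```

Calling `density_curve(points, k_max)` on a list of coordinate tuples yields the (cluster count, density) pairs that would be plotted. A production version would more likely rely on an established K-Means implementation with multiple restarts; the hand-written loop here only keeps the sketch dependency-free.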

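Experimental Results

Research Questions

RQ1: Does the density of the hypersphere around each cluster centroid effectively indicate a suitable number of clusters for K-Means?
RQ2: Does the elbow in the density plot reliably correspond to the optimal clustering solution across different datasets?
RQ3: Is the method simple to implement and robust, without relying on strong parametric assumptions?

Key Findings

The proposed hypersphere density method provides an interpretable criterion for determining the number of clusters.
The elbow region of the density-versus-cluster-count plot serves as the indicator of the optimal number of clusters.
The authors describe the method as easy to understand and as providing robust and reliable results.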
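
The abstract says the elbow is read off the plot by inspection. For readers who want to automate that step, one common heuristic (an assumption here, not something the paper prescribes) picks the point that lies farthest from the straight line joining the first and last points of the curve, after rescaling both axes to [0, 1]. The helper name `find_elbow` is hypothetical.

```python
import math


def find_elbow(ks, values):
    """Return the k whose point lies farthest from the chord joining the
    curve's endpoints. Expects ks in strictly increasing order."""
    if len(ks) != len(values) or len(ks) < 3 or ks[-1] == ks[0]:
        raise ValueError("need at least three (k, value) pairs with distinct k")
    # Rescale both axes to [0, 1] so neither dominates the distance.
    k_span = ks[-1] - ks[0]
    v_min, v_max = min(values), max(values)
    v_span = (v_max - v_min) or 1.0  # flat curve: avoid division by zero
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(v - v_min) / v_span for v in values]
    # Chord from the first point to the last point.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    chord_len = math.hypot(dx, dy)
    # Perpendicular distance of every point from the chord.
    dists = [abs(dx * (y - ys[0]) - dy * (x - xs[0])) / chord_len
             for x, y in zip(xs, ys)]
    return ks[dists.index(max(dists))]
```

The heuristic works for curves that rise and then flatten as well as for curves that fall and then flatten, so it does not depend on which direction the density moves as the cluster count grows. It is only a convenience: on noisy curves, visual inspection of the elbow region, as the paper describes, may still be needed.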

This review was generated by AI and reviewed by a human editor.