QUICK REVIEW

[Paper Review] Identifying the number of clusters for K-Means: A hypersphere density based approach

Sukavanan Nanjundan, Shreeviknesh Sankaran | arXiv (Cornell University) | 2019-12-02

Data Mining and Machine Learning Applications | References: 4 | Citations: 41

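One-line summary

The paper proposes a hypersphere-density-based method for choosing the number of clusters in K-Means: it evaluates the density around the cluster centroids for a range of cluster counts and selects the elbow point of the resulting curve.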

ABSTRACT

Application of K-Means algorithm is restricted by the fact that the number of clusters should be known beforehand. Previously suggested methods to solve this problem are either ad hoc or require parametric assumptions and complicated calculations. The proposed method aims to solve this conundrum by considering cluster hypersphere density as the factor to determine the number of clusters in the given dataset. The density is calculated by assuming a hypersphere around the cluster centroid for n-different number of clusters. The calculated values are plotted against their corresponding number of clusters and then the optimum number of clusters is obtained after assaying the elbow region of the graph. The method is simple, easy to comprehend, and provides robust and reliable results.

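Motivation and objectives

The work is motivated by the fact that the number of clusters for K-Means is often unknown in advance and hard to determine.
Propose a simple, interpretable density-based criterion, built on hyperspheres around the cluster centroids, for estimating the number of clusters.
Provide a robust approach that avoids heavy parametric assumptions and complicated calculations.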

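Proposed method

For a given dataset and for each candidate number of clusters (n from 1 up to a chosen maximum), construct a hypersphere around every cluster centroid.
Estimate the density of each centroid-centered hypersphere as a measure of how compact the cluster is.
Plot the computed densities against the corresponding number of clusters and identify the elbow region to select the optimal number of clusters.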
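
The steps above can be sketched in Python. This review does not state the paper's exact density formula, so the sketch rests on assumptions: a cluster's density is taken to be its point count divided by the volume of the smallest centroid-centered hypersphere containing all of its members, and the per-cluster densities are averaged into one value for each cluster count. The function names (kmeans, hypersphere_volume, mean_hypersphere_density, density_curve) are hypothetical, and the small Lloyd's-algorithm K-Means is included only to keep the example self-contained.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    points = [tuple(p) for p in points]
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points

    def assign(cs):
        # Label every point with the index of its nearest centroid.
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the old centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def hypersphere_volume(radius, dim):
    """Volume of a dim-dimensional ball: pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def mean_hypersphere_density(points, centroids, labels):
    """ASSUMPTION: density = member count / volume of the enclosing hypersphere."""
    dim = len(centroids[0])
    densities = []
    for j, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue  # empty cluster: nothing to measure
        radius = max(math.dist(p, centroid) for p in members)
        if radius == 0:
            continue  # degenerate cluster (all members sit on the centroid)
        densities.append(len(members) / hypersphere_volume(radius, dim))
    return sum(densities) / len(densities) if densities else float("nan")


def density_curve(points, k_max, seed=0):
    """Map each cluster count k in 1..k_max to its mean hypersphere density."""
    points = [tuple(p) for p in points]
    curve = {}
    for k in range(1, k_max + 1):
        centroids, labels = kmeans(points, k, seed=seed)
        curve[k] = mean_hypersphere_density(points, centroids, labels)
    return curve
```

Calling density_curve(points, k_max) returns one value per cluster count, which can then be plotted to inspect the elbow region. Skipping empty and zero-radius clusters is a choice made for this sketch, not something the review attributes to the paper.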

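Experimental results

Research questions

RQ1: Can the hypersphere density around the centroids effectively indicate an appropriate number of clusters for K-Means?
RQ2: Does the elbow of the centroid-density graph reliably coincide with the optimal clustering solution across datasets?
RQ3: Is the method simple to implement and robust without strong parametric assumptions?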

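Key findings

The proposed hypersphere density approach provides an interpretable criterion for identifying the number of clusters.
The elbow region of the density-versus-number-of-clusters graph serves as the indicator of the optimal number of clusters.
The method is described as easy to understand and capable of providing robust and reliable results.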
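
According to the abstract, the elbow is found by examining the graph, and no automatic rule is given. As a hedged illustration only, the sketch below swaps in a common heuristic that is not taken from the paper: after rescaling both axes to [0, 1], it picks the cluster count whose point lies farthest from the straight line joining the first and last points of the curve. The function name elbow_k is hypothetical.

```python
import math


def elbow_k(ks, values):
    """Return the k farthest from the chord between the curve's end points.

    ASSUMPTION: this max-distance-to-chord rule is a generic elbow heuristic,
    not the selection procedure of the reviewed paper.
    """
    if len(ks) != len(values) or len(ks) < 3:
        raise ValueError("need at least three (k, value) pairs")

    # Rescale both axes to [0, 1] so neither dominates the distance.
    x_span = ks[-1] - ks[0]
    y_min, y_max = min(values), max(values)
    y_span = (y_max - y_min) or 1.0  # avoid division by zero on a flat curve
    xs = [(k - ks[0]) / x_span for k in ks]
    ys = [(v - y_min) / y_span for v in values]

    # Perpendicular distance of each point to the line through the end points.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    best_k, best_dist = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        dist = abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k
```

For example, with the made-up values [100, 40, 10, 8, 7, 6] for k = 1..6 (illustrative numbers, not results from the paper), elbow_k returns 3. A visual check of the plot remains advisable, since heuristics of this kind can disagree on noisy curves.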


This review was generated by AI and reviewed by a human editor.