QUICK REVIEW

[Paper Review] Clustering of graph vertex subset via Krylov subspace model reduction

Vladimir Druskin, A. Mamonov | arXiv (Cornell University) | 2018-09-09

Model Reduction and Neural Networks | Citations: 1

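One-Line Summary

This paper proposes two Krylov subspace based model order reduction algorithms for efficient spectral clustering of a target subset of graph vertices. By constructing a reduced order model that approximates the diffusion transfer function of the graph Laplacian restricted to the target subset, the methods reduce the required Krylov subspace dimension and make it practical to use more sophisticated k-means approximations, yielding stable, scalable clustering with consistent results at substantially lower computational cost.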

ABSTRACT

Clustering via graph-Laplacian spectral imbedding is ubiquitous in data science and machine learning. However, it becomes less efficient for large data sets due to two factors. First, computing the partial eigendecomposition of the graph-Laplacian typically requires a large Krylov subspace. Second, after the spectral imbedding is complete, the clustering is typically performed with various relaxations of k-means, which may become prone to getting stuck in local minima and scale poorly in terms of computational cost for large data sets. Here we propose two novel algorithms for spectral clustering of a subset of the graph vertices (target subset) based on the theory of model order reduction. They rely on realizations of a reduced order model (ROM) that accurately approximates the diffusion transfer function of the original graph for inputs and outputs restricted to the target subset. While our focus is limited to this subset, our algorithms produce its clustering that is consistent with the overall structure of the graph. Moreover, working with a small target subset reduces greatly the required dimension of Krylov subspace and allows to exploit the approximations of k-means in the regimes when they are most robust and efficient, as verified by the numerical experiments. There are several uses for our algorithms. First, they can be employed on their own to clusterize a representative subset in cases when the full graph clustering is either infeasible or not required. Second, they may be used for quality control. Third, as they drastically reduce the problem size, they enable the application of more powerful approximations of k-means like those based on semi-definite programming (SDP) instead of the conventional Lloyd's algorithm. Finally, they can be used as building blocks of a divide-and-conquer algorithm for the full graph clustering. The latter will be reported in a separate article.
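For reference, the "diffusion transfer function" mentioned in the abstract can be written in a standard model-reduction form. The notation below (L, B, V, λ) is an assumption made for illustration and may differ from the paper's own symbols and normalization.

```latex
% Assumed notation: L is the n-by-n graph Laplacian, B is the n-by-m matrix whose
% columns are the indicator vectors of the m target vertices, and \lambda > 0.
F(\lambda) = B^{T} (L + \lambda I)^{-1} B \in \mathbb{R}^{m \times m}

% Galerkin ROM on a Krylov basis V with orthonormal columns (V^{T} V = I):
\tilde{L} = V^{T} L V, \qquad \tilde{B} = V^{T} B, \qquad
\tilde{F}(\lambda) = \tilde{B}^{T} (\tilde{L} + \lambda I)^{-1} \tilde{B} \approx F(\lambda)
```

In this form the ROM only has to reproduce the m-by-m response seen from the target subset, which is why a basis V with far fewer than n columns can suffice.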

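Motivation and Objectives

To address the computational inefficiency of spectral clustering on large graphs, which stems from the large Krylov subspace dimension it requires.
To reduce the computational burden of clustering by focusing on a target subset of vertices rather than the full graph.
To enable more accurate and robust k-means approximations, such as those based on semidefinite programming (SDP), that are computationally infeasible on large graphs, by drastically reducing the problem size.
To develop a framework that supports quality control and can serve as a building block of a divide-and-conquer algorithm for full graph clustering.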

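Proposed Method

Build a reduced order model (ROM) of the graph Laplacian that accurately approximates the diffusion transfer function restricted to the target vertex subset.
Derive the reduced representation of the graph Laplacian with Krylov subspace techniques, producing a low-dimensional projection that preserves the spectral properties relevant to the target subset.
Perform spectral embedding with the ROM to map the target vertices into a low-dimensional space.
Apply advanced k-means approximations (for example, SDP-based ones) that are computationally infeasible on the full graph, made possible by the reduced dimensionality.
Ensure consistency with the overall graph structure by constructing the ROM so that it preserves the diffusion dynamics between the target subset and the rest of the graph.
Rely on the accuracy of the ROM to preserve clustering quality despite the reduced problem size.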
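The steps above can be illustrated with a minimal sketch. It is not the paper's two algorithms: it uses a generic Galerkin projection of the Laplacian onto a block Krylov subspace started from the target-vertex indicators, and a sign split of the second Ritz vector in place of a k-means or SDP step. The toy graph (two cliques joined by one edge) and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def two_cliques_laplacian(size):
    """Laplacian L = D - A of two cliques of `size` vertices joined by one edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0  # bridge edge between the cliques
    return np.diag(A.sum(axis=1)) - A

def block_krylov_basis(L, B, steps, tol=1e-10):
    """Orthonormal basis of span{B, L B, ..., L^(steps-1) B}, with deflation."""
    Q = np.zeros((L.shape[0], 0))
    V = B.copy()
    for _ in range(steps):
        scale = np.linalg.norm(V)
        if scale == 0.0:
            break
        for _ in range(2):  # two Gram-Schmidt passes for numerical stability
            V = V - Q @ (Q.T @ V)
        U, s, _ = np.linalg.svd(V, full_matrices=False)
        new = U[:, s > tol * scale]  # drop directions already in the basis
        if new.shape[1] == 0:
            break
        Q = np.hstack([Q, new])
        V = L @ new
    return Q

def subset_spectral_labels(L, targets, steps):
    """Two-way split of the target vertices from a projected Laplacian."""
    n = L.shape[0]
    B = np.zeros((n, len(targets)))
    B[targets, np.arange(len(targets))] = 1.0  # indicator columns of the targets
    Q = block_krylov_basis(L, B, steps)
    L_rom = Q.T @ L @ Q                 # reduced (projected) Laplacian
    theta, Z = np.linalg.eigh(L_rom)    # Ritz values in ascending order
    ritz = Q @ Z                        # Ritz vectors in original coordinates
    # Simplification: sign of the second Ritz vector instead of k-means/SDP.
    labels = (ritz[targets, 1] > 0).astype(int)
    return labels, theta, Q

L = two_cliques_laplacian(10)
targets = np.array([0, 1, 2, 11, 12, 13])  # three vertices from each clique
labels, theta, Q = subset_spectral_labels(L, targets, steps=3)
print("labels:", labels)
print("basis size:", Q.shape[1], "of", L.shape[0])
```

On this toy graph the basis is smaller than the full vertex set, and the targets from the two cliques receive different labels. Which clique gets label 0 depends on the arbitrary sign of the eigenvector returned by the solver.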

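Experimental Results

Research Questions

RQ1: Can model order reduction via Krylov subspace techniques be applied effectively to spectral clustering of a graph vertex subset?
RQ2: Does the reduced order model preserve the spectral structure needed for accurate clustering of the target subset?
RQ3: Can the proposed method significantly reduce the Krylov subspace dimension while maintaining clustering integrity?
RQ4: To what extent does the reduced problem size allow advanced k-means approximations to be applied?
RQ5: How does the proposed method compare with standard spectral clustering on large graphs in terms of efficiency and accuracy?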

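Key Findings

By focusing only on the target vertex subset, the proposed algorithms significantly reduce the Krylov subspace dimension and thus the computational cost.
The reduced order model remains accurate enough to produce clustering results consistent with the overall graph structure.
The smaller problem size makes it possible to use more robust k-means approximations, such as those based on semidefinite programming, that are infeasible on large graphs.
Numerical experiments confirm that the method is efficient and accurate, in particular because it operates in the regimes where k-means relaxations are most robust and efficient.
The method is suited to practical uses such as quality control and can be integrated into a divide-and-conquer strategy for full graph clustering.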

This review was generated by AI and reviewed by a human editor.