QUICK REVIEW

[Paper Review] Exact algorithms and lower bounds for stable instances of Euclidean k-means

Zachary Friggstad, Kamyar Khodamoradi|arXiv (Cornell University)|2019. 01. 06.

Data Management and Algorithms | Citations: 6

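One-Line Summary

Using a multi-swap local-search approach, this paper proves that (1+ϵ)-stable instances of k-means and k-median in fixed-dimensional Euclidean spaces, and more generally in doubling metrics, can be solved exactly in polynomial time. Under a plausible PCP hypothesis, it also shows that there is no PTAS for (1+ϵ₀)-stable k-means in high-dimensional spaces, where the dimension is part of the input, unless NP=RP.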

ABSTRACT

We investigate the complexity of solving stable or perturbation-resilient instances of k-means and k-median clustering in fixed dimension Euclidean metrics (or more generally doubling metrics). The notion of stable or perturbation resilient instances was introduced by Bilu and Linial [2010] and Awasthi, Blum, and Sheffet [2012]. In our context, we say a k-means instance is α-stable if there is a unique optimum solution which remains unchanged if distances are (non-uniformly) stretched by a factor of at most α. Stable clustering instances have been studied to explain why heuristics such as Lloyd's algorithm perform well in practice. In this work we show that for any fixed ϵ > 0, (1 + ϵ)-stable instances of k-means in doubling metrics, which include fixed-dimensional Euclidean metrics, can be solved in polynomial time. More precisely, we show a natural multi-swap local-search algorithm in fact finds the optimum solution for (1 + ϵ)-stable instances of k-means and k-median in a polynomial number of iterations. We complement this result by showing that under a plausible PCP hypothesis this is essentially tight: that when the dimension d is part of the input, there is a fixed ϵ₀ > 0 such that there is not even a PTAS for (1 + ϵ₀)-stable k-means in ℝ^d unless NP=RP. To do this, we consider a robust property of CSPs; call an instance stable if there is a unique optimum solution x* and for any other solution x', the number of unsatisfied clauses is proportional to the Hamming distance between x* and x'. Dinur, Goldreich, and Gur have already shown stable QSAT is hard to approximate for some constant Q [16]; our hypothesis is simply that stable QSAT with bounded variable occurrence is also hard (there is in fact work in progress to prove this hypothesis). Given this hypothesis, we consider stability-preserving reductions to prove our hardness for stable k-means. Such reductions seem to be more fragile and intricate than standard L-reductions and may be of further use to demonstrate other stable optimization problems are hard to solve.
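
The α-stability condition described in the abstract can be written out as follows. This is a sketch in notation introduced here for illustration (the point set X, the distance d, the stretched distance d', and the solution S* are assumed symbols, not necessarily the paper's), and the abstract does not say whether d' must itself be a metric.

```latex
% Admissible perturbations: each distance is stretched by its own factor in [1, \alpha].
\[
  d(u,v) \;\le\; d'(u,v) \;\le\; \alpha \cdot d(u,v)
  \qquad \text{for all } u, v \in X .
\]
% \alpha-stability: the unique optimum S^* under d is still the optimum under every such d'.
\[
  (X, d, k) \text{ is } \alpha\text{-stable}
  \iff
  S^* \text{ is the unique optimum under } d
  \text{ and remains the optimum under every admissible } d' .
\]
```

Setting α = 1 + ϵ gives the (1 + ϵ)-stable instances that the positive result addresses.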

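Research Motivation and Goals

Investigate the complexity of stable k-means and k-median clustering instances in fixed-dimensional Euclidean spaces and doubling metrics.
Determine whether (1+ϵ)-stable k-means instances can be solved in polynomial time.
Establish an essentially tight hardness bound for stable k-means when the dimension is part of the input.
Develop a framework of stability-preserving reductions for proving hardness of approximation for stable optimization problems.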

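Proposed Method

Proposes a multi-swap local-search algorithm that finds the optimum solution of (1+ϵ)-stable k-means and k-median instances in a polynomial number of iterations.
Uses the notion of α-stability: the instance has a unique optimum solution that remains unchanged when distances are stretched non-uniformly by a factor of at most α.
Applies stability-preserving reductions from stable QSAT (for some constant Q) to k-means, relying on the hypothesis that stable QSAT with bounded variable occurrence is also hard to approximate.
Proves that, for any fixed ϵ > 0, (1+ϵ)-stable k-means in doubling metrics can be solved in polynomial time.
Shows, under a PCP-based hypothesis, that when the dimension d is part of the input there is no PTAS for (1+ϵ₀)-stable k-means in high-dimensional Euclidean space unless NP=RP.
Introduces a new type of reduction that preserves stability and may be applicable to other stable optimization problems.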
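
To make the multi-swap local-search idea concrete, here is a minimal Python sketch. It is not the paper's algorithm as stated: it assumes centers are chosen from a finite candidate list, accepts any strictly improving swap, and takes the swap size as a plain parameter, whereas the paper's swap size and iteration bound would depend on ϵ and the doubling dimension. The names `multi_swap_local_search`, `find_improving_swap`, `candidates`, and `swap_size` are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points, centers):
    """k-means cost: each point pays the squared distance to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def find_improving_swap(points, candidates, centers, best, swap_size):
    """Return (new_centers, new_cost) for the first strictly improving swap, else None."""
    outside = [c for c in candidates if c not in centers]
    for t in range(1, swap_size + 1):
        # Remove t current centers and bring in t candidates that are not centers.
        for removed in combinations(range(len(centers)), t):
            kept = [c for i, c in enumerate(centers) if i not in removed]
            for incoming in combinations(outside, t):
                trial = kept + list(incoming)
                cost = kmeans_cost(points, trial)
                if cost < best:
                    return trial, cost
    return None


def multi_swap_local_search(points, candidates, k, swap_size=2):
    """Swap up to swap_size centers at a time until no swap lowers the cost.

    Assumes candidates are distinct tuples and len(candidates) >= k.
    """
    centers = list(candidates[:k])  # arbitrary starting solution
    best = kmeans_cost(points, centers)
    while True:
        step = find_improving_swap(points, candidates, centers, best, swap_size)
        if step is None:
            return centers, best  # local optimum for swaps of size <= swap_size
        centers, best = step
```

For example, on the six one-dimensional points `[(0,), (1,), (10,), (11,), (20,), (21,)]` with the points themselves as candidates and `k=3`, the search ends with one center in each pair and a cost of 3. The sketch only guarantees a local optimum; the claim that a local optimum of this kind is the global optimum on (1+ϵ)-stable instances is the paper's result, not something the code verifies.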

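Results

Research Questions

RQ1: Can (1+ϵ)-stable k-means instances in fixed-dimensional Euclidean spaces be solved in polynomial time?
RQ2: Does the multi-swap local-search algorithm always find the optimum solution on (1+ϵ)-stable k-means instances?
RQ3: What is the computational complexity of stable k-means when the dimension d is part of the input?
RQ4: Can stability-preserving reductions be used to prove hardness of approximation for stable optimization problems?
RQ5: Can the nonexistence of a PTAS for (1+ϵ₀)-stable k-means in high-dimensional spaces be established under plausible complexity-theoretic assumptions?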

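Key Findings

For any fixed ϵ > 0, (1+ϵ)-stable k-means instances in doubling metrics (including fixed-dimensional Euclidean spaces) can be solved in polynomial time via multi-swap local search.
The multi-swap local-search algorithm finds the unique optimum solution of (1+ϵ)-stable k-means and k-median instances in a polynomial number of iterations.
Under a plausible PCP hypothesis, when the dimension d is part of the input, there is a fixed ϵ₀ > 0 for which (1+ϵ₀)-stable k-means in high-dimensional Euclidean space has no PTAS unless NP=RP.
This hardness result is established through a stability-preserving reduction from stable QSAT with bounded variable occurrence to k-means.
The proposed reductions are more fragile and intricate than standard L-reductions, and may be reusable for proving hardness of other stable optimization problems.
The paper indicates that (1+ϵ)-stability suffices for polynomial-time solvability in fixed dimension, but, under the stated hypothesis, not when the dimension is part of the input.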
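
The hardness result starts from stable CSPs, which the abstract defines as instances with a unique optimum x* where every other assignment x' leaves a number of clauses unsatisfied that is proportional to its Hamming distance from x*. The brute-force Python sketch below checks this on tiny CNF instances. The exact normalization (unsatisfied clauses divided by Hamming distance) and the names `unsatisfied` and `stability_constant` are assumptions made for illustration; the paper's definition may scale these quantities differently.

```python
from itertools import product


def unsatisfied(clauses, assignment):
    """Count clauses with no true literal.

    A clause is a tuple of nonzero ints: +i means variable i is true,
    -i means variable i is false (variables are 1-indexed).
    """
    return sum(
        1
        for clause in clauses
        if not any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
    )


def stability_constant(clauses, n_vars):
    """Smallest ratio unsatisfied(x') / Hamming(x*, x') over all x' != x*.

    Returns None when the optimum assignment is not unique (or n_vars == 0).
    Enumerates all 2**n_vars assignments, so this is for toy instances only.
    """
    assignments = list(product([False, True], repeat=n_vars))
    costs = [unsatisfied(clauses, a) for a in assignments]
    optimum = min(costs)
    if n_vars == 0 or costs.count(optimum) != 1:
        return None
    x_star = assignments[costs.index(optimum)]
    ratios = []
    for a, cost in zip(assignments, costs):
        if a == x_star:
            continue
        hamming = sum(u != v for u, v in zip(a, x_star))
        ratios.append(cost / hamming)
    return min(ratios)
```

A larger returned value means that moving away from x* is penalized more heavily per flipped variable; an instance with several optimum assignments is reported as `None` because it cannot be stable in this sense.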

This review was generated by AI and reviewed by a human editor.