QUICK REVIEW

[Paper Review] Exact algorithms and lower bounds for stable instances of Euclidean k-means

Zachary Friggstad, Kamyar Khodamoradi|arXiv (Cornell University)|Jan 6, 2019

Data Management and Algorithms | Cited by 6

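One-sentence summary

This paper shows that a multi-swap local search algorithm solves (1+ϵ)-stable k-means instances exactly in polynomial time in fixed-dimensional Euclidean spaces and, more generally, in doubling metrics. It also shows that, under a plausible PCP hypothesis, there is a fixed ϵ₀ > 0 for which (1+ϵ₀)-stable k-means admits no PTAS when the dimension is part of the input, unless NP=RP.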

ABSTRACT

We investigate the complexity of solving stable or perturbation-resilient instances of k-means and k-median clustering in fixed-dimension Euclidean metrics (or, more generally, doubling metrics). The notion of stable or perturbation-resilient instances was introduced by Bilu and Linial [2010] and Awasthi, Blum, and Sheffet [2012]. In our context, we say a k-means instance is α-stable if there is a unique optimum solution which remains unchanged if distances are (non-uniformly) stretched by a factor of at most α. Stable clustering instances have been studied to explain why heuristics such as Lloyd's algorithm perform well in practice. In this work we show that for any fixed ϵ > 0, (1 + ϵ)-stable instances of k-means in doubling metrics, which include fixed-dimensional Euclidean metrics, can be solved in polynomial time. More precisely, we show that a natural multi-swap local-search algorithm in fact finds the optimum solution for (1 + ϵ)-stable instances of k-means and k-median in a polynomial number of iterations. We complement this result by showing that under a plausible PCP hypothesis this is essentially tight: when the dimension d is part of the input, there is a fixed ϵ₀ > 0 such that there is not even a PTAS for (1 + ϵ₀)-stable k-means in R^d unless NP=RP. To do this, we consider a robust property of CSPs; call an instance stable if there is a unique optimum solution x* and, for any other solution x', the number of unsatisfied clauses is proportional to the Hamming distance between x* and x'. Dinur, Goldreich, and Gur have already shown that stable QSAT is hard to approximate for some constant Q [16]; our hypothesis is simply that stable QSAT with bounded variable occurrence is also hard (there is in fact work in progress to prove this hypothesis). Given this hypothesis, we consider stability-preserving reductions to prove our hardness for stable k-means. Such reductions seem to be more fragile and intricate than standard L-reductions and may be of further use to demonstrate that other stable optimization problems are hard to solve.
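
The two stability notions in the abstract can be written out symbolically. The block below is a paraphrase, not the paper's formal statement: the symbols X, d, d', C*, c, unsat, and Δ are notation introduced here for illustration, and reading "proportional to" as a lower bound with some constant c is an assumption.

```latex
% alpha-stability of a k-means instance (X, d), paraphrasing the abstract.
Let $\mathcal{C}^*$ be the unique optimum clustering of $(X, d)$.
The instance is $\alpha$-stable if, for every $d'$ with
\[
  d(x, y) \;\le\; d'(x, y) \;\le\; \alpha \cdot d(x, y)
  \qquad \text{for all } x, y \in X,
\]
$\mathcal{C}^*$ is also the unique optimum clustering of $(X, d')$.

% Stability of a CSP instance with unique optimum assignment x^*.
A CSP instance is stable if, for some constant $c > 0$ and every
assignment $x' \ne x^*$,
\[
  \mathrm{unsat}(x') \;\ge\; c \cdot \Delta(x^*, x'),
\]
where $\mathrm{unsat}(x')$ counts the clauses violated by $x'$ and
$\Delta$ denotes Hamming distance.
```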

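Motivation and objectives

Study the complexity of solving stable k-means and k-median clustering instances in fixed-dimensional Euclidean spaces and doubling metrics.
Determine whether (1+ϵ)-stable k-means instances can be solved in polynomial time.
Establish a matching hardness bound for stable k-means when the dimension is part of the input.
Develop stability-preserving reductions for proving hardness of approximation for stable optimization problems.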

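Proposed method

Use a multi-swap local search algorithm that finds the optimum solution of (1+ϵ)-stable k-means and k-median instances in a polynomial number of iterations.
Rely on the notion of α-stability: the instance has a unique optimum clustering, and it remains unchanged when distances are stretched non-uniformly by a factor of at most α.
Reduce from stable QSAT (for some constant Q) to k-means via stability-preserving reductions, under the hypothesis that stable QSAT with bounded variable occurrence is hard to approximate.
Prove that, for any fixed ϵ > 0, (1+ϵ)-stable k-means in doubling metrics can be solved in polynomial time.
Under a PCP-based hypothesis, show that there is a fixed ϵ₀ > 0 such that (1+ϵ₀)-stable k-means in high-dimensional Euclidean space has no PTAS unless NP=RP.
Introduce stability-preserving reductions, which may also apply to other stable optimization problems.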
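
The multi-swap local search named above can be sketched as follows. This is a minimal illustration, not the paper's algorithm as stated: it assumes candidate centers are restricted to the input points, starts from the first k points, accepts any strictly improving swap, and uses a hypothetical parameter `rho` for the maximum swap size (in the paper the swap size would depend on ϵ and the dimension, and a polynomial iteration bound needs a more careful acceptance rule).

```python
from itertools import combinations


def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def find_improving_swap(points, current, cost, rho):
    """Return (new_centers, new_cost) for the first swap of size <= rho that lowers the cost."""
    outside = [i for i in range(len(points)) if i not in current]
    for size in range(1, rho + 1):
        for removed in combinations(sorted(current), size):
            for added in combinations(outside, size):
                candidate = (current - set(removed)) | set(added)
                new_cost = kmeans_cost(points, [points[i] for i in candidate])
                if new_cost < cost:
                    return candidate, new_cost
    return None


def multi_swap_local_search(points, k, rho=2):
    """Swap up to `rho` centers at a time until no swap lowers the cost.

    Assumption (not from the source): centers are indices into `points`.
    """
    current = set(range(k))
    cost = kmeans_cost(points, [points[i] for i in current])
    while True:
        step = find_improving_swap(points, current, cost, rho)
        if step is None:
            return sorted(current), cost
        current, cost = step
```

On a small well-separated instance such as three pairs of points `[(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]` with `k=3`, the search ends with one center in each pair. Each iteration here costs O(n^rho · k^rho) cost evaluations, so the sketch is only practical for small inputs.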

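Results

Research questions

RQ1: Can (1+ϵ)-stable k-means instances in fixed-dimensional Euclidean space be solved in polynomial time?
RQ2: Is multi-swap local search guaranteed to find the optimum solution of a (1+ϵ)-stable k-means instance?
RQ3: What is the computational complexity of stable k-means when the dimension d is part of the input?
RQ4: Can stability-preserving reductions be used to prove hardness of approximation for stable optimization problems?
RQ5: Under a plausible complexity-theoretic hypothesis, does (1+ϵ₀)-stable k-means in high-dimensional space admit no PTAS?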

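Key findings

For any fixed ϵ > 0, (1+ϵ)-stable k-means instances in doubling metrics (which include fixed-dimensional Euclidean spaces) can be solved in polynomial time by multi-swap local search.
Multi-swap local search finds the unique optimum solution of (1+ϵ)-stable k-means and k-median instances in a polynomial number of iterations.
Under a plausible PCP hypothesis, there is a fixed ϵ₀ > 0 such that (1+ϵ₀)-stable k-means in R^d, with d part of the input, has no PTAS unless NP=RP.
The hardness result is established through a stability-preserving reduction from stable QSAT with bounded variable occurrence to k-means.
Stability-preserving reductions appear more fragile and intricate than standard L-reductions, and may be reusable for proving hardness of other stable optimization problems.
Taken together, the results indicate that (1+ϵ)-stability suffices for polynomial-time solvability in fixed dimension, but, under the stated hypothesis, not when the dimension is part of the input.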
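
To make the stability condition behind these findings concrete, the sketch below samples random non-uniform stretches of the pairwise distances and checks whether the brute-force optimum clustering changes. It is an illustration under several assumptions that are not from the source: centers are restricted to the input points, the optimum is found by exhaustive search (so only tiny instances are feasible), and the function names are hypothetical. Sampling can only refute α-stability; a `False` result does not certify it.

```python
import random
from itertools import combinations


def best_partition(dist, k):
    """Brute-force optimum (cost, clustering) with centers chosen among the points.

    `dist[i][j]` is the distance between points i and j; the cost uses squared distances.
    """
    n = len(dist)
    best = None
    for centers in combinations(range(n), k):
        cost = sum(min(dist[i][c] ** 2 for c in centers) for i in range(n))
        if best is None or cost < best[0]:
            clusters = {}
            for i in range(n):
                nearest = min(centers, key=lambda c: dist[i][c])
                clusters.setdefault(nearest, set()).add(i)
            best = (cost, frozenset(frozenset(s) for s in clusters.values()))
    return best


def refutes_stability(points, k, alpha, trials=200, seed=0):
    """Return True if some sampled perturbation changes the optimum clustering."""
    rng = random.Random(seed)
    n = len(points)
    base = [
        [sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for q in points]
        for p in points
    ]
    target = best_partition(base, k)[1]
    for _ in range(trials):
        perturbed = [row[:] for row in base]
        for i in range(n):
            for j in range(i + 1, n):
                # Stretch each pairwise distance independently by a factor in [1, alpha].
                stretched = base[i][j] * rng.uniform(1.0, alpha)
                perturbed[i][j] = perturbed[j][i] = stretched
        if best_partition(perturbed, k)[1] != target:
            return True
    return False
```

For three well-separated pairs of points and `alpha=1.1`, no stretch can change the optimum clustering, so the function returns `False`. On the line instance `[(0,), (1,), (3,)]` with `k=2` and `alpha=4`, a stretch that makes the first gap longer than the second changes the optimum, and sampling finds such a stretch.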

This review was generated by AI and reviewed by human editors.