QUICK REVIEW

[Paper Review] FPT Approximation for Constrained Metric $k$-Median/Means

Dishant Goyal, Ragesh Jaiswal|arXiv (Cornell University)|Jan 1, 2020

Facility Location and Emergency Management | References: 97 | Cited by: 6

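One-Sentence Summary

This paper gives fixed-parameter tractable (FPT) constant-factor approximation algorithms for a broad class of constrained metric k-median and k-means problems, including capacitated, r-gather, fault-tolerant, outlier, and uncertain variants; for several of these it is the first constant-factor approximation known. Working within the unified sampling framework of Ding and Xu (2015), the authors obtain a (3+ε)-approximation for k-median and a (9+ε)-approximation for k-means in FPT time, improving on or matching prior results, and the algorithms admit a streaming implementation that uses a constant number of passes and logarithmic space.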

ABSTRACT

The Metric $k$-median problem over a metric space $(\mathcal{X}, d)$ is defined as follows: given a set $L \subseteq \mathcal{X}$ of facility locations and a set $C \subseteq \mathcal{X}$ of clients, open a set $F \subseteq L$ of $k$ facilities such that the total service cost, defined as $Φ(F, C) \equiv \sum_{x \in C} \min_{f \in F} d(x, f)$, is minimised. The metric $k$-means problem is defined similarly using squared distances. In many applications there are additional constraints that any solution needs to satisfy. This gives rise to different constrained versions of the problem such as $r$-gather, fault-tolerant, outlier $k$-means/$k$-median problem. Surprisingly, for many of these constrained problems, no constant-approximation algorithm is known. We give FPT algorithms with constant approximation guarantee for a range of constrained $k$-median/means problems. For some of the constrained problems, ours is the first constant factor approximation algorithm whereas for others, we improve or match the approximation guarantee of previous works. We work within the unified framework of Ding and Xu that allows us to simultaneously obtain algorithms for a range of constrained problems. In particular, we obtain a $(3+\varepsilon)$-approximation and $(9+\varepsilon)$-approximation for the constrained versions of the $k$-median and $k$-means problem respectively in FPT time. In many practical settings of the $k$-median/means problem, one is allowed to open a facility at any client location, i.e., $C \subseteq L$. For this special case, our algorithm gives a $(2+\varepsilon)$-approximation and $(4+\varepsilon)$-approximation for the constrained versions of $k$-median and $k$-means problem respectively in FPT time. Since our algorithm is based on simple sampling technique, it can also be converted to a constant-pass log-space streaming algorithm.
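The service cost $Φ(F, C)$ defined in the abstract can be computed directly for a small instance. The sketch below is only an illustration: the function name `service_cost`, the use of the Euclidean plane as the metric space, and the sample points are assumptions, not part of the paper.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def service_cost(
    facilities: Sequence[Point],
    clients: Sequence[Point],
    dist: Callable[[Point, Point], float] = math.dist,  # assumed metric: Euclidean
    power: int = 1,  # 1 gives the k-median cost, 2 gives the k-means cost
) -> float:
    """Sum over clients of the (powered) distance to the nearest open facility."""
    return sum(min(dist(x, f) for f in facilities) ** power for x in clients)


# Hypothetical instance with k = 2 open facilities.
clients = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0)]
facilities = [(0.0, 1.0), (10.0, 0.0)]

median_cost = service_cost(facilities, clients)           # distances 1, 3, 0
means_cost = service_cost(facilities, clients, power=2)   # squared: 1, 9, 0
```

Passing a different `dist` callable swaps in any other metric; the constrained variants differ in which assignments of clients to facilities are allowed, not in this cost formula.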

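Motivation and Objectives

Design fixed-parameter tractable (FPT) approximation algorithms for constrained versions of metric k-median and k-means that have so far resisted constant-factor approximation.
Unify multiple constrained problems (capacitated, r-gather, fault-tolerant, outlier, and uncertain clustering) under a single algorithmic framework.
Improve on or match the existing approximation guarantees for constrained k-median/k-means in FPT time.
Extend the approach to a constant-pass, log-space streaming algorithm suitable for practical deployment.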

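Proposed Method

Adopts the unified framework of Ding and Xu (2015) to handle many constrained variants of k-median and k-means simultaneously.
Uses a sampling-based technique to build a small representative graph G′ whose client-to-center assignment costs are preserved within a (1±ε) factor.
Reduces the constrained clustering problem to a minimum-cost maximum-flow problem on the sampled graph G′, which can be solved in FPT time.
Builds G′ with a two-pass streaming algorithm using f(k,ε)·log n space, where f(k,ε) = k^O(k) · log^k(1/ε).
For specific problems such as outlier k-service, applies a greedy strategy that marks the farthest points as outliers and then performs a Voronoi partition of the remaining points.
Converts the FPT algorithm into a constant-pass, log-space streaming algorithm by maintaining only essential information (for example, the farthest points and the flow assignment).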
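The greedy outlier step in the list above can be sketched for a fixed set of centers: drop the m clients farthest from their nearest center, then assign every remaining client to its nearest center (a Voronoi partition). This is a minimal in-memory sketch, not the paper's streaming procedure; the name `outlier_partition`, the Euclidean metric, and the assumption that the centers are already given are all illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def outlier_partition(
    centers: Sequence[Point], clients: Sequence[Point], m: int
) -> Tuple[List[int], Dict[int, List[int]], float]:
    """Return (outlier indices, clusters by center index, k-median cost)."""
    # For each client, record (distance to nearest center, client index, center index).
    nearest = []
    for i, x in enumerate(clients):
        j = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
        nearest.append((math.dist(x, centers[j]), i, j))

    # Greedy step: the m clients with the largest nearest-center distance are outliers.
    nearest.sort(reverse=True)
    outliers = sorted(i for _, i, _ in nearest[:m])

    # Voronoi partition of the remaining clients.
    clusters: Dict[int, List[int]] = {j: [] for j in range(len(centers))}
    for _, i, j in sorted(nearest[m:], key=lambda t: t[1]):
        clusters[j].append(i)

    cost = sum(d for d, _, _ in nearest[m:])
    return outliers, clusters, cost


# Hypothetical instance: two centers, four clients, one outlier allowed.
centers = [(0.0, 0.0), (10.0, 0.0)]
clients = [(0.0, 1.0), (1.0, 0.0), (10.0, 2.0), (50.0, 50.0)]
outliers, clusters, cost = outlier_partition(centers, clients, m=1)
```

For fixed centers this choice is optimal for the outlier objective, because removing any other set of m clients leaves a nearest-center distance at least as large in the sum. Constraints such as capacities need the flow-based assignment instead of a plain Voronoi partition.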

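Results

Research Questions

RQ1: Can FPT approximation algorithms with constant approximation ratios be designed for a broad range of constrained k-median and k-means problems?
RQ2: Can a unified sampling-based framework provide approximation guarantees for multiple constrained variants simultaneously?
RQ3: Can the best known approximation ratios for the r-gather and fault-tolerant k-means problems be improved or matched while keeping an FPT running time?
RQ4: Can the FPT approximation algorithm be converted into an efficient streaming algorithm that uses a constant number of passes and logarithmic space?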

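Key Findings

The paper obtains a (3+ε)-approximation for constrained k-median and a (9+ε)-approximation for constrained k-means in FPT time, improving on or matching prior results.
In the special case where a facility may be opened at any client location (C ⊆ L), the guarantees improve to (2+ε) for k-median and (4+ε) for k-means, still in FPT time.
For the r-gather problem, the FPT approximation bound improves on previously known results and is described as the first constant-factor approximation in this setting.
The approach yields the first known constant-factor approximation algorithms for the chromatic, l-diversity, and semi-supervised k-median/k-means problems.
The algorithm is converted into a 3-pass streaming algorithm with f(k,ε)·log n space and f(k,ε)·n^O(1) running time, where f(k,ε) = k^O(k) · log^k(1/ε).
For the outlier k-service problem, a two-pass streaming algorithm with O(k) space and O(n) time identifies the m farthest points as outliers and optimally clusters the remaining points.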
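The factors 3 and 2 for k-median can be motivated by a standard triangle-inequality argument. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's proof: $P$ is one cluster of an optimal solution with optimal facility $f^*$, $c \in P$ is a client (for example, one found by sampling) with $d(c, f^*) \le (1+\varepsilon)\,\frac{1}{|P|}\sum_{x \in P} d(x, f^*)$, and $f \in L$ is the facility nearest to $c$, so that $d(c, f) \le d(c, f^*)$.

```latex
\begin{align*}
d(x, f) &\le d(x, f^*) + d(f^*, c) + d(c, f)
         \le d(x, f^*) + 2\, d(c, f^*)
         && \text{for every } x \in P, \\
\sum_{x \in P} d(x, f)
        &\le \sum_{x \in P} d(x, f^*) + 2\,|P|\, d(c, f^*)
         \le (3 + 2\varepsilon) \sum_{x \in P} d(x, f^*), \\
\sum_{x \in P} d(x, c)
        &\le \sum_{x \in P} d(x, f^*) + |P|\, d(c, f^*)
         \le (2 + \varepsilon) \sum_{x \in P} d(x, f^*)
         && \text{if } C \subseteq L \text{ and } c \text{ is opened directly.}
\end{align*}
```

Rescaling $\varepsilon$ turns $3 + 2\varepsilon$ into $3 + \varepsilon$. Running the same steps with squared distances, under the analogous assumption on $d(c, f^*)^2$, squares the constants and is consistent with the factors 9 and 4 stated for k-means.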


This review was generated by AI and reviewed by human editors.