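[Paper Summary] Fair k-Center Clustering for Data Summarization
This paper proposes a linear-time approximation framework for fair k-center clustering under group-based constraints. Using recursion and swapping techniques, it achieves a 5-approximation for two groups and a (3·2^{m-1}−1)-approximation for m groups.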
In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.
Motivation and Objectives
- Motivate data summarization with fairness constraints across demographic groups.
- Formalize the fair k-center problem with group quotas.
- Develop a linear-time approximation algorithm that respects group quotas.
- Provide a recursion-based and exchange-based approach to handle multiple groups.
- Evaluate theoretical guarantees and empirical performance against baselines.
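To make the problem statement concrete, here is a minimal sketch of the two things a fair k-center solution is judged on: whether it picks exactly k_i centers from each group i, and how far the worst-served point is from its nearest center. The function names `is_fair` and `k_center_cost`, the Euclidean metric, and the dictionary-based quota encoding are illustrative assumptions, not notation from the paper.

```python
import math
from collections import Counter


def is_fair(centers, groups, quotas):
    """Return True if exactly quotas[g] centers are chosen from every group g.

    centers: indices of the chosen points (hypothetical encoding)
    groups:  groups[i] is the group label of point i
    quotas:  dict mapping each group label to its required count k_i
    """
    counts = Counter(groups[c] for c in centers)
    all_groups = set(quotas) | set(counts)
    return all(counts.get(g, 0) == quotas.get(g, 0) for g in all_groups)


def k_center_cost(points, centers):
    """k-center objective: largest distance from any point to its nearest center.

    Euclidean distance is an assumption; any metric could be substituted.
    """
    return max(
        min(math.dist(p, points[c]) for c in centers)
        for p in points
    )


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
groups = ["a", "a", "b", "b"]
quotas = {"a": 1, "b": 1}

print(is_fair([0, 2], groups, quotas), k_center_cost(points, [0, 2]))
print(is_fair([0, 1], groups, quotas), k_center_cost(points, [0, 1]))
```

In this toy instance, taking one center per group covers both clusters, while taking both centers from group "a" violates the quotas and leaves the far cluster badly served.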
Proposed Method
- Use a Gonzalez-style greedy 2-approximation as a subroutine for the unfair problem, given a set of initial centers C0' (Algorithm 1).
- For two groups, apply a swapping procedure to adjust centers across groups and then recursively solve the reduced instance (Algorithm 2).
- Introduce a center-exchange procedure (Algorithm 3) using a directed graph over groups to propagate exchanges along shortest paths.
- Generalize to arbitrary m groups with a recursive framework (Algorithm 4) that combines Algorithm 3 with a reduced instance on a subset of groups.
- Provide running-time guarantees that are linear in |S|: O((k+|C0|)|S|) for m=2 and O((|C0|m+km^2)|S|+km^4) for general m, assuming constant-time distance evaluation.
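The greedy subroutine in the first bullet can be sketched as a farthest-point traversal that starts from an optional set of already fixed centers. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1 verbatim: the name `greedy_k_center`, the index-based interface, the Euclidean metric, and the tie-breaking rule (lowest index wins) are all choices made here.

```python
import math


def greedy_k_center(points, k, initial_centers=()):
    """Pick k new centers by repeatedly taking the point farthest from all centers.

    points:          list of coordinate tuples
    initial_centers: indices of centers that are already fixed (may be empty)
    Returns the indices of the k newly chosen centers, in selection order.
    """
    n = len(points)
    # dist[i] is the distance from point i to its nearest center so far.
    dist = [math.inf] * n
    for c in initial_centers:
        for i in range(n):
            dist[i] = min(dist[i], math.dist(points[i], points[c]))

    chosen = []
    for _ in range(k):
        # With no centers yet every entry is inf, so index 0 is taken arbitrarily.
        nxt = max(range(n), key=lambda i: dist[i])
        chosen.append(nxt)
        # One pass over the data per new center keeps the total work linear in n.
        for i in range(n):
            dist[i] = min(dist[i], math.dist(points[i], points[nxt]))
    return chosen


points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
print(greedy_k_center(points, 2))
print(greedy_k_center(points, 2, initial_centers=[0]))
```

Each initial or new center triggers a single pass over the data, which is where a bound of the form O((k+|C0|)|S|) comes from when a distance evaluation takes constant time.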
Experimental Results
Research Questions
- RQ1: How can fair k-center clustering be achieved in linear time while satisfying group quotas?
- RQ2: What is the approximation factor achievable for fair k-center with two groups and with more groups?
- RQ3: Can center exchanges across groups be efficiently realized to approach the fairness constraints without quadratic-time penalties?
- RQ4: How does the proposed fair k-center method compare to matroid-based or baseline heuristics in theory and practice?
- RQ5: What are the trade-offs in approximation guarantees as the number of groups grows?
Key Findings
- Algorithm 1 (greedy) yields a 2-approximation for the unfair problem with linear-time complexity.
- Algorithm 2 attains a 5-approximation for m=2 under the fairness constraint and runs in O((k+|C0|)|S|).
- Algorithm 3 provides a well-defined center-exchange mechanism with poly-time complexity to obtain a valid G and enable progress toward fairness.
- Algorithm 4 gives a (3·2^{m-1}−1)-approximation for arbitrary m, with running time O((|C0|m+km^{2})|S|+km^{4}); a lower bound indicates the factor can be exponential in m in the worst case, though empirical results show moderate factors.
- Comparisons indicate linear-time methods outperform a prior quadratic-time matroid-intersection approach in large data regimes, while yielding competitive objective costs.
- Experiments demonstrate practical usefulness and a price of fairness relative to unfair baselines.
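To see how quickly the worst-case guarantee grows with the number of groups, the factor 3·2^{m-1}−1 can simply be tabulated. The helper name `fair_k_center_factor` is hypothetical, and the snippet only evaluates the formula stated above; it says nothing about the factors observed empirically.

```python
def fair_k_center_factor(m):
    """Evaluate the stated worst-case approximation factor 3 * 2^(m-1) - 1."""
    if m < 1:
        raise ValueError("the number of groups m must be at least 1")
    return 3 * 2 ** (m - 1) - 1


for m in range(1, 6):
    print(m, fair_k_center_factor(m))
# prints:
# 1 2
# 2 5
# 3 11
# 4 23
# 5 47
```

For m=2 the formula gives the 5-approximation of the two-group case, and for m=1 it evaluates to 2, which matches the factor of the greedy subroutine. Each additional group roughly doubles the guarantee.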
This summary was generated by AI and reviewed by human editors.