QUICK REVIEW

[Paper Review] Fair k-Center Clustering for Data Summarization

Matthäus Kleindeßner, Pranjal Awasthi|arXiv (Cornell University)|Jan 24, 2019
Data Management and Algorithms | References: 35 | Cited by: 42
One-Sentence Summary

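This paper proposes a linear-time approximation framework for fair k-center clustering under group-based constraints. It achieves a 5-approximation for two groups and a $(3\cdot 2^{m-1}-1)$-approximation for $m$ groups, using recursion and center-exchange techniques.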
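As a quick arithmetic check on the bound, the small Python helper below evaluates $3\cdot 2^{m-1}-1$ for a few group counts (the name `fair_kcenter_factor` is ours, for illustration only). For $m = 2$ it reproduces the 5-approximation; for $m = 1$ it gives 2, the factor of the plain greedy algorithm.

```python
def fair_kcenter_factor(m: int) -> int:
    """Evaluate the stated approximation factor 3 * 2**(m - 1) - 1 for m groups."""
    if m < 1:
        raise ValueError("m must be a positive number of groups")
    return 3 * 2 ** (m - 1) - 1


for m in range(1, 6):
    print(f"m = {m}: factor {fair_kcenter_factor(m)}")
```

The factor roughly doubles with each additional group (2, 5, 11, 23, 47 for $m = 1, \dots, 5$), which is why the guarantee only incurs a constant-factor overhead when the number of groups is small.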

ABSTRACT

In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.
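One way to write the constrained problem compactly is sketched below. The notation is our assumption for illustration: $S = S_1 \cup \dots \cup S_m$ partitions the data set into the $m$ demographic groups, $d$ is a metric, $d(s, C) = \min_{c \in C} d(s, c)$, and $C_0$ is a (possibly empty) set of pre-specified centers, as suggested by the $|C_0|$ terms in the running-time bounds.

$$
\min_{C_1, \dots, C_m} \; \max_{s \in S} \; d\bigl(s,\; C_0 \cup C_1 \cup \dots \cup C_m\bigr)
\quad \text{s.t.} \quad C_i \subseteq S_i,\;\; |C_i| = k_i \;\; (i = 1, \dots, m),\;\; \sum_{i=1}^{m} k_i = k .
$$

Dropping the per-group constraints $C_i \subseteq S_i, |C_i| = k_i$ recovers the standard (unfair) $k$-center problem.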

Motivation and Goals

  • Motivate data summarization with fairness constraints across demographic groups.
  • Formalize the fair k-center problem with group quotas.
  • Develop a linear-time approximation algorithm that respects group quotas.
  • Provide a recursion-based and exchange-based approach to handle multiple groups.
  • Evaluate theoretical guarantees and empirical performance against baselines.

Proposed Method

  • Use a Gonzalez-style greedy 2-approximation (Algorithm 1) as a subroutine for the unfair problem with a given set $C_0$ of pre-specified centers.
  • For two groups, apply a swapping procedure to adjust centers across groups and then recursively solve the reduced instance (Algorithm 2).
  • Introduce a center-exchange procedure (Algorithm 3) using a directed graph over groups to propagate exchanges along shortest paths.
  • Generalize to arbitrary m groups with a recursive framework (Algorithm 4) that combines Algorithm 3 with a reduced instance on a subset of groups.
  • Provide linear-time running guarantees, assuming constant-time distance evaluation: $O((k+|C_0|)|S|)$ for $m=2$ and $O((|C_0|m+km^2)|S|+km^4)$ for general $m$.
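The greedy subroutine above is the classic farthest-first traversal. Below is a minimal sketch in its spirit, assuming points are coordinate tuples compared with Euclidean distance (`math.dist`); the function names and the way $C_0$ is passed in (`given`) are our own illustrative choices, not the paper's pseudocode.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_k_center(
    points: Sequence[Point],
    k: int,
    given: Sequence[Point] = (),
    dist: Callable[[Point, Point], float] = math.dist,
) -> List[int]:
    """Farthest-first traversal (Gonzalez-style greedy) with pre-specified centers.

    `given` plays the role of C0. Returns the indices of the k new centers.
    Uses O((k + |C0|) * |S|) distance evaluations.
    """
    n = len(points)
    # closest[i] = distance from point i to its nearest center chosen so far.
    closest = [math.inf] * n

    def add_center(c: Point) -> None:
        for i, p in enumerate(points):
            closest[i] = min(closest[i], dist(p, c))

    for c in given:
        add_center(c)

    chosen: List[int] = []
    for _ in range(min(k, n)):
        # Pick the point farthest from all current centers; if there are no
        # centers yet, every distance is inf and max() returns index 0.
        nxt = max(range(n), key=closest.__getitem__)
        chosen.append(nxt)
        add_center(points[nxt])
    return chosen


def k_center_cost(points: Sequence[Point], centers: Sequence[Point],
                  dist: Callable[[Point, Point], float] = math.dist) -> float:
    """k-center objective: largest distance from a point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


pts = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,)]
idx = greedy_k_center(pts, 2)
print(idx, k_center_cost(pts, [pts[i] for i in idx]))  # [0, 4] 10.0
```

Each round touches every point once, which is where the $(k+|C_0|)|S|$ term comes from. The sketch assumes distinct points; with many duplicates and a large $k$ it may pick the same location twice.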

Experimental Results

Research Questions

  • RQ1: How can fair k-center clustering be achieved in linear time while satisfying group quotas?
  • RQ2: What approximation factor is achievable for fair k-center with two groups, and with more groups?
  • RQ3: Can center exchanges across groups be realized efficiently enough to satisfy the fairness constraints without a quadratic-time penalty?
  • RQ4: How does the proposed fair k-center method compare with matroid-based methods and baseline heuristics, in theory and in practice?
  • RQ5: What are the trade-offs in approximation guarantees as the number of groups grows?

Key Findings

  • Algorithm 1 (greedy) yields a 2-approximation for the unfair problem with linear-time complexity.
  • Algorithm 2 attains a 5-approximation for m=2 under the fairness constraint and runs in O((k+|C0|)|S|).
  • Algorithm 3 is a well-defined center-exchange procedure that runs in polynomial time; it builds a directed graph G over the groups and exchanges centers along shortest paths in G, moving the per-group center counts toward the quotas.
  • Algorithm 4 gives a $(3\cdot 2^{m-1}-1)$-approximation for arbitrary $m$ with running time $O((|C_0|m+km^2)|S|+km^4)$. A lower-bound example shows that the algorithm's factor can be exponential in $m$ in the worst case, although the factors observed in experiments are moderate.
  • Comparisons indicate that the linear-time algorithms run substantially faster than a prior super-quadratic-time approach based on matroid intersection on large data sets, while achieving competitive objective values.
  • Experiments demonstrate the method's practical usefulness and quantify the price of fairness, i.e., the increase in k-center cost relative to unfair baselines.
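The center-exchange idea behind Algorithm 3 can be sketched as follows. This is our reading of "exchanges propagated along shortest paths in a directed graph over groups", not the paper's pseudocode: we assume an edge g → h whenever some cluster whose center lies in group g contains a point of group h, and the function name `exchange_along_shortest_path` is hypothetical. Swapping one center per edge of a shortest path from an over-represented group to an under-represented one lowers the source group's count by one, raises the target's by one, and leaves every intermediate group's count unchanged.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple


def exchange_along_shortest_path(
    clusters: List[List[int]],  # clusters[j] = point indices in cluster j
    centers: List[int],         # centers[j] = index of cluster j's center (mutated)
    group: Sequence[int],       # group[p] = demographic group of point p
    source: int,                # over-represented group: loses one center
    target: int,                # under-represented group: gains one center
) -> bool:
    # ASSUMPTION (illustrative): edge g -> h, with witness (cluster j, point p),
    # if a cluster centered in group g contains a point p of group h.
    edges: Dict[int, Dict[int, Tuple[int, int]]] = {}
    for j, members in enumerate(clusters):
        g = group[centers[j]]
        for p in members:
            h = group[p]
            if h != g:
                edges.setdefault(g, {}).setdefault(h, (j, p))

    # Breadth-first search for a shortest path source -> target.
    parent: Dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        g = queue.popleft()
        if g == target:
            break
        for h in edges.get(g, {}):
            if h not in parent:
                parent[h] = g
                queue.append(h)
    if target not in parent:
        return False  # no exchange sequence reaches the target group

    # Recover the path; its groups are distinct, so each edge uses a different cluster.
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()
    for g, h in zip(path, path[1:]):
        j, p = edges[g][h]
        centers[j] = p  # cluster j's center moves from group g to group h
    return True
```

For example, with groups `[0, 0, 1, 1, 2, 2]`, clusters `[[0, 2], [3, 4], [5]]` and centers `[0, 3, 5]`, an exchange from group 0 to group 2 follows the path 0 → 1 → 2 and changes the centers to `[2, 4, 5]`.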

This summary was generated by AI and reviewed by human editors.