Scalable Fair Clustering

ICML

Cite Paper Code

Authors

Arturs Backurs
Piotr Indyk
Krzysztof Onak
Baruch Schieber
Ali Vakilian
Tal Wagner

Published on

06/15/2019

We study the fair variant of the classic k-median problem introduced by (Chierichetti et al., NeurIPS 2017) in which the points are colored, and the goal is to minimize the same average distance objective as in the standard 𝑘-median problem while ensuring that all clusters have an “approximately equal” number of points of each color. (Chierichetti et al., NeurIPS 2017) proposed a two-phase algorithm for fair 𝑘-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.

Please cite our work using the BibTeX below.

@InProceedings{pmlr-v97-backurs19a,
  title = 	 {Scalable Fair Clustering},
  author =       {Backurs, Arturs and Indyk, Piotr and Onak, Krzysztof and Schieber, Baruch and Vakilian, Ali and Wagner, Tal},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {405--413},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/backurs19a/backurs19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/backurs19a.html}
}