Streaming Algorithms for Diversity Maximization with Fairness Constraints

Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

Abstrakti

Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set X of n elements, it asks to select a subset S of k << n elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min dispersion diversity objective, which aims to select the subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements in S. Assuming that the set X is partitioned into m disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains k_i elements from each group i in [1,m]. A streaming algorithm should process X sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two novel approximation algorithms for fair diversity maximization in data streams, the first of which is (1-ε)/4-approximate and specific for m=2, where ε in (0,1), and the second of which achieves a (1-ε)/(3m+2)-approximation for an arbitrary m. Experimental results show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
Alkuperäiskielienglanti
Otsikko2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022) : ICDE 2022
Sivumäärä13
KustantajaIEEE
Julkaisupäivä2 elok. 2022
Sivut41-53
ISBN (painettu)978-1-6654-0884-4
ISBN (elektroninen)978-1-6654-0883-7
DOI - pysyväislinkit
TilaJulkaistu - 2 elok. 2022
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaIEEE International Conference on Data Engineering - Kuala Lumpur, Malesia
Kesto: 9 toukok. 202212 toukok. 2022
Konferenssinumero: 38

Julkaisusarja

NimiIEEE International Conference on Data Engineering
KustantajaIEEE COMPUTER SOC
ISSN (painettu)1084-4627

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Siteeraa tätä