Projekteja vuodessa
Abstrakti
Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set X of n elements, it asks to select a subset S of k << n elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the maxmin dispersion diversity objective, which aims to select the subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements in S. Assuming that the set X is partitioned into m disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains k_i elements from each group i in [1,m]. A streaming algorithm should process X sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the maxmin diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NPhard in general, we propose two novel approximation algorithms for fair diversity maximization in data streams, the first of which is (1ε)/4approximate and specific for m=2, where ε in (0,1), and the second of which achieves a (1ε)/(3m+2)approximation for an arbitrary m. Experimental results show that both algorithms provide solutions of comparable quality to the stateoftheart algorithms while running several orders of magnitude faster in the streaming setting.
Alkuperäiskieli  englanti 

Otsikko  2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022) : ICDE 2022 
Sivumäärä  13 
Kustantaja  IEEE 
Julkaisupäivä  2 elok. 2022 
Sivut  4153 
ISBN (painettu)  9781665408844 
ISBN (elektroninen)  9781665408837 
DOI  pysyväislinkit  
Tila  Julkaistu  2 elok. 2022 
OKMjulkaisutyyppi  A4 Artikkeli konferenssijulkaisuussa 
Tapahtuma  IEEE International Conference on Data Engineering  Kuala Lumpur, Malesia Kesto: 9 toukok. 2022 → 12 toukok. 2022 Konferenssinumero: 38 
Julkaisusarja
Nimi  IEEE International Conference on Data Engineering 

Kustantaja  IEEE COMPUTER SOC 
ISSN (painettu)  10844627 
Tieteenalat
 113 Tietojenkäsittely ja informaatiotieteet
Projektit
 1 Päättynyt

MLDB: Model Management Systems: Machine learning meets Database Systems
Mathioudakis, M., Gionis, A., Mahadevan, A., Maniatis, A., Merchant, A. & Pai, S. G.
Suomen Akatemia Projektilaskutus
01/09/2019 → 31/12/2023
Projekti: Suomen Akatemia: Akatemiahanke