Projekt per år
Sammanfattning
Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coresetbased data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set P of points in R^d , where d is a small constant, and an error parameter ε ∈ (0, 1), a subset Q ⊆ P is an εcoreset for the maxima representation of P iff the maximum of Q is an εapproximation of the maximum of P for any vector u ∈ R^d , where the maximum is taken over the inner products between the set of points (P or Q) and u. We define a novel minimum εcoreset problem that asks for an εcoreset of the smallest size for the maxima representation of a point set. For the twodimensional case, we develop an optimal polynomialtime algorithm for the minimum εcoreset problem by transforming it into the shortestcycle problem in a directed graph. Then, we prove that this problem is NPhard in three or higher dimensions and present polynomialtime approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms.
Originalspråk  engelska 

Titel på värdpublikation  Proceedings of the 40th ACM SIGMODSIGACTSIGAI Symposium on Principles of Database Systems (PODS '21) 
Förlag  ACM 
Utgivningsdatum  2021 
Sidor  138–152 
ISBN (tryckt)  9781450383813 
ISBN (elektroniskt)  9781450383813 
DOI  
Status  Publicerad  2021 
MoEpublikationstyp  A4 Artikel i en konferenspublikation 
Evenemang  2021 ACM SIGMOD/PODS International Conference on Management of Data  Virtual Event, Xi'an, Kina Varaktighet: 20 juni 2021 → 25 juni 2021 https://2021.sigmod.org 
Vetenskapsgrenar
 113 Data och informationsvetenskap
Projekt
 1 Aktiv

MLDB: Model Management Systems: Machine learning meets Database Systems
Gionis, A. & Mathioudakis, M.
Suomen Akatemia Projektilaskutus
01/09/2019 → 31/12/2023
Projekt: Suomen Akatemia: : Akatemiahanke