Multiple Set Matching with Bloom Matrix and Bloom Vector

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

Abstrakti

Bloom Filter is a space-efficient probabilistic data structure for checking the membership of elements in a set. Given multiple sets, a standard Bloom Filter is not sufficient when looking for the items to which an element or a set of input elements belong. An example case is searching for documents with keywords in a large text corpus, which is essentially a multiple set matching problem where the input is single or multiple keywords, and the result is a set of possible candidate documents. This article solves the multiple set matching problem by proposing two efficient Bloom Multifilters called Bloom Matrix and Bloom Vector, which generalize the standard Bloom Filter. Both structures are space-efficient and answer queries with a set of identifiers for multiple set matching problems. The space efficiency can be optimized according to the distribution of labels among multiple sets: Uniform and Zipf. Bloom Vector efficiently exploits the Zipf distribution of data for further space reduction. Indeed, both structures are much more space-efficient compared with the state-of-the-art, Bloofi. The results also highlight that a Lookup operation on Bloom Matrix is significantly faster than on Bloom Vector and Bloofi.

Alkuperäiskielienglanti
Artikkeli21
LehtiACM Transactions on Knowledge Discovery from Data
Vuosikerta14
Numero2
Sivumäärä21
ISSN1556-4681
DOI - pysyväislinkit
TilaJulkaistu - maaliskuuta 2020
OKM-julkaisutyyppiA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Siteeraa tätä