Subset Wavelet Trees

Jarno N. Alanko, Elena Biagi, Simon J. Puglisi, Jaakko Matias Vuohtoniemi

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

Abstrakti

Given an alphabet Σ of σ = |Σ| symbols, a degenerate (or indeterminate) string X is a sequence X = X[0],X[1]…, X[n-1] of n subsets of Σ. Since their introduction in the mid 70s, degenerate strings have been widely studied, with applications driven by their being a natural model for sequences in which there is a degree of uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structural tool for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank(i,c), which returns the number of subsets up to the i-th subset in the degenerate string that contain the symbol c; and subset-select(i,c), which returns the index in the degenerate string of the i-th subset that contains symbol c. These queries are analogs of rank and select queries that have been widely studied for ordinary strings. Via experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures, and in particular offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT.
Alkuperäiskielienglanti
Otsikko21st International Symposium on Experimental Algorithms (SEA 2023)
ToimittajatLoukas Georgiadis
Sivumäärä14
KustantajaSchloss Dagstuhl - Leibniz-Zentrum für Informatik
Julkaisupäivä19 heinäk. 2023
Sivut4:1-4:14
ISBN (elektroninen)978-3-95977-279-2
DOI - pysyväislinkit
TilaJulkaistu - 19 heinäk. 2023
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaInternational Symposium on Experimental Algorithms - Barcelona, Espanja
Kesto: 24 heinäk. 202326 heinäk. 2023
Konferenssinumero: 21
https://esdeveniments.upc.edu/92268/section/40810/sea-2023.html

Julkaisusarja

NimiLeibniz International Proceedings in Informatics
Vuosikerta265
ISSN (elektroninen)1868-8969

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Siteeraa tätä