Abstrakti
Given an alphabet Σ of σ = |Σ| symbols, a degenerate (or indeterminate) string X is a sequence X = X[0],X[1]…, X[n-1] of n subsets of Σ. Since their introduction in the mid 70s, degenerate strings have been widely studied, with applications driven by their being a natural model for sequences in which there is a degree of uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structural tool for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank(i,c), which returns the number of subsets up to the i-th subset in the degenerate string that contain the symbol c; and subset-select(i,c), which returns the index in the degenerate string of the i-th subset that contains symbol c. These queries are analogs of rank and select queries that have been widely studied for ordinary strings. Via experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures, and in particular offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT.
Alkuperäiskieli | englanti |
---|---|
Otsikko | 21st International Symposium on Experimental Algorithms (SEA 2023) |
Toimittajat | Loukas Georgiadis |
Sivumäärä | 14 |
Kustantaja | Schloss Dagstuhl - Leibniz-Zentrum für Informatik |
Julkaisupäivä | 19 heinäk. 2023 |
Sivut | 4:1-4:14 |
ISBN (elektroninen) | 978-3-95977-279-2 |
DOI - pysyväislinkit | |
Tila | Julkaistu - 19 heinäk. 2023 |
OKM-julkaisutyyppi | A4 Artikkeli konferenssijulkaisuussa |
Tapahtuma | International Symposium on Experimental Algorithms - Barcelona, Espanja Kesto: 24 heinäk. 2023 → 26 heinäk. 2023 Konferenssinumero: 21 https://esdeveniments.upc.edu/92268/section/40810/sea-2023.html |
Julkaisusarja
Nimi | Leibniz International Proceedings in Informatics |
---|---|
Vuosikerta | 265 |
ISSN (elektroninen) | 1868-8969 |
Tieteenalat
- 113 Tietojenkäsittely- ja informaatiotieteet