Abstrakti
The κ-spectrum of a string is the set of all distinct substrings of length κ occurring in the string. This is a lossy but computationally convenient representation of the information in the string, with many applications in high-throughput bioinformatics. In this work, we define the notion of the Spectral Burrows-Wheeler Transform (SBWT), which is a particular sequence of subsets of the alphabet of the string that encodes κ-spectrum of the string. We explore different approaches to index the SBWT for membership queries on the underlying κ-spectrum. We identify subset rank queries as the essential subproblem, and propose three space-efficient index structures to solve it. We show, via experiments on a range of genomic data sets, that the simplicity of our new indexes translates into large performance gains in practice over prior art.
Alkuperäiskieli | englanti |
---|---|
Otsikko | SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23) |
Toimittajat | J Berry, D Shmoys, L Cowen, U Naumann |
Sivumäärä | 12 |
Kustantaja | Society for Industrial and Applied Mathematics |
Julkaisupäivä | 2023 |
Sivut | 225-236 |
ISBN (elektroninen) | 978-1-61197-771-4 |
DOI - pysyväislinkit | |
Tila | Julkaistu - 2023 |
OKM-julkaisutyyppi | A4 Artikkeli konferenssijulkaisuussa |
Tapahtuma | SIAM Conference on Applied and Computational Discrete Algorithms - The Sheraton Grand Seattle, Seattle, Washington, Yhdysvallat (USA) Kesto: 31 toukok. 2023 → 2 kesäk. 2023 Konferenssinumero: 2 https://www.siam.org/conferences/cm/conference/acda23 |
Tieteenalat
- 113 Tietojenkäsittely- ja informaatiotieteet