Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform

Jarno N Alankot, Simon John Puglisit, Jaakko Matias Vuohtoniemi

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

Abstrakti

The κ-spectrum of a string is the set of all distinct substrings of length κ occurring in the string. This is a lossy but computationally convenient representation of the information in the string, with many applications in high-throughput bioinformatics. In this work, we define the notion of the Spectral Burrows-Wheeler Transform (SBWT), which is a particular sequence of subsets of the alphabet of the string that encodes κ-spectrum of the string. We explore different approaches to index the SBWT for membership queries on the underlying κ-spectrum. We identify subset rank queries as the essential subproblem, and propose three space-efficient index structures to solve it. We show, via experiments on a range of genomic data sets, that the simplicity of our new indexes translates into large performance gains in practice over prior art.
Alkuperäiskielienglanti
OtsikkoSIAM Conference on Applied and Computational Discrete Algorithms (ACDA23)
ToimittajatJ Berry, D Shmoys, L Cowen, U Naumann
Sivumäärä12
KustantajaSociety for Industrial and Applied Mathematics
Julkaisupäivä2023
Sivut225-236
ISBN (elektroninen)978-1-61197-771-4
DOI - pysyväislinkit
TilaJulkaistu - 2023
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaSIAM Conference on Applied and Computational Discrete Algorithms - The Sheraton Grand Seattle, Seattle, Washington, Yhdysvallat (USA)
Kesto: 31 toukok. 20232 kesäk. 2023
Konferenssinumero: 2
https://www.siam.org/conferences/cm/conference/acda23

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Siteeraa tätä