Run compressed rank/select for large alphabets

J. Fuentes-Sepulveda, J. Karkkainen, D. Kosolobov, S. Puglisi

Forskningsoutput: Kapitel i bok/rapport/konferenshandlingKonferensbidragVetenskapligPeer review

Sammanfattning

Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.
Originalspråkengelska
Titel på gästpublikation2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA
RedaktörerAli Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer
Antal sidor10
FörlagIEEE
Utgivningsdatum2018
Sidor315-324
ISBN (tryckt)978-1-5386-4884-1
ISBN (elektroniskt)978-1-5386-4883-4
DOI
StatusPublicerad - 2018
MoE-publikationstypA4 Artikel i en konferenspublikation
EvenemangData Compression Conference - Snowbird, Förenta Staterna (USA)
Varaktighet: 27 mar 201830 mar 2018
https://signalprocessingsociety.org/blog/dcc-2019-2019-data-compression-conference

Publikationsserier

NamnDCC
ISSN (elektroniskt)2375-0359

Vetenskapsgrenar

  • 113 Data- och informationsvetenskap

Citera det här

Fuentes-Sepulveda, J., Karkkainen, J., Kosolobov, D., & Puglisi, S. (2018). Run compressed rank/select for large alphabets. I A. Bilgin, M. W. Marcellin, J. Serra-Sagrista, & J. A. Storer (Red.), 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA (s. 315-324). (DCC). IEEE. https://doi.org/10.1109/DCC.2018.00040
Fuentes-Sepulveda, J. ; Karkkainen, J. ; Kosolobov, D. ; Puglisi, S. / Run compressed rank/select for large alphabets. 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. redaktör / Ali Bilgin ; Michael W. Marcellin ; Joan Serra-Sagrista ; James A. Storer. IEEE, 2018. s. 315-324 (DCC).
@inproceedings{d18091dd946146288bc148ead936f3fe,
title = "Run compressed rank/select for large alphabets",
abstract = "Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46{\%} more space. {\circledC} 2018 IEEE.",
keywords = "Data structures, Arbitrary constants, Large alphabets, Optimal time, Predecessor problems, rank select, Run length, State of the art, succinct, Data compression, 113 Computer and information sciences",
author = "J. Fuentes-Sepulveda and J. Karkkainen and D. Kosolobov and S. Puglisi",
year = "2018",
doi = "10.1109/DCC.2018.00040",
language = "English",
isbn = "978-1-5386-4884-1",
series = "DCC",
publisher = "IEEE",
pages = "315--324",
editor = "Ali Bilgin and Marcellin, {Michael W.} and Joan Serra-Sagrista and Storer, {James A.}",
booktitle = "2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA",
address = "International",

}

Fuentes-Sepulveda, J, Karkkainen, J, Kosolobov, D & Puglisi, S 2018, Run compressed rank/select for large alphabets. i A Bilgin, MW Marcellin, J Serra-Sagrista & JA Storer (red), 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. DCC, IEEE, s. 315-324, Data Compression Conference, Snowbird, Förenta Staterna (USA), 27/03/2018. https://doi.org/10.1109/DCC.2018.00040

Run compressed rank/select for large alphabets. / Fuentes-Sepulveda, J.; Karkkainen, J.; Kosolobov, D.; Puglisi, S.

2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. red. / Ali Bilgin; Michael W. Marcellin; Joan Serra-Sagrista; James A. Storer. IEEE, 2018. s. 315-324 (DCC).

Forskningsoutput: Kapitel i bok/rapport/konferenshandlingKonferensbidragVetenskapligPeer review

TY - GEN

T1 - Run compressed rank/select for large alphabets

AU - Fuentes-Sepulveda, J.

AU - Karkkainen, J.

AU - Kosolobov, D.

AU - Puglisi, S.

PY - 2018

Y1 - 2018

N2 - Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.

AB - Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.

KW - Data structures, Arbitrary constants

KW - Large alphabets

KW - Optimal time

KW - Predecessor problems

KW - rank select

KW - Run length

KW - State of the art

KW - succinct, Data compression

KW - 113 Computer and information sciences

U2 - 10.1109/DCC.2018.00040

DO - 10.1109/DCC.2018.00040

M3 - Conference contribution

SN - 978-1-5386-4884-1

T3 - DCC

SP - 315

EP - 324

BT - 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA

A2 - Bilgin, Ali

A2 - Marcellin, Michael W.

A2 - Serra-Sagrista, Joan

A2 - Storer, James A.

PB - IEEE

ER -

Fuentes-Sepulveda J, Karkkainen J, Kosolobov D, Puglisi S. Run compressed rank/select for large alphabets. I Bilgin A, Marcellin MW, Serra-Sagrista J, Storer JA, redaktörer, 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. IEEE. 2018. s. 315-324. (DCC). https://doi.org/10.1109/DCC.2018.00040