Abstract

Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.
Original languageEnglish
Title of host publication2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA
EditorsAli Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer
Number of pages10
PublisherIEEE
Publication date2018
Pages315-324
ISBN (Print)978-1-5386-4884-1
ISBN (Electronic)978-1-5386-4883-4
DOIs
Publication statusPublished - 2018
MoE publication typeA4 Article in conference proceedings
EventData Compression Conference - Snowbird, United States
Duration: 27 Mar 201830 Mar 2018
https://signalprocessingsociety.org/blog/dcc-2019-2019-data-compression-conference

Publication series

NameDCC
ISSN (Electronic)2375-0359

Fields of Science

  • Data structures, Arbitrary constants
  • Large alphabets
  • Optimal time
  • Predecessor problems
  • rank select
  • Run length
  • State of the art
  • succinct, Data compression
  • 113 Computer and information sciences

Cite this

Fuentes-Sepulveda, J., Karkkainen, J., Kosolobov, D., & Puglisi, S. (2018). Run compressed rank/select for large alphabets. In A. Bilgin, M. W. Marcellin, J. Serra-Sagrista, & J. A. Storer (Eds.), 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA (pp. 315-324). (DCC). IEEE. https://doi.org/10.1109/DCC.2018.00040
Fuentes-Sepulveda, J. ; Karkkainen, J. ; Kosolobov, D. ; Puglisi, S. / Run compressed rank/select for large alphabets. 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. editor / Ali Bilgin ; Michael W. Marcellin ; Joan Serra-Sagrista ; James A. Storer. IEEE, 2018. pp. 315-324 (DCC).
@inproceedings{d18091dd946146288bc148ead936f3fe,
title = "Run compressed rank/select for large alphabets",
abstract = "Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46{\%} more space. {\circledC} 2018 IEEE.",
keywords = "Data structures, Arbitrary constants, Large alphabets, Optimal time, Predecessor problems, rank select, Run length, State of the art, succinct, Data compression, 113 Computer and information sciences",
author = "J. Fuentes-Sepulveda and J. Karkkainen and D. Kosolobov and S. Puglisi",
year = "2018",
doi = "10.1109/DCC.2018.00040",
language = "English",
isbn = "978-1-5386-4884-1",
series = "DCC",
publisher = "IEEE",
pages = "315--324",
editor = "Ali Bilgin and Marcellin, {Michael W.} and Joan Serra-Sagrista and Storer, {James A.}",
booktitle = "2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA",
address = "International",

}

Fuentes-Sepulveda, J, Karkkainen, J, Kosolobov, D & Puglisi, S 2018, Run compressed rank/select for large alphabets. in A Bilgin, MW Marcellin, J Serra-Sagrista & JA Storer (eds), 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. DCC, IEEE, pp. 315-324, Data Compression Conference, Snowbird, United States, 27/03/2018. https://doi.org/10.1109/DCC.2018.00040

Run compressed rank/select for large alphabets. / Fuentes-Sepulveda, J.; Karkkainen, J.; Kosolobov, D.; Puglisi, S.

2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. ed. / Ali Bilgin; Michael W. Marcellin; Joan Serra-Sagrista; James A. Storer. IEEE, 2018. p. 315-324 (DCC).

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

TY - GEN

T1 - Run compressed rank/select for large alphabets

AU - Fuentes-Sepulveda, J.

AU - Karkkainen, J.

AU - Kosolobov, D.

AU - Puglisi, S.

PY - 2018

Y1 - 2018

N2 - Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.

AB - Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.

KW - Data structures, Arbitrary constants

KW - Large alphabets

KW - Optimal time

KW - Predecessor problems

KW - rank select

KW - Run length

KW - State of the art

KW - succinct, Data compression

KW - 113 Computer and information sciences

U2 - 10.1109/DCC.2018.00040

DO - 10.1109/DCC.2018.00040

M3 - Conference contribution

SN - 978-1-5386-4884-1

T3 - DCC

SP - 315

EP - 324

BT - 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA

A2 - Bilgin, Ali

A2 - Marcellin, Michael W.

A2 - Serra-Sagrista, Joan

A2 - Storer, James A.

PB - IEEE

ER -

Fuentes-Sepulveda J, Karkkainen J, Kosolobov D, Puglisi S. Run compressed rank/select for large alphabets. In Bilgin A, Marcellin MW, Serra-Sagrista J, Storer JA, editors, 2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA. IEEE. 2018. p. 315-324. (DCC). https://doi.org/10.1109/DCC.2018.00040