Run compressed rank/select for large alphabets

J. Fuentes-Sepulveda, J. Karkkainen, D. Kosolobov, S. Puglisi

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Abstract

Given a string of length n that is composed of r runs of letters from the alphabet 0,1,..,σ-1 such that 2 ≤ σ ≤ r, we describe a data structure that, provided r ≤ n/log ω(1) n, stores the string in rlog nσ/r + o(r log nσ/r) bits and supports select and access queries in O(log log(n/r)/loglogn) time and rank queries in O(log log(nσ/r)/log time. We show that r log n(σ-1)/r-O(log n/r) bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses (1 + ϵ)r log nσ/r + O(r) bits, where ϵ > 0 is an arbitrary constant, with the same query times but without the restriction r ≤ n/log ω(1) n. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case r ≥ 2logδ n, for an arbitrary constant δ > 0. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space. © 2018 IEEE.
Original languageEnglish
Title of host publication2018 Data Compression Conference 27–30 March 2018 Snowbird, Utah, USA
EditorsAli Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer
Number of pages10
PublisherIEEE
Publication date2018
Pages315-324
ISBN (Print)978-1-5386-4884-1
ISBN (Electronic)978-1-5386-4883-4
DOIs
Publication statusPublished - 2018
MoE publication typeA4 Article in conference proceedings
EventData Compression Conference - Snowbird, United States
Duration: 27 Mar 201830 Mar 2018
https://signalprocessingsociety.org/blog/dcc-2019-2019-data-compression-conference

Publication series

NameDCC
ISSN (Electronic)2375-0359

Fields of Science

  • Data structures, Arbitrary constants
  • Large alphabets
  • Optimal time
  • Predecessor problems
  • rank select
  • Run length
  • State of the art
  • succinct, Data compression
  • 113 Computer and information sciences

Cite this