Relations Between Greedy and Bit-Optimal LZ77 Encodings

Dmitry Kosolobov

Research output: Chapter in book/report/conference proceeding › Conference contribution › Scientific › Peer-reviewed

Abstract

This paper investigates the size in bits of the LZ77 encoding, which is the most popular and efficient variant of the Lempel--Ziv encodings used in data compression. We prove that, for a wide natural class of variable-length encoders for LZ77 phrases, the size of the greedily constructed LZ77 encoding on constant alphabets is within a factor $O(\frac{\log n}{\log\log\log n})$ of the optimal LZ77 encoding, where $n$ is the length of the processed string. We describe a series of examples showing that, surprisingly, this bound is tight, thus improving both the previously known upper and lower bounds. Further, we obtain a more detailed bound $O(\min\{z, \frac{\log n}{\log\log z}\})$, which uses the number $z$ of phrases in the greedy LZ77 encoding as a parameter, and construct a series of examples showing that this bound is tight even for a binary alphabet. We then investigate the problem on non-constant alphabets: we show that the known $O(\log n)$ bound is tight even for alphabets of logarithmic size, and provide tight bounds for some other important cases.
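To make the quantities in the abstract concrete, the following is a minimal sketch of a greedy LZ77 parse and of its size in bits under one possible variable-length phrase encoder. The function names, the one-bit literal/copy flag, the 8-bit literals, and the use of Elias gamma codes for distances and lengths are all assumptions chosen for illustration; they are not taken from the paper, whose results cover a broader class of encoders.

```python
def greedy_lz77(s):
    """Greedy LZ77 parse (illustrative, quadratic time).

    Each phrase is the longest prefix of the unparsed suffix that also
    starts at an earlier position (the copy may overlap the phrase itself),
    or a single literal character if no such prefix exists.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_dist = 0, 0
        # Scan sources from nearest to farthest, so that among the longest
        # matches the one with the smallest distance is kept.
        for j in range(i - 1, -1, -1):
            l = 0
            while i + l < n and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_len, best_dist = l, i - j
        if best_len == 0:
            phrases.append(("lit", s[i]))
            i += 1
        else:
            phrases.append(("copy", best_dist, best_len))
            i += best_len
    return phrases


def gamma_bits(x):
    """Length in bits of the Elias gamma code of an integer x >= 1."""
    return 2 * x.bit_length() - 1


def encoded_bits(phrases, literal_bits=8):
    """Size in bits under the assumed encoder: 1 flag bit per phrase,
    fixed-width literals, gamma-coded distance and length for copies."""
    total = 0
    for p in phrases:
        if p[0] == "lit":
            total += 1 + literal_bits
        else:
            total += 1 + gamma_bits(p[1]) + gamma_bits(p[2])
    return total


def decode(phrases):
    """Inverse of greedy_lz77; copies character by character so that
    overlapping sources are handled correctly."""
    out = []
    for p in phrases:
        if p[0] == "lit":
            out.append(p[1])
        else:
            _, dist, length = p
            for _ in range(length):
                out.append(out[-dist])
    return "".join(out)
```

Here the number of phrases returned by `greedy_lz77` plays the role of $z$, and `encoded_bits` is the quantity whose ratio to the bit-optimal parse the stated bounds control. A bit-optimal parse may pick shorter phrases than the greedy one when that makes the distances cheaper to encode, which this sketch does not attempt.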
Original language: English
Title of host publication: Relations Between Greedy and Bit-Optimal LZ77 Encodings
Editors: Rolf Niedermeier, Brigitte Vallée
Number of pages: 14
Place of publication: Dagstuhl
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 2018
Pages: 46:1-46:14
Article number: 46
ISBN (print): 978-3-95977-062-0
DOI
Status: Published - 2018
MoE publication type: A4 Article in conference proceedings
Event: Symposium on Theoretical Aspects of Computer Science - Caen, France
Duration: 28 Feb 2018 to 3 Mar 2018
Conference number: 35

Publication series

Name: Leibniz International Proceedings in Informatics (LIPIcs)
Publisher: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik
Volume: 96
ISSN (electronic): 1868-8969

Fields of science

  • 113 Computer and information sciences
