## Sammanfattning

The Lempel-Ziv parsing (LZ77) is a widely popular construction lying at the heart of many compression algorithms. These algorithms usually treat the data as a sequence of bytes, i.e., blocks of fixed length 8. Another common option is to view the data as a sequence of bits. We investigate the following natural question: what is the relationship between the LZ77 parsings of the same data interpreted as a sequence of fixed-length blocks and as a sequence of bits (or other “elementary” letters)? In this paper, we prove that, for any integer b > 1, the number z of phrases in the LZ77 parsing of a string of length n and the number z_{b} of phrases in the LZ77 parsing of the same string in which blocks of length b are interpreted as separate letters (e.g., b = 8 in case of bytes) are related as z_{b} = O(bz log^{n}z ). The bound holds for both “overlapping” and “non-overlapping” versions of LZ77. Further, we establish a tight bound z_{b} = O(bz) for the special case when each phrase in the LZ77 parsing of the string has a “phrase-aligned” earlier occurrence (an occurrence equal to the concatenation of consecutive phrases). The latter is an important particular case of parsing produced, for instance, by grammar-based compression methods.

Originalspråk | engelska |
---|---|

Artikelnummer | 359 |

Tidskrift | Algorithms |

Volym | 14 |

Nummer | 12 |

Antal sidor | 11 |

ISSN | 1999-4893 |

DOI | |

Status | Publicerad - dec. 2021 |

MoE-publikationstyp | A1 Tidskriftsartikel-refererad |

## Vetenskapsgrenar

- 113 Data- och informationsvetenskap