Superstrings with multiplicities

B. Cazaux, E. Rivals

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

Kuvaus

A superstring of a set of words P = s1, · · · , sp is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word si comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of si. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.
Alkuperäiskielienglanti
OtsikkoAnnual Symposium on Combinatorial Pattern Matching (CPM 2018)
ToimittajatGonzalo Navarro, David Sankoff, Binhai Zhu
Sivumäärä16
JulkaisupaikkaWadern
KustantajaSchloss Dagstuhl Leibniz Center for Informatics
Julkaisupäivä2018
Artikkeli no21
ISBN (elektroninen)978-3-95977-074-3
DOI - pysyväislinkit
TilaJulkaistu - 2018
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaAnnual Symposium on Combinatorial Pattern Matching - Qingdao, Kiina
Kesto: 2 heinäkuuta 20184 heinäkuuta 2018
Konferenssinumero: 29
http://cpm2018.sdu.edu.cn/

Julkaisusarja

NimiLeibniz International Proceedings in Informatics (LIPIcs)
KustantajaSchloss Dagstuhl--Leibniz-Zentrum fuer Informatik
Vuosikerta105
ISSN (elektroninen)1868-8969

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Lainaa tätä

Cazaux, B., & Rivals, E. (2018). Superstrings with multiplicities. teoksessa G. Navarro, D. Sankoff, & B. Zhu (Toimittajat), Annual Symposium on Combinatorial Pattern Matching (CPM 2018) [21] (Leibniz International Proceedings in Informatics (LIPIcs) ; Vuosikerta 105). Wadern: Schloss Dagstuhl Leibniz Center for Informatics. https://doi.org/10.4230/LIPIcs.CPM.2018.21
Cazaux, B. ; Rivals, E. / Superstrings with multiplicities. Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Toimittaja / Gonzalo Navarro ; David Sankoff ; Binhai Zhu. Wadern : Schloss Dagstuhl Leibniz Center for Informatics, 2018. (Leibniz International Proceedings in Informatics (LIPIcs) ).
@inproceedings{c5f68b4920744ee68b02ae4b1bf2210a,
title = "Superstrings with multiplicities",
abstract = "A superstring of a set of words P = s1, · · · , sp is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word si comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of si. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. {\circledC} 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.",
keywords = "Approximation algorithms, Pattern matching, Approximation, Cyclic cover, Greedy algorithms, Overlap, Subset system, String theory, 113 Computer and information sciences",
author = "B. Cazaux and E. Rivals",
year = "2018",
doi = "10.4230/LIPIcs.CPM.2018.21",
language = "English",
series = "Leibniz International Proceedings in Informatics (LIPIcs)",
publisher = "Schloss Dagstuhl Leibniz Center for Informatics",
editor = "Navarro, {Gonzalo } and Sankoff, {David } and Binhai Zhu",
booktitle = "Annual Symposium on Combinatorial Pattern Matching (CPM 2018)",
address = "Germany",

}

Cazaux, B & Rivals, E 2018, Superstrings with multiplicities. julkaisussa G Navarro, D Sankoff & B Zhu (toim), Annual Symposium on Combinatorial Pattern Matching (CPM 2018)., 21, Leibniz International Proceedings in Informatics (LIPIcs) , Vuosikerta 105, Schloss Dagstuhl Leibniz Center for Informatics, Wadern, Annual Symposium on Combinatorial Pattern Matching, Qingdao, Kiina, 02/07/2018. https://doi.org/10.4230/LIPIcs.CPM.2018.21

Superstrings with multiplicities. / Cazaux, B.; Rivals, E.

Annual Symposium on Combinatorial Pattern Matching (CPM 2018). toim. / Gonzalo Navarro; David Sankoff; Binhai Zhu. Wadern : Schloss Dagstuhl Leibniz Center for Informatics, 2018. 21 (Leibniz International Proceedings in Informatics (LIPIcs) ; Vuosikerta 105).

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

TY - GEN

T1 - Superstrings with multiplicities

AU - Cazaux, B.

AU - Rivals, E.

PY - 2018

Y1 - 2018

N2 - A superstring of a set of words P = s1, · · · , sp is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word si comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of si. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.

AB - A superstring of a set of words P = s1, · · · , sp is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word si comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of si. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.

KW - Approximation algorithms

KW - Pattern matching, Approximation

KW - Cyclic cover

KW - Greedy algorithms

KW - Overlap

KW - Subset system, String theory

KW - 113 Computer and information sciences

U2 - 10.4230/LIPIcs.CPM.2018.21

DO - 10.4230/LIPIcs.CPM.2018.21

M3 - Conference contribution

T3 - Leibniz International Proceedings in Informatics (LIPIcs)

BT - Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

A2 - Navarro, Gonzalo

A2 - Sankoff, David

A2 - Zhu, Binhai

PB - Schloss Dagstuhl Leibniz Center for Informatics

CY - Wadern

ER -

Cazaux B, Rivals E. Superstrings with multiplicities. julkaisussa Navarro G, Sankoff D, Zhu B, toimittajat, Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Wadern: Schloss Dagstuhl Leibniz Center for Informatics. 2018. 21. (Leibniz International Proceedings in Informatics (LIPIcs) ). https://doi.org/10.4230/LIPIcs.CPM.2018.21