Greedy Shortest Common Superstring Approximation in Compact Space

Forskningsoutput: Kapitel i bok/rapport/konferenshandlingKonferensbidragVetenskapligPeer review

Sammanfattning

Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.
Originalspråkengelska
Titel på gästpublikationString Processing and Information Retrieval : 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings
RedaktörerGabriele Fici, Marinella Sciortino, Rosssano Venturini
Antal sidor13
UtgivningsortCham
FörlagSpringer International Publishing AG
Utgivningsdatum6 sep 2017
Sidor1-13
ISBN (tryckt)978-3-319-67427-8
ISBN (elektroniskt)978-3-319-67428-5
DOI
StatusPublicerad - 6 sep 2017
MoE-publikationstypA4 Artikel i en konferenspublikation
EvenemangInternational Symposium on String Processing and Information Retrieval - Palermo, Italien
Varaktighet: 26 sep 201729 sep 2017
Konferensnummer: 24

Publikationsserier

NamnLecture Notes in Computer Science
FörlagSpringer International Publishing AG
Volym10508
ISSN (tryckt)0302-9743
ISSN (elektroniskt)1161-3349

Vetenskapsgrenar

  • 113 Data- och informationsvetenskap

Citera det här

Alanko, J., & Norri, T. (2017). Greedy Shortest Common Superstring Approximation in Compact Space. I G. Fici, M. Sciortino, & R. Venturini (Red.), String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings (s. 1-13). (Lecture Notes in Computer Science; Vol. 10508). Cham: Springer International Publishing AG. https://doi.org/10.1007/978-3-319-67428-5_1
Alanko, Jarno ; Norri, Tuukka. / Greedy Shortest Common Superstring Approximation in Compact Space. String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. redaktör / Gabriele Fici ; Marinella Sciortino ; Rosssano Venturini. Cham : Springer International Publishing AG, 2017. s. 1-13 (Lecture Notes in Computer Science).
@inproceedings{6590e12b29b546c4ac9be25d1ff5e8da,
title = "Greedy Shortest Common Superstring Approximation in Compact Space",
abstract = "Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.",
keywords = "113 Computer and information sciences, BWT, shortest common superstring, greedy, approximation, compact",
author = "Jarno Alanko and Tuukka Norri",
year = "2017",
month = "9",
day = "6",
doi = "10.1007/978-3-319-67428-5_1",
language = "English",
isbn = "978-3-319-67427-8",
series = "Lecture Notes in Computer Science",
publisher = "Springer International Publishing AG",
pages = "1--13",
editor = "Gabriele Fici and Marinella Sciortino and Rosssano Venturini",
booktitle = "String Processing and Information Retrieval",
address = "Switzerland",

}

Alanko, J & Norri, T 2017, Greedy Shortest Common Superstring Approximation in Compact Space. i G Fici, M Sciortino & R Venturini (red), String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10508, Springer International Publishing AG, Cham, s. 1-13, International Symposium on String Processing and Information Retrieval, Palermo, Italien, 26/09/2017. https://doi.org/10.1007/978-3-319-67428-5_1

Greedy Shortest Common Superstring Approximation in Compact Space. / Alanko, Jarno; Norri, Tuukka.

String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. red. / Gabriele Fici; Marinella Sciortino; Rosssano Venturini. Cham : Springer International Publishing AG, 2017. s. 1-13 (Lecture Notes in Computer Science; Vol. 10508).

Forskningsoutput: Kapitel i bok/rapport/konferenshandlingKonferensbidragVetenskapligPeer review

TY - GEN

T1 - Greedy Shortest Common Superstring Approximation in Compact Space

AU - Alanko, Jarno

AU - Norri, Tuukka

PY - 2017/9/6

Y1 - 2017/9/6

N2 - Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.

AB - Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.

KW - 113 Computer and information sciences

KW - BWT

KW - shortest common superstring

KW - greedy

KW - approximation

KW - compact

U2 - 10.1007/978-3-319-67428-5_1

DO - 10.1007/978-3-319-67428-5_1

M3 - Conference contribution

SN - 978-3-319-67427-8

T3 - Lecture Notes in Computer Science

SP - 1

EP - 13

BT - String Processing and Information Retrieval

A2 - Fici, Gabriele

A2 - Sciortino, Marinella

A2 - Venturini, Rosssano

PB - Springer International Publishing AG

CY - Cham

ER -

Alanko J, Norri T. Greedy Shortest Common Superstring Approximation in Compact Space. I Fici G, Sciortino M, Venturini R, redaktörer, String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. Cham: Springer International Publishing AG. 2017. s. 1-13. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-319-67428-5_1