Greedy Shortest Common Superstring Approximation in Compact Space

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Abstract

Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.
Original languageEnglish
Title of host publicationString Processing and Information Retrieval : 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings
EditorsGabriele Fici, Marinella Sciortino, Rosssano Venturini
Number of pages13
Place of PublicationCham
Publisher Springer International Publishing AG
Publication date6 Sep 2017
Pages1-13
ISBN (Print)978-3-319-67427-8
ISBN (Electronic)978-3-319-67428-5
DOIs
Publication statusPublished - 6 Sep 2017
MoE publication typeA4 Article in conference proceedings
EventInternational Symposium on String Processing and Information Retrieval - Palermo, Italy
Duration: 26 Sep 201729 Sep 2017
Conference number: 24

Publication series

NameLecture Notes in Computer Science
PublisherSpringer International Publishing AG
Volume10508
ISSN (Print)0302-9743
ISSN (Electronic)1161-3349

Fields of Science

  • 113 Computer and information sciences
  • BWT
  • shortest common superstring
  • greedy
  • approximation
  • compact

Cite this

Alanko, J., & Norri, T. (2017). Greedy Shortest Common Superstring Approximation in Compact Space. In G. Fici, M. Sciortino, & R. Venturini (Eds.), String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings (pp. 1-13). (Lecture Notes in Computer Science; Vol. 10508). Cham: Springer International Publishing AG. https://doi.org/10.1007/978-3-319-67428-5_1
Alanko, Jarno ; Norri, Tuukka. / Greedy Shortest Common Superstring Approximation in Compact Space. String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. editor / Gabriele Fici ; Marinella Sciortino ; Rosssano Venturini. Cham : Springer International Publishing AG, 2017. pp. 1-13 (Lecture Notes in Computer Science).
@inproceedings{6590e12b29b546c4ac9be25d1ff5e8da,
title = "Greedy Shortest Common Superstring Approximation in Compact Space",
abstract = "Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.",
keywords = "113 Computer and information sciences, BWT, shortest common superstring, greedy, approximation, compact",
author = "Jarno Alanko and Tuukka Norri",
year = "2017",
month = "9",
day = "6",
doi = "10.1007/978-3-319-67428-5_1",
language = "English",
isbn = "978-3-319-67427-8",
series = "Lecture Notes in Computer Science",
publisher = "Springer International Publishing AG",
pages = "1--13",
editor = "Gabriele Fici and Marinella Sciortino and Rosssano Venturini",
booktitle = "String Processing and Information Retrieval",
address = "Switzerland",

}

Alanko, J & Norri, T 2017, Greedy Shortest Common Superstring Approximation in Compact Space. in G Fici, M Sciortino & R Venturini (eds), String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10508, Springer International Publishing AG, Cham, pp. 1-13, International Symposium on String Processing and Information Retrieval, Palermo, Italy, 26/09/2017. https://doi.org/10.1007/978-3-319-67428-5_1

Greedy Shortest Common Superstring Approximation in Compact Space. / Alanko, Jarno; Norri, Tuukka.

String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. ed. / Gabriele Fici; Marinella Sciortino; Rosssano Venturini. Cham : Springer International Publishing AG, 2017. p. 1-13 (Lecture Notes in Computer Science; Vol. 10508).

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

TY - GEN

T1 - Greedy Shortest Common Superstring Approximation in Compact Space

AU - Alanko, Jarno

AU - Norri, Tuukka

PY - 2017/9/6

Y1 - 2017/9/6

N2 - Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.

AB - Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.

KW - 113 Computer and information sciences

KW - BWT

KW - shortest common superstring

KW - greedy

KW - approximation

KW - compact

U2 - 10.1007/978-3-319-67428-5_1

DO - 10.1007/978-3-319-67428-5_1

M3 - Conference contribution

SN - 978-3-319-67427-8

T3 - Lecture Notes in Computer Science

SP - 1

EP - 13

BT - String Processing and Information Retrieval

A2 - Fici, Gabriele

A2 - Sciortino, Marinella

A2 - Venturini, Rosssano

PB - Springer International Publishing AG

CY - Cham

ER -

Alanko J, Norri T. Greedy Shortest Common Superstring Approximation in Compact Space. In Fici G, Sciortino M, Venturini R, editors, String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings. Cham: Springer International Publishing AG. 2017. p. 1-13. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-319-67428-5_1