Greedy Shortest Common Superstring Approximation in Compact Space

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Abstract

Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.
Original languageEnglish
Title of host publicationString Processing and Information Retrieval : 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings
EditorsGabriele Fici, Marinella Sciortino, Rosssano Venturini
Number of pages13
Place of PublicationCham
PublisherSpringer International Publishing AG
Publication date6 Sep 2017
Pages1-13
ISBN (Print)978-3-319-67427-8
ISBN (Electronic)978-3-319-67428-5
DOIs
Publication statusPublished - 6 Sep 2017
MoE publication typeA4 Article in conference proceedings
EventInternational Symposium on String Processing and Information Retrieval - Palermo, Italy
Duration: 26 Sep 201729 Sep 2017
Conference number: 24

Publication series

NameLecture Notes in Computer Science
PublisherSpringer International Publishing AG
Volume10508
ISSN (Print)0302-9743
ISSN (Electronic)1161-3349

Fields of Science

  • 113 Computer and information sciences
  • BWT
  • shortest common superstring
  • greedy
  • approximation
  • compact

Cite this