Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time

Research output: Contribution to journal › Article › Scientific › Peer-reviewed

Abstract

A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications, it is important to store the k-mers in small space, while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum such representation. We present an algorithm to compute a minimum representation in optimal (linear) time and use it to evaluate the existing heuristics. Our algorithm first constructs the de Bruijn graph in linear time and then uses an Eulerian-cycle-based algorithm to compute the minimum representation, in time linear in the size of the output.
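To make the terms in the abstract concrete, the following is a minimal Python sketch of what a plain-text representation of a k-mer set without repetitions looks like. It is not the algorithm from the article: it uses a simple greedy extension heuristic, ignores reverse complements, assumes the alphabet ACGT and k ≥ 2, and all function names are hypothetical, chosen here for illustration only.

```python
def kmers(seq, k):
    """Yield every k-mer (length-k substring) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_set(seqs, k):
    """Reduce the input sequences to their set of distinct k-mers."""
    return {km for s in seqs for km in kmers(s, k)}


def is_repetition_free_representation(strings, target, k):
    """True if the strings contain exactly the k-mers in target, each once."""
    seen = [km for s in strings for km in kmers(s, k)]
    return len(seen) == len(set(seen)) and set(seen) == target


def greedy_strings(target, k, alphabet="ACGT"):
    """Greedy heuristic (an assumption for illustration, NOT the article's
    minimum algorithm): start from an unused k-mer and extend it to the
    right, then to the left, through k-mers that are still unused."""
    unused = set(target)
    out = []
    while unused:
        s = min(unused)  # deterministic choice of start k-mer
        unused.remove(s)
        for extend_right in (True, False):
            extended = True
            while extended:
                extended = False
                for c in alphabet:
                    # Candidate k-mer overlapping s by k-1 characters.
                    if extend_right:
                        cand = s[len(s) - k + 1:] + c
                    else:
                        cand = c + s[:k - 1]
                    if cand in unused:
                        unused.remove(cand)
                        s = s + c if extend_right else c + s
                        extended = True
                        break
        out.append(s)
    return out
```

For example, calling `greedy_strings(kmer_set(["ACGTAC", "GTACG"], 3), 3)` yields a list of strings that can be checked with `is_repetition_free_representation`. The total number of characters in that list is the quantity a minimum representation would minimise; a greedy heuristic like this one gives no guarantee of reaching the minimum.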
Original language: English
Article number: 5
Journal: Algorithms for Molecular Biology
Volume: 18
Issue: 1
Number of pages: 21
ISSN: 1748-7188
Status: Published - 4 July 2023
MoE publication type: A1 Journal article-refereed

Fields of science

  • 113 Computer and information sciences
