Safely Filling Gaps with Partial Solutions Common to All Solutions

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

Kuvaus

Gap filling has emerged as a natural sub-problem of many de novo genome assembly projects. The gap filling problem generally asks for an s-t path in an assembly graph whose length matches the gap length estimate. Several methods have addressed it, but only few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. Our main contribution is a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions. We call these partial solutions safe, following the framework of (Tomescu and Medvedev, RECOMB 2016). We give an efficient safe algorithm running in O(dm) time and space, where d is the gap length estimate and m is the number of edges of the assembly graph. To show the benefits of this method, we implemented this algorithm for the problem of filling gaps in scaffolds. Our experimental results on bacterial and on conservative human assemblies show that, on average, our method can retrieve over 73 percent more safe and correct bases as compared to previous methods, with a similar precision.

Alkuperäiskielienglanti
LehtiIEEE/ACM Transactions on Computational Biology and Bioinformatics
Vuosikerta16
Numero2
Sivut617-626
Sivumäärä10
ISSN1545-5963
DOI - pysyväislinkit
TilaJulkaistu - 2019
OKM-julkaisutyyppiA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Lainaa tätä

@article{9101e50d209e46b4a1a1620dd4bb7865,
title = "Safely Filling Gaps with Partial Solutions Common to All Solutions",
abstract = "Gap filling has emerged as a natural sub-problem of many de novo genome assembly projects. The gap filling problem generally asks for an s-t path in an assembly graph whose length matches the gap length estimate. Several methods have addressed it, but only few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. Our main contribution is a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions. We call these partial solutions safe, following the framework of (Tomescu and Medvedev, RECOMB 2016). We give an efficient safe algorithm running in O(dm) time and space, where d is the gap length estimate and m is the number of edges of the assembly graph. To show the benefits of this method, we implemented this algorithm for the problem of filling gaps in scaffolds. Our experimental results on bacterial and on conservative human assemblies show that, on average, our method can retrieve over 73 percent more safe and correct bases as compared to previous methods, with a similar precision.",
keywords = "113 Computer and information sciences",
author = "Leena Salmela and Tomescu, {Alexandru I.}",
year = "2019",
doi = "10.1109/TCBB.2017.2785831",
language = "English",
volume = "16",
pages = "617--626",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "IEEE",
number = "2",

}

Safely Filling Gaps with Partial Solutions Common to All Solutions. / Salmela, Leena; Tomescu, Alexandru I.

julkaisussa: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vuosikerta 16, Nro 2, 2019, s. 617-626.

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

TY - JOUR

T1 - Safely Filling Gaps with Partial Solutions Common to All Solutions

AU - Salmela, Leena

AU - Tomescu, Alexandru I.

PY - 2019

Y1 - 2019

N2 - Gap filling has emerged as a natural sub-problem of many de novo genome assembly projects. The gap filling problem generally asks for an s-t path in an assembly graph whose length matches the gap length estimate. Several methods have addressed it, but only few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. Our main contribution is a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions. We call these partial solutions safe, following the framework of (Tomescu and Medvedev, RECOMB 2016). We give an efficient safe algorithm running in O(dm) time and space, where d is the gap length estimate and m is the number of edges of the assembly graph. To show the benefits of this method, we implemented this algorithm for the problem of filling gaps in scaffolds. Our experimental results on bacterial and on conservative human assemblies show that, on average, our method can retrieve over 73 percent more safe and correct bases as compared to previous methods, with a similar precision.

AB - Gap filling has emerged as a natural sub-problem of many de novo genome assembly projects. The gap filling problem generally asks for an s-t path in an assembly graph whose length matches the gap length estimate. Several methods have addressed it, but only few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. Our main contribution is a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions. We call these partial solutions safe, following the framework of (Tomescu and Medvedev, RECOMB 2016). We give an efficient safe algorithm running in O(dm) time and space, where d is the gap length estimate and m is the number of edges of the assembly graph. To show the benefits of this method, we implemented this algorithm for the problem of filling gaps in scaffolds. Our experimental results on bacterial and on conservative human assemblies show that, on average, our method can retrieve over 73 percent more safe and correct bases as compared to previous methods, with a similar precision.

KW - 113 Computer and information sciences

U2 - 10.1109/TCBB.2017.2785831

DO - 10.1109/TCBB.2017.2785831

M3 - Article

VL - 16

SP - 617

EP - 626

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 2

ER -