Kermit: Linkage map guided long read assembly

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Background: With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Therefore they provide long range information that has the potential to greatly aid in de novo assembly. Previously linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps in assembly but instead large amounts of manual labour is needed to order the contigs into chromosomes.

Results: We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and reducing the amount of misassemblies when compared to de novo assembly.

Conclusions: We present the first method to integrate linkage maps directly into genome assembly. With a modest increase in runtime, our method improves contiguity and correctness of genome assembly.
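The abstract describes the method only as an additional cleaning step that simplifies the assembly graph. As a rough illustration of what such a step could look like, the sketch below drops overlap edges between reads whose linkage-map positions are incompatible. It is not the authors' implementation. The representation of colours as integer map-bin indices, the function name `clean_overlap_graph`, and the `max_gap` parameter are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def clean_overlap_graph(
    edges: List[Edge],
    colours: Dict[str, Set[int]],
    max_gap: int = 1,
) -> List[Edge]:
    """Keep only overlap edges whose two reads have compatible colours.

    Assumption: a read's colours are the indices of the linkage-map bins
    it was assigned to, and two reads are compatible when at least one
    pair of their bins is at most `max_gap` apart.
    """
    kept: List[Edge] = []
    for u, v in edges:
        cu = colours.get(u)
        cv = colours.get(v)
        if not cu or not cv:
            # An uncoloured read carries no map evidence, so keep the edge.
            kept.append((u, v))
            continue
        if any(abs(a - b) <= max_gap for a in cu for b in cv):
            kept.append((u, v))
    return kept


# Hypothetical toy input: four reads, three overlaps.
toy_colours = {"r1": {0}, "r2": {1}, "r3": {7}, "r4": set()}
toy_edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
cleaned = clean_overlap_graph(toy_edges, toy_colours)
```

In this toy input the overlap between `r2` and `r3` is removed because their bins are far apart on the map, while the edge to the uncoloured read `r4` is kept.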
Original language: English
Article number: 8
Journal: Algorithms for Molecular Biology
Volume: 14
Issue number: 1
Number of pages: 10
ISSN: 1748-7188
DOIs: 10.1186/s13015-019-0143-x
Publication status: Published - 20 Mar 2019
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 1182 Biochemistry, cell and molecular biology
  • 113 Computer and information sciences
  • Genome assembly
  • Linkage maps
  • Coloured overlap graph
  • GENOME
  • ALGORITHM

Cite this

@article{8bc9bc6ae01a4fdba927d3553386af8f,
title = "Kermit: Linkage map guided long read assembly",
abstract = "Background: With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Therefore they provide long range information that has the potential to greatly aid in de novo assembly. Previously linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps in assembly but instead large amounts of manual labour is needed to order the contigs into chromosomes. Results: We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and reducing the amount of misassemblies when compared to de novo assembly. Conclusions: We present the first method to integrate linkage maps directly into genome assembly. With a modest increase in runtime, our method improves contiguity and correctness of genome assembly.",
keywords = "1182 Biochemistry, cell and molecular biology, 113 Computer and information sciences, Genome assembly, Linkage maps, Coloured overlap graph, GENOME, ALGORITHM",
author = "Riku Walve and Pasi Rastas and Leena Salmela",
year = "2019",
month = "3",
day = "20",
doi = "10.1186/s13015-019-0143-x",
language = "English",
volume = "14",
journal = "Algorithms for Molecular Biology",
issn = "1748-7188",
publisher = "BMC",
number = "1",

}

Kermit: Linkage map guided long read assembly. / Walve, Riku; Rastas, Pasi; Salmela, Leena.

In: Algorithms for Molecular Biology, Vol. 14, No. 1, 8, 20.03.2019.

TY - JOUR

T1 - Kermit: Linkage map guided long read assembly

AU - Walve, Riku

AU - Rastas, Pasi

AU - Salmela, Leena

PY - 2019/3/20

Y1 - 2019/3/20

AB - Background: With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Therefore they provide long range information that has the potential to greatly aid in de novo assembly. Previously linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps in assembly but instead large amounts of manual labour is needed to order the contigs into chromosomes. Results: We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and reducing the amount of misassemblies when compared to de novo assembly. Conclusions: We present the first method to integrate linkage maps directly into genome assembly. With a modest increase in runtime, our method improves contiguity and correctness of genome assembly.

KW - 1182 Biochemistry, cell and molecular biology

KW - 113 Computer and information sciences

KW - Genome assembly

KW - Linkage maps

KW - Coloured overlap graph

KW - GENOME

KW - ALGORITHM

U2 - 10.1186/s13015-019-0143-x

DO - 10.1186/s13015-019-0143-x

M3 - Article

VL - 14

JO - Algorithms for Molecular Biology

JF - Algorithms for Molecular Biology

SN - 1748-7188

IS - 1

M1 - 8

ER -