Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary

Jean P. Elbers, Mark F. Rogers, Polina L. Perelman, Anastasia A. Proskuryakova, Natalia A. Serdyukova, Warren E. Johnson, Petr Horin, Jukka Corander, David Murphy, Pamela A. Burger

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi-C and Dovetail Genomics Chicago libraries, and long-read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the longest scaffold over 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
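The abstract summarises the improvement with two contiguity figures, longest scaffold and scaffold N50. As a minimal sketch, assuming the common definition of N50 (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly), the statistic and the reported fold changes could be computed as follows. The function names `n50` and `fold_change` and the toy lengths are illustrative assumptions, not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


def fold_change(before, after):
    """Ratio of a statistic after scaffolding to its value before."""
    return after / before


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)

# Fold changes recomputed from the values quoted in the abstract (Mbp).
longest_fold = fold_change(9.71, 124.99)
n50_fold = fold_change(1.48, 75.02)
```

Applied to the figures quoted above, the ratios come out just under 13 for the longest scaffold and just under 51 for the scaffold N50, consistent with the "over 12-fold" and "over 50-fold" statements.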
Original language: English
Journal: Molecular Ecology Resources
Volume: 19
Issue number: 4
Pages (from-to): 1015-1026
Number of pages: 12
ISSN: 1755-098X
DOIs: https://doi.org/10.1111/1755-0998.13020
Publication status: Published - Jul 2019
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 1181 Ecology, evolutionary biology
  • 1184 Genetics, developmental biology, physiology
  • chromosome mapping
  • chromosome conformation capture
  • dromedary
  • genome assembly
  • scaffolding
  • genome annotation
  • PLANT
  • CATTLE
  • ABYSS
  • PRODUCTION TRAITS
  • ANNOTATION
  • SEQUENCE
  • ARCHITECTURE
  • GENOME-WIDE ASSOCIATION
  • RESOURCE
  • MAKER

Cite this

Elbers, J. P., Rogers, M. F., Perelman, P. L., Proskuryakova, A. A., Serdyukova, N. A., Johnson, W. E., ... Burger, P. A. (2019). Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary. Molecular Ecology Resources, 19(4), 1015-1026. https://doi.org/10.1111/1755-0998.13020