High-throughput sequencing data and the impact of plant gene annotation quality

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Draft genomes of different species and re-sequencing of accessions and populations are now common tools in plant biology research. De novo assembled draft genomes make it possible to identify pivotal divergence points in the plant lineage and provide an opportunity to investigate the genomic basis and timing of biological innovations by inferring orthologs between species. Furthermore, re-sequencing facilitates the mapping and subsequent molecular characterization of causative loci for traits including plant stress tolerance or development. In both cases, high-quality gene annotation (the identification of protein-coding regions, gene promoters, and 5’ and 3’ untranslated regions) is critical for the investigation of gene function. Annotations are constantly improving, but automated gene annotations still require manual curation and experimental validation. This is particularly important for genes with large introns, genes located in regions rich in transposable elements or repeats, large gene families, and segmentally duplicated genes. In this opinion paper we highlight the impact of annotation quality on evolutionary analyses, genome-wide association studies, and the identification of orthologous genes in plants. Furthermore, we predict that incorporating accurate information from manual curation into databases will dramatically improve the performance of automated gene predictors.
Original language: English
Article number: ery43
Journal: Journal of Experimental Botany
Volume: 70
Issue number: 4
Pages (from-to): 1069-1076
Number of pages: 8
ISSN: 0022-0957
DOIs: 10.1093/jxb/ery434
Publication status: Published - 1 Feb 2019
MoE publication type: A1 Journal article-refereed

Fields of Science

  • ARABIDOPSIS-THALIANA
  • CORE GENES
  • DISCOVERY
  • DIVERSITY
  • DUPLICATION
  • EVOLUTION
  • GENOME-WIDE ASSOCIATION
  • GWAS
  • Gene families
  • PHENOTYPES
  • PIPELINE
  • TRANSPOSABLE ELEMENTS
  • genome annotation
  • high-throughput sequencing
  • phylogeny
  • translational research
  • 1183 Plant biology, microbiology, virology

Cite this

@article{6f5961fe0b5c44f994f1b17f8742b4e0,
title = "High-throughput sequencing data and the impact of plant gene annotation quality",
abstract = "The use of draft genomes of different species and re-sequencing of accessions and populations are now a common tool for plant biology research. The de novo assembled draft genomes make it possible to identify pivotal divergence points in the plant lineage and provide an opportunity to investigate the genomic basis and timing of biological innovations by inferring orthologs between species. Furthermore, re-sequencing facilitates the mapping and subsequent molecular characterization of causative loci for traits including plant stress tolerance or development. In both cases high quality gene annotation, the identification of protein-coding regions, gene promoters and 5’ and 3’ untranslated regions, is critical for investigation of gene function. Annotations are constantly improving but automated gene annotations still require manual curation and experimental validation. This is particularly important for genes with large introns, genes located in regions rich with transposable elements or repeats, large gene families and segmentally duplicated genes. In this opinion paper we highlight the impact of annotation quality on evolutionary analyses, genome-wide association studies and the identification of orthologous genes in plants. Furthermore, we predict that incorporating the accurate information from manual curation into databases will dramatically improve the performance of automated gene predictors.",
keywords = "ARABIDOPSIS-THALIANA, CORE GENES, DISCOVERY, DIVERSITY, DUPLICATION, EVOLUTION, GENOME-WIDE ASSOCIATION, GWAS, Gene families, PHENOTYPES, PIPELINE, TRANSPOSABLE ELEMENTS, genome annotation, high-throughput sequencing, phylogeny, translational research, 1183 Plant biology, microbiology, virology",
author = "Vaattovaara, {Aleksia Fanni Maria} and Lepp{\"a}l{\"a}, {Johanna Maria} and Saloj{\"a}rvi, {Jarkko Tapani} and Wrzaczek, {Michael Alois}",
year = "2019",
month = "2",
day = "1",
doi = "10.1093/jxb/ery434",
language = "English",
volume = "70",
pages = "1069--1076",
journal = "Journal of Experimental Botany",
issn = "0022-0957",
publisher = "Oxford University Press",
number = "4",

}

High-throughput sequencing data and the impact of plant gene annotation quality. / Vaattovaara, Aleksia Fanni Maria; Leppälä, Johanna Maria; Salojärvi, Jarkko Tapani; Wrzaczek, Michael Alois.

In: Journal of Experimental Botany, Vol. 70, No. 4, ery43, 01.02.2019, p. 1069-1076.

TY - JOUR

T1 - High-throughput sequencing data and the impact of plant gene annotation quality

AU - Vaattovaara, Aleksia Fanni Maria

AU - Leppälä, Johanna Maria

AU - Salojärvi, Jarkko Tapani

AU - Wrzaczek, Michael Alois

PY - 2019/2/1

Y1 - 2019/2/1

N2 - The use of draft genomes of different species and re-sequencing of accessions and populations are now a common tool for plant biology research. The de novo assembled draft genomes make it possible to identify pivotal divergence points in the plant lineage and provide an opportunity to investigate the genomic basis and timing of biological innovations by inferring orthologs between species. Furthermore, re-sequencing facilitates the mapping and subsequent molecular characterization of causative loci for traits including plant stress tolerance or development. In both cases high quality gene annotation, the identification of protein-coding regions, gene promoters and 5’ and 3’ untranslated regions, is critical for investigation of gene function. Annotations are constantly improving but automated gene annotations still require manual curation and experimental validation. This is particularly important for genes with large introns, genes located in regions rich with transposable elements or repeats, large gene families and segmentally duplicated genes. In this opinion paper we highlight the impact of annotation quality on evolutionary analyses, genome-wide association studies and the identification of orthologous genes in plants. Furthermore, we predict that incorporating the accurate information from manual curation into databases will dramatically improve the performance of automated gene predictors.

KW - ARABIDOPSIS-THALIANA

KW - CORE GENES

KW - DISCOVERY

KW - DIVERSITY

KW - DUPLICATION

KW - EVOLUTION

KW - GENOME-WIDE ASSOCIATION

KW - GWAS

KW - Gene families

KW - PHENOTYPES

KW - PIPELINE

KW - TRANSPOSABLE ELEMENTS

KW - genome annotation

KW - high-throughput sequencing

KW - phylogeny

KW - translational research

KW - 1183 Plant biology, microbiology, virology

U2 - 10.1093/jxb/ery434

DO - 10.1093/jxb/ery434

M3 - Article

VL - 70

SP - 1069

EP - 1076

JO - Journal of Experimental Botany

JF - Journal of Experimental Botany

SN - 0022-0957

IS - 4

M1 - ery43

ER -