SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment

Research output: Working paper, Scientific

Abstract

Phylogenetic analyses can take advantage of multiple sequence alignments as input. These alignments typically consist of homologous nucleic acid or protein sequences, and the inclusion of outlier or aberrant sequences can compromise downstream analyses. Here, I describe a program, SequenceBouncer, that uses the Shannon entropy values of alignment columns to identify outlier alignment sequences in a manner responsive to overall alignment context. I demonstrate the utility of this software using alignments of available mammalian mitochondrial genomes, bird cytochrome c oxidase-derived DNA barcodes, and COVID-19 sequences.
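The abstract names the Shannon entropy of alignment columns as the quantity that drives outlier detection. As a rough illustration of that quantity only, the sketch below computes per-column entropy for a toy alignment. It is not SequenceBouncer's code and does not reproduce its outlier-scoring procedure. The function names, the toy sequences, and the choice to count gap characters as ordinary symbols are all assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of one alignment column.

    Assumption for this sketch: every character, including a gap
    symbol such as '-', is counted as an ordinary state.
    """
    counts = Counter(column)
    total = len(column)
    # H = sum over states of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(sequences):
    """Return one entropy value per column of an aligned sequence set."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) walks the alignment column by column
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical four-sequence toy alignment (not data from the paper)
toy_alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
entropies = alignment_entropies(toy_alignment)
```

A fully conserved column, such as the first one here, has entropy 0, while a column split evenly between two states has entropy 1 bit. How such per-column values are combined into a per-sequence outlier decision is described in the paper itself and is not attempted in this sketch.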
Original language: English
Publisher: bioRxiv
Publication status: Published - 25 Nov 2020
MoE publication type: D4 Published development or research report or study

Fields of Science

  • 1181 Ecology, evolutionary biology
  • 113 Computer and information sciences
