Abstract
Phylogenetic analyses can take advantage of multiple sequence alignments as input. These alignments typically consist of homologous nucleic acid or protein sequences, and the inclusion of outlier or aberrant sequences can compromise downstream analyses. Here, I describe a program, SequenceBouncer, that uses the Shannon entropy values of alignment columns to identify outlier alignment sequences in a manner responsive to overall alignment context. I demonstrate the utility of this software using alignments of available mammalian mitochondrial genomes, bird cytochrome c oxidase-derived DNA barcodes, and COVID-19 sequences.
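As a rough illustration of the quantity the abstract refers to, the following minimal sketch computes the Shannon entropy of each column in a small alignment. It is not SequenceBouncer's implementation. The function names, the toy alignment, and the choice to count gap characters as ordinary symbols are all assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the symbols in one alignment column.

    Assumption: every character (including '-') is treated as a symbol.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(seqs):
    """Return the entropy of each column for equal-length aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    return [column_entropy(col) for col in zip(*seqs)]


# Hypothetical toy alignment, invented for illustration only.
toy_alignment = ["ACGT", "ACGA", "ACCT", "ATGT"]
entropies = alignment_entropies(toy_alignment)
```

A fully conserved column has an entropy of zero, and the value grows as a column becomes more variable. How SequenceBouncer combines these per-column values to flag outlier sequences is described in the preprint itself and is not reproduced here.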
Original language | English |
---|---|
Publisher | bioRxiv |
DOIs | |
Publication status | Published - 25 Nov 2020 |
MoE publication type | D4 Published development or research report or study |
Fields of Science
- 1181 Ecology, evolutionary biology
- 113 Computer and information sciences