Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

Forskningsoutput: TidskriftsbidragArtikelVetenskapligPeer review

Sammanfattning

Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.

We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.

As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

Originalspråkengelska
TidskriftInformation Processing Letters
Volym146
Sidor (från-till)17-19
Antal sidor3
ISSN0020-0190
DOI
StatusPublicerad - juni 2019
MoE-publikationstypA1 Tidskriftsartikel-refererad

Vetenskapsgrenar

  • 114 Fysik

Citera det här