### Abstract

Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.

We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.

As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

Original language | English |
---|---|

Journal | Information Processing Letters |

Volume | 146 |

Pages (from-to) | 17-19 |

Number of pages | 3 |

ISSN | 0020-0190 |

DOIs | |

Publication status | Published - Jun 2019 |

MoE publication type | A1 Journal article-refereed |

### Fields of Science

- Data structures
- Hamming distance
- Phylogenetic inference
- Positional Burrows-Wheeler Transform
- 114 Physical sciences

### Cite this

}

**Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance.** / Mäkinen, Veli; Norri, Tuukka.

Research output: Contribution to journal › Article › Scientific › peer-review

TY - JOUR

T1 - Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

AU - Mäkinen, Veli

AU - Norri, Tuukka

PY - 2019/6

Y1 - 2019/6

N2 - Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

AB - Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

KW - Data structures

KW - Hamming distance

KW - Phylogenetic inference

KW - Positional Burrows-Wheeler Transform

KW - 114 Physical sciences

U2 - 10.1016/j.ipl.2019.02.003

DO - 10.1016/j.ipl.2019.02.003

M3 - Article

VL - 146

SP - 17

EP - 19

JO - Information Processing Letters

JF - Information Processing Letters

SN - 0020-0190

ER -