Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

Research output: Contribution to journalArticleScientificpeer-review

Abstract

Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.

We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.

As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

Original languageEnglish
JournalInformation Processing Letters
Volume146
Pages (from-to)17-19
Number of pages3
ISSN0020-0190
DOIs
Publication statusPublished - Jun 2019
MoE publication typeA1 Journal article-refereed

Fields of Science

  • Data structures
  • Hamming distance
  • Phylogenetic inference
  • Positional Burrows-Wheeler Transform
  • 114 Physical sciences

Cite this

@article{cbc26c2cd2a844aead6c87ce22961f11,
title = "Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance",
abstract = "Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.",
keywords = "Data structures, Hamming distance, Phylogenetic inference, Positional Burrows-Wheeler Transform, 114 Physical sciences",
author = "Veli M{\"a}kinen and Tuukka Norri",
year = "2019",
month = "6",
doi = "10.1016/j.ipl.2019.02.003",
language = "English",
volume = "146",
pages = "17--19",
journal = "Information Processing Letters",
issn = "0020-0190",
publisher = "Elsevier Scientific Publ. Co",

}

Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance. / Mäkinen, Veli; Norri, Tuukka.

In: Information Processing Letters, Vol. 146, 06.2019, p. 17-19.

Research output: Contribution to journalArticleScientificpeer-review

TY - JOUR

T1 - Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

AU - Mäkinen, Veli

AU - Norri, Tuukka

PY - 2019/6

Y1 - 2019/6

N2 - Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

AB - Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

KW - Data structures

KW - Hamming distance

KW - Phylogenetic inference

KW - Positional Burrows-Wheeler Transform

KW - 114 Physical sciences

U2 - 10.1016/j.ipl.2019.02.003

DO - 10.1016/j.ipl.2019.02.003

M3 - Article

VL - 146

SP - 17

EP - 19

JO - Information Processing Letters

JF - Information Processing Letters

SN - 0020-0190

ER -