Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

Kuvaus

Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.

We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.

As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

Alkuperäiskielienglanti
LehtiInformation Processing Letters
Vuosikerta146
Sivut17-19
Sivumäärä3
ISSN0020-0190
DOI - pysyväislinkit
TilaJulkaistu - kesäkuuta 2019
OKM-julkaisutyyppiA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu

Tieteenalat

  • 114 Fysiikka

Lainaa tätä

@article{cbc26c2cd2a844aead6c87ce22961f11,
title = "Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance",
abstract = "Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.",
keywords = "Data structures, Hamming distance, Phylogenetic inference, Positional Burrows-Wheeler Transform, 114 Physical sciences",
author = "Veli M{\"a}kinen and Tuukka Norri",
year = "2019",
month = "6",
doi = "10.1016/j.ipl.2019.02.003",
language = "English",
volume = "146",
pages = "17--19",
journal = "Information Processing Letters",
issn = "0020-0190",
publisher = "Elsevier Scientific Publ. Co",

}

Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance. / Mäkinen, Veli; Norri, Tuukka.

julkaisussa: Information Processing Letters, Vuosikerta 146, 06.2019, s. 17-19.

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

TY - JOUR

T1 - Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

AU - Mäkinen, Veli

AU - Norri, Tuukka

PY - 2019/6

Y1 - 2019/6

N2 - Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

AB - Crochemore et al. gave in WABI 2017 an algorithm that from a set of input strings finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions.under some constraints and assumptions.We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies.As a side result, we show how to extend PBWT for non-binary alphabets. The new operations provided by PBWT find other applications in similar tasks as those considered here. (C) 2019 The Authors. Published by Elsevier B.V.

KW - Data structures

KW - Hamming distance

KW - Phylogenetic inference

KW - Positional Burrows-Wheeler Transform

KW - 114 Physical sciences

U2 - 10.1016/j.ipl.2019.02.003

DO - 10.1016/j.ipl.2019.02.003

M3 - Article

VL - 146

SP - 17

EP - 19

JO - Information Processing Letters

JF - Information Processing Letters

SN - 0020-0190

ER -