Transposition invariant string matching

Veli Mäkinen, Gonzalo Navarro, Esko Ukkonen

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

Kuvaus

Given strings A = a(1)a(2)...a(m) and B=b(1)b(2)...b(n) over an alphabet Sigma subset of U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is min(t is an element of U){d(A + t, B)}, where A + t = (a(1) + t)(a(2) + t)...(a(m) + t). We study the problem of computing the transposition invariant distance for various distance (and similarity) functions d, including Hamming distance, longest common subsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance. (c) 2004 Elsevier Inc. All rights reserved.
Alkuperäiskielienglanti
LehtiJournal of Algorithms
Vuosikerta56
Sivut124-153
Sivumäärä30
ISSN0196-6774
DOI - pysyväislinkit
TilaJulkaistu - 2005
OKM-julkaisutyyppiA1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Lainaa tätä

Mäkinen, Veli ; Navarro, Gonzalo ; Ukkonen, Esko. / Transposition invariant string matching. Julkaisussa: Journal of Algorithms. 2005 ; Vuosikerta 56. Sivut 124-153.
@article{cfd209532b294ed7a3b2e211b6f5087d,
title = "Transposition invariant string matching",
abstract = "Given strings A = a(1)a(2)...a(m) and B=b(1)b(2)...b(n) over an alphabet Sigma subset of U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is min(t is an element of U){d(A + t, B)}, where A + t = (a(1) + t)(a(2) + t)...(a(m) + t). We study the problem of computing the transposition invariant distance for various distance (and similarity) functions d, including Hamming distance, longest common subsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance. (c) 2004 Elsevier Inc. All rights reserved.",
keywords = "113 Computer and information sciences",
author = "Veli M{\"a}kinen and Gonzalo Navarro and Esko Ukkonen",
year = "2005",
doi = "10.1016/j.jgalgot.2004.07.008",
language = "English",
volume = "56",
pages = "124--153",
journal = "Journal of Algorithms",
issn = "0196-6774",
publisher = "Academic Press",

}

Transposition invariant string matching. / Mäkinen, Veli; Navarro, Gonzalo; Ukkonen, Esko.

julkaisussa: Journal of Algorithms, Vuosikerta 56, 2005, s. 124-153.

Tutkimustuotos: ArtikkelijulkaisuArtikkeliTieteellinenvertaisarvioitu

TY - JOUR

T1 - Transposition invariant string matching

AU - Mäkinen, Veli

AU - Navarro, Gonzalo

AU - Ukkonen, Esko

PY - 2005

Y1 - 2005

N2 - Given strings A = a(1)a(2)...a(m) and B=b(1)b(2)...b(n) over an alphabet Sigma subset of U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is min(t is an element of U){d(A + t, B)}, where A + t = (a(1) + t)(a(2) + t)...(a(m) + t). We study the problem of computing the transposition invariant distance for various distance (and similarity) functions d, including Hamming distance, longest common subsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance. (c) 2004 Elsevier Inc. All rights reserved.

AB - Given strings A = a(1)a(2)...a(m) and B=b(1)b(2)...b(n) over an alphabet Sigma subset of U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is min(t is an element of U){d(A + t, B)}, where A + t = (a(1) + t)(a(2) + t)...(a(m) + t). We study the problem of computing the transposition invariant distance for various distance (and similarity) functions d, including Hamming distance, longest common subsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance. (c) 2004 Elsevier Inc. All rights reserved.

KW - 113 Computer and information sciences

U2 - 10.1016/j.jgalgot.2004.07.008

DO - 10.1016/j.jgalgot.2004.07.008

M3 - Article

VL - 56

SP - 124

EP - 153

JO - Journal of Algorithms

JF - Journal of Algorithms

SN - 0196-6774

ER -