Fast Indexes for Gapped Pattern Matching

Manuel Cáceres, Simon Puglisi, Bella Zhukova

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Abstract

We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. VLG patterns are composed of two or more subpatterns, between each adjacent pair of which is a gap-constraint specifying upper and lower bounds on the distance allowed between sub-patterns. VLG patterns have numerous applications in computational biology (motif search), information retrieval (e.g., for language models, snippet generation, machine translation) and capture a useful subclass of the regular expressions commonly used in practice for searching source code. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.
Original languageEnglish
Title of host publicationSOFSEM 2020: Theory and Practice of Computer Science : 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20–24, 2020, Proceedings
Number of pages12
PublisherSpringer
Publication date17 Jan 2020
Pages493-504
ISBN (Print)978-3-030-38918-5
ISBN (Electronic)978-3-030-38919-2
DOIs
Publication statusPublished - 17 Jan 2020
MoE publication typeA4 Article in conference proceedings
EventSOFSEM: 46th International Conference on Current Trends in Theory and Practice of Computer Science - Limassol, Cyprus
Duration: 21 Jan 202024 Jan 2020
Conference number: 46
http://cyprusconferences.org/sofsem2020/

Fields of Science

  • 113 Computer and information sciences
  • Pattern matching with gaps

Cite this