Abstract
We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. VLG patterns are composed of two or more subpatterns, between each adjacent pair of which is a gap-constraint specifying upper and lower bounds on the distance allowed between sub-patterns. VLG patterns have numerous applications in computational biology (motif search), information retrieval (e.g., for language models, snippet generation, machine translation) and capture a useful subclass of the regular expressions commonly used in practice for searching source code. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.
Original language | English |
---|---|
Title of host publication | SOFSEM 2020: Theory and Practice of Computer Science : 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20–24, 2020, Proceedings |
Number of pages | 12 |
Publisher | Springer |
Publication date | 17 Jan 2020 |
Pages | 493-504 |
ISBN (Print) | 978-3-030-38918-5 |
ISBN (Electronic) | 978-3-030-38919-2 |
DOIs | |
Publication status | Published - 17 Jan 2020 |
MoE publication type | A4 Article in conference proceedings |
Event | SOFSEM: 46th International Conference on Current Trends in Theory and Practice of Computer Science - Limassol, Cyprus Duration: 21 Jan 2020 → 24 Jan 2020 Conference number: 46 http://cyprusconferences.org/sofsem2020/ |
Fields of Science
- 113 Computer and information sciences
- Pattern matching with gaps