Fast in-memory XPath search using compressed indexes

Diego Arroyuelo, Francisco Claude, Sebastian Maneth, Veli Mäkinen, Gonzalo Navarro, Kim Nguyen, Jouni Sirén, Niko Välimäki

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

Kuvaus

A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.
Alkuperäiskielienglanti
OtsikkoICDE 2010 : 26th IEEE International Conference on Data Engineering
Sivumäärä12
KustantajaIEEE Computer Society
Julkaisupäivä2010
Sivut417-428
ISBN (painettu)978-1-4244-5446-4
DOI - pysyväislinkit
TilaJulkaistu - 2010
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaInternational Conference on Data Engineering - Long Beach, California, Yhdysvallat (USA)
Kesto: 1 maaliskuuta 20106 maaliskuuta 2010
Konferenssinumero: 26 (ICDE)

Lisätietoja


Volume:
Proceeding volume:

Tieteenalat

  • 113 Tietojenkäsittely- ja informaatiotieteet

Lainaa tätä

Arroyuelo, D., Claude, F., Maneth, S., Mäkinen, V., Navarro, G., Nguyen, K., ... Välimäki, N. (2010). Fast in-memory XPath search using compressed indexes. teoksessa ICDE 2010: 26th IEEE International Conference on Data Engineering (Sivut 417-428). IEEE Computer Society. https://doi.org/10.1109/ICDE.2010.5447858
Arroyuelo, Diego ; Claude, Francisco ; Maneth, Sebastian ; Mäkinen, Veli ; Navarro, Gonzalo ; Nguyen, Kim ; Sirén, Jouni ; Välimäki, Niko. / Fast in-memory XPath search using compressed indexes. ICDE 2010: 26th IEEE International Conference on Data Engineering. IEEE Computer Society, 2010. Sivut 417-428
@inproceedings{b29f153438304cc49516f69aeaf9ca3a,
title = "Fast in-memory XPath search using compressed indexes",
abstract = "A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.",
keywords = "113 Computer and information sciences",
author = "Diego Arroyuelo and Francisco Claude and Sebastian Maneth and Veli M{\"a}kinen and Gonzalo Navarro and Kim Nguyen and Jouni Sir{\'e}n and Niko V{\"a}lim{\"a}ki",
note = "Volume: Proceeding volume:",
year = "2010",
doi = "10.1109/ICDE.2010.5447858",
language = "English",
isbn = "978-1-4244-5446-4",
pages = "417--428",
booktitle = "ICDE 2010",
publisher = "IEEE Computer Society",
address = "International",

}

Arroyuelo, D, Claude, F, Maneth, S, Mäkinen, V, Navarro, G, Nguyen, K, Sirén, J & Välimäki, N 2010, Fast in-memory XPath search using compressed indexes. julkaisussa ICDE 2010: 26th IEEE International Conference on Data Engineering. IEEE Computer Society, Sivut 417-428, International Conference on Data Engineering, Long Beach, California, Yhdysvallat (USA), 01/03/2010. https://doi.org/10.1109/ICDE.2010.5447858

Fast in-memory XPath search using compressed indexes. / Arroyuelo, Diego; Claude, Francisco; Maneth, Sebastian; Mäkinen, Veli; Navarro, Gonzalo; Nguyen, Kim; Sirén, Jouni; Välimäki, Niko.

ICDE 2010: 26th IEEE International Conference on Data Engineering. IEEE Computer Society, 2010. s. 417-428.

Tutkimustuotos: Artikkeli kirjassa/raportissa/konferenssijulkaisussaKonferenssiartikkeliTieteellinenvertaisarvioitu

TY - GEN

T1 - Fast in-memory XPath search using compressed indexes

AU - Arroyuelo, Diego

AU - Claude, Francisco

AU - Maneth, Sebastian

AU - Mäkinen, Veli

AU - Navarro, Gonzalo

AU - Nguyen, Kim

AU - Sirén, Jouni

AU - Välimäki, Niko

N1 - Volume: Proceeding volume:

PY - 2010

Y1 - 2010

N2 - A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.

AB - A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.

KW - 113 Computer and information sciences

U2 - 10.1109/ICDE.2010.5447858

DO - 10.1109/ICDE.2010.5447858

M3 - Conference contribution

SN - 978-1-4244-5446-4

SP - 417

EP - 428

BT - ICDE 2010

PB - IEEE Computer Society

ER -

Arroyuelo D, Claude F, Maneth S, Mäkinen V, Navarro G, Nguyen K et al. Fast in-memory XPath search using compressed indexes. julkaisussa ICDE 2010: 26th IEEE International Conference on Data Engineering. IEEE Computer Society. 2010. s. 417-428 https://doi.org/10.1109/ICDE.2010.5447858