XPath Whole Query Optimization


📝 Original Info

  • Title: XPath Whole Query Optimization
  • ArXiv ID: 1003.4353
  • Date: 2015-03-13
  • Authors: Sebastian Maneth, Kim Nguyen

📝 Abstract

Previous work reported on SXSI, a fast XPath engine that executes tree automata over compressed XML indexes. Here, we investigate why SXSI is so fast. We show that tree automata can be used as a general framework for fine-grained XML query optimization. We define the "relevant nodes" of a query as those nodes that a minimal automaton must touch in order to answer the query. This notion allows many subtrees to be skipped during execution and, with the help of particular tree indexes, even allows internal nodes of the tree to be skipped. We efficiently approximate runs over relevant nodes by means of on-the-fly removal of alternation and non-determinism of (alternating) tree automata. We also introduce many implementation techniques that allow us to evaluate tree automata efficiently, even in the absence of special indexes. Through extensive experiments, we demonstrate the impact of the different optimization techniques.

📄 Full Content

XPath Whole Query Optimization
Sebastian Maneth, NICTA and UNSW, Sydney, Australia, sebastian.maneth@nicta.com.au
Kim Nguyen, NICTA, Sydney, Australia, kim.nguyen@nicta.com.au
ABSTRACT
Previous work reported on SXSI, a fast XPath engine that executes tree automata over compressed XML indexes. Here, we investigate why SXSI is so fast. We show that tree automata can be used as a general framework for fine-grained XML query optimization. We define the “relevant nodes” of a query as those nodes that a minimal automaton must touch in order to answer the query. This notion allows many subtrees to be skipped during execution and, with the help of particular tree indexes, even allows internal nodes of the tree to be skipped. We efficiently approximate runs over relevant nodes by means of on-the-fly removal of alternation and non-determinism of (alternating) tree automata. We also introduce many implementation techniques that allow us to evaluate tree automata efficiently, even in the absence of special indexes. Through extensive experiments, we demonstrate the impact of the different optimization techniques.
1. INTRODUCTION
The XPath query language plays a central role in XML processing: it is deeply rooted in almost every XML technology, from query languages such as XQuery and XSLT, to access control languages such as XACML, to the JavaScript engines of popular web browsers. Thus, efficient XPath evaluation is essential for any time-critical XML processing. In this paper we show how tree automata can be used as a framework for fine-grained and novel types of XPath query optimizations. The experiments with our prototype show that, together with appropriate indexes for the XML document tree, these optimizations give rise to unprecedented execution speed for XPath queries, outperforming the fastest existing XPath engines.
The first breakthrough in efficient XPath execution was Koch et al.’s seminal paper [6] (see also [7]), where it is shown that Core XPath can be evaluated in time O(|D| · |Q|), where |D| is the size of the document and |Q| is the size of the query. Core XPath refers to the tree navigational fragment of XPath. Considering the time bound of Koch’s algorithm, there are two obvious ways of reducing this complexity in practice: (1) reduce the number of query steps (“|Q|-optimization”) and (2) reduce the number of nodes to consider (“|D|-optimization”).
Extreme |Q|-Optimization: A top-down deterministic tree automaton (TDTA) processes an input tree starting in its initial state at the root node. It then applies a unique rule which says, for a given state and label of a node, how to process the children of that node. A node is selected as a result if the unique state reached by the automaton on that node and the label of that node are elements of a special “set of selection pairs”. After compiling a (restricted) XPath query into such an automaton (which takes O(|Q|) time), the run function only requires a single look-up at each node of the input tree, plus possibly an insertion of the current node into the result list. Since the function visits the nodes in document order and only once, this insertion can be performed in constant time and keeps the result sorted and duplicate-free. Thus, the evaluation runs in O(|D|) time, which is the extreme case of |Q|-optimization: the factor |Q| is effectively reduced to 1. Similar automata for XML processing have been considered [12–14].
However, implementations of such automata cannot compete with state-of-the-art XPath engines. The reasons for this deficiency are that (1) performance depends on the speed of firstChild and nextSibling operations in the XML tree data structure, (2) the automaton needs to visit every node of D, and (3) the compilation into TDTA only works for a very restricted subset of Core XPath.
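The run function of a top-down deterministic tree automaton described above can be illustrated with a minimal Python sketch. It is not the SXSI implementation: the `Node` class, the helper `tree`, the state names `q0` and `q1`, the `"*"` wildcard entry, and the hand-written automaton for the query `//a//b` are all assumptions made for this illustration. The sketch performs one table look-up per node over a firstChild/nextSibling tree and appends selected nodes in document order, so the result list stays sorted and duplicate-free without extra work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Tree node in firstChild/nextSibling form (hypothetical layout)."""
    label: str
    first_child: Optional["Node"] = None
    next_sibling: Optional["Node"] = None


def tree(label, *children):
    """Build a node and chain its children through next_sibling."""
    node = Node(label)
    prev = None
    for child in children:
        if prev is None:
            node.first_child = child
        else:
            prev.next_sibling = child
        prev = child
    return node


# Hand-written automaton for //a//b (an assumption, not compiler output).
# (state, label) -> (state for first child, state for next sibling)
# q0: no 'a' ancestor seen so far; q1: some ancestor is labelled 'a'.
DELTA = {
    ("q0", "a"): ("q1", "q0"),
    ("q0", "*"): ("q0", "q0"),
    ("q1", "*"): ("q1", "q1"),
}
# Set of selection pairs: a node is a result if (state, label) is in it.
SELECT = {("q1", "b")}


def run(root, initial_state):
    """Return preorder indexes of selected nodes, in document order."""
    results = []
    index = 0
    stack = [(root, initial_state)]
    while stack:
        node, state = stack.pop()
        if (state, node.label) in SELECT:
            results.append(index)  # constant-time append, already sorted
        index += 1
        # Single look-up per node, falling back to the wildcard rule.
        child_state, sibling_state = DELTA.get(
            (state, node.label), DELTA[(state, "*")]
        )
        # Push the sibling first so the first child is visited first.
        if node.next_sibling is not None:
            stack.append((node.next_sibling, sibling_state))
        if node.first_child is not None:
            stack.append((node.first_child, child_state))
    return results


# Document: <r><a><b/><c><b/></c></a><b/></r>
doc = tree("r", tree("a", tree("b"), tree("c", tree("b"))), tree("b"))
print(run(doc, "q0"))  # → [2, 4]
```

In this sketch every node is still visited exactly once, which is precisely limitation (2) above: the automaton touches all of D even though only two nodes are results.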
To address (1), many implementations use in-memory pointer structures. However, this blows up the memory requirement by a factor of 5-10 over the size of the original XML document. Hence, such implementations can only work over small documents. We solve this problem by using state-of-the-art succinct trees [18], a recent development in data structures. Solutions to problems (2) and (3) are the main subject of this paper. We study ways to restrict the nodes of the document which must be visited by the run function of the automaton. This gives rise to the notion of relevant nodes, one of our key contributions. To address (3), we work with non-deterministic alternating tree automata and carefully develop on-the-fly determinization and alternation elimination algorithms. This allows us to retain most of the benefits of deterministic automata while increasing the expressive power to full Core XPath. Altogether, our implementation of these solutions to (1) – (3) provides XPath execution speed competitive with the best known engines [1]. While we restrict ourselves for didactic reasons to a fragment of Core XPath, our prototype “SXSI” implements Core XPath plus text predicates [

…(Full text truncated)…

