📝 Original Info
- Title: XPath Whole Query Optimization
- ArXiv ID: 1003.4353
- Date: 2015-03-13
- Authors: Sebastian Maneth, Kim Nguyen
📝 Abstract
Previous work reports on SXSI, a fast XPath engine which executes tree automata over compressed XML indexes. Here, we investigate the reasons why SXSI is so fast. It is shown that tree automata can be used as a general framework for fine-grained XML query optimization. We define the "relevant nodes" of a query as those nodes that a minimal automaton must touch in order to answer the query. This notion allows many subtrees to be skipped during execution and, with the help of particular tree indexes, even allows internal nodes of the tree to be skipped. We efficiently approximate runs over relevant nodes by means of on-the-fly removal of alternation and non-determinism of (alternating) tree automata. We also introduce many implementation techniques which allow us to efficiently evaluate tree automata, even in the absence of special indexes. Through extensive experiments, we demonstrate the impact of the different optimization techniques.
📄 Full Content
XPath Whole Query Optimization
Sebastian Maneth
NICTA and UNSW
Sydney, Australia
sebastian.maneth@nicta.com.au
Kim Nguyen
NICTA
Sydney, Australia
kim.nguyen@nicta.com.au
ABSTRACT
Previous work reports on SXSI, a fast XPath engine which executes
tree automata over compressed XML indexes. Here, we investigate the
reasons why SXSI is so fast. It is shown that tree automata can be
used as a general framework for fine-grained XML query optimization.
We define the “relevant nodes” of a query as those nodes that a
minimal automaton must touch in order to answer the query. This
notion allows many subtrees to be skipped during execution and, with
the help of particular tree indexes, even allows internal nodes of
the tree to be skipped. We efficiently approximate runs over relevant
nodes by means of on-the-fly removal of alternation and
non-determinism of (alternating) tree automata. We also introduce
many implementation techniques which allow us to efficiently evaluate
tree automata, even in the absence of special indexes. Through
extensive experiments, we demonstrate the impact of the different
optimization techniques.
1. INTRODUCTION
The XPath query language plays a central role in XML processing: it
is deeply rooted in almost every XML technology, from query languages
such as XQuery and XSLT, to access control languages such as XACML,
to the JavaScript engines of popular web browsers. Thus, efficient
XPath evaluation is essential for any time-critical XML processing.
In this paper we show how tree automata can be used as a framework
for fine-grained and novel types of XPath query optimizations. The
experiments with our prototype show that, together with appropriate
indexes for the XML document tree, these optimizations give rise to
unprecedented execution speed for XPath queries, outperforming the
fastest existing XPath engines.
The first breakthrough in efficient XPath execution was Koch et
al.’s seminal paper [6] (see also [7]) where it is shown that Core
XPath can be evaluated in time O(|D| · |Q|) where |D| is the size
of the document and |Q| is the size of the query. Core XPath refers
to the tree navigational fragment of XPath. Considering the time
bound of Koch’s algorithm, there are two obvious ways of reducing
this complexity in practice:
(1) reduce the number of query steps (“|Q|-optimization”) and
(2) reduce the number of nodes to consider (“|D|-optimization”).
Extreme |Q|-Optimization: A top-down deterministic tree automaton
(TDTA) processes an input tree starting in its initial state, at the
root node. It then applies a unique rule which says, for a given
state and label of a node, how to process the children of that node.
A node is selected as a result if the unique state reached by the
automaton on that node and the label of that node together form an
element of a special “set of selection pairs”. After compiling a
(restricted) XPath query into such an automaton (which takes O(|Q|)
time), the run function only requires a single look-up at each node
of the input tree, plus possibly an insertion of the current node
into the result list. Since the function visits the nodes in document
order and only once, this insertion can be performed in constant time
and keeps the result sorted and duplicate-free. Thus, the evaluation
runs in O(|D|) time, giving the extreme case of |Q|-optimization:
|Q| = 1. Similar automata for XML processing have been considered
in [12–14]. However, implementations of such automata cannot compete
with state-of-the-art XPath engines. The reasons for this deficiency
are that (1) performance depends on the speed of the firstChild and
nextSibling operations in the XML tree data structure, (2) the
automaton needs to visit every node of D, and (3) the compilation
into TDTA only works for a very restricted subset of Core XPath.
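The run function described above can be illustrated with a minimal Python sketch. It is not the SXSI implementation: the `Node` class, the `run_tdta` function, the `"*"` wildcard label, and the hand-written automaton for the query `//a//b` are all assumptions made for this example. The tree is stored in first-child/next-sibling form, and each rule maps a (state, label) pair to the states passed to the first child and to the next sibling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node in first-child/next-sibling form.
    label: str
    first_child: Optional["Node"] = None
    next_sibling: Optional["Node"] = None


def run_tdta(root, initial_state, rules, selecting):
    """Visit every node once, in document order, with one rule look-up each.

    rules:     (state, label) -> (state for first child, state for next sibling);
               the label "*" is an assumed fallback for labels without a rule.
    selecting: set of (state, label) selection pairs.
    """
    result = []
    stack = [(root, initial_state)]
    while stack:
        node, state = stack.pop()
        if (state, node.label) in selecting:
            result.append(node)  # appended in document order, so no sorting needed
        key = (state, node.label)
        if key not in rules:
            key = (state, "*")
        child_state, sibling_state = rules[key]
        # Push the sibling first so that the first child is processed first.
        if node.next_sibling is not None:
            stack.append((node.next_sibling, sibling_state))
        if node.first_child is not None:
            stack.append((node.first_child, child_state))
    return result


# Hand-written automaton for //a//b: q0 = "no a ancestor yet", q1 = "below an a".
rules = {
    ("q0", "a"): ("q1", "q0"),
    ("q0", "*"): ("q0", "q0"),
    ("q1", "*"): ("q1", "q1"),
}
selecting = {("q1", "b")}

# Document: <r><a><b/><c><b/></c></a><b/></r>
b2 = Node("b")
c = Node("c", first_child=b2)
b1 = Node("b", next_sibling=c)
b3 = Node("b")
a = Node("a", first_child=b1, next_sibling=b3)
r = Node("r", first_child=a)

selected = run_tdta(r, "q0", rules, selecting)
print([n.label for n in selected])  # ['b', 'b']
```

The loop touches every node of the document exactly once, which is the O(|D|) behaviour discussed above, and also the reason point (2) matters: without further information the automaton cannot skip anything.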
To address (1), many implementations use in-memory pointer
structures. However, this blows up the memory requirement by a
factor of 5-10 over the size of the original XML document. Hence,
such implementations can only work over small documents. We
solve this problem by using state-of-the-art succinct trees [18], a
recent development in data structures.
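To give a rough idea of how a tree can be navigated without pointers, the sketch below uses a balanced-parentheses encoding, one common basis for succinct trees. Whether SXSI uses exactly this encoding is not stated in the text above, and all function names here are assumptions. The sketch is deliberately naive: `find_close` and `label_at` scan linearly, whereas real succinct trees add small auxiliary structures so that such operations run in constant or near-constant time.

```python
def encode(tree):
    """Encode a tree given as (label, [children]) into balanced parentheses.

    Returns (parens, labels): parens holds 1 for "(" and 0 for ")";
    labels lists the node labels in preorder.
    """
    parens, labels = [], []

    def visit(node):
        label, children = node
        parens.append(1)
        labels.append(label)
        for child in children:
            visit(child)
        parens.append(0)

    visit(tree)
    return parens, labels


def find_close(parens, i):
    """Position of the closing parenthesis matching the opening one at i (naive scan)."""
    depth = 0
    for j in range(i, len(parens)):
        depth += 1 if parens[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parentheses")


def first_child(parens, i):
    """A node has a first child iff its "(" is directly followed by another "("."""
    return i + 1 if parens[i + 1] == 1 else None


def next_sibling(parens, i):
    """The next sibling, if any, opens right after this node's matching ")"."""
    j = find_close(parens, i) + 1
    return j if j < len(parens) and parens[j] == 1 else None


def label_at(parens, labels, i):
    """Label of the node opening at i: its preorder rank is the number of "(" before i."""
    return labels[sum(parens[:i])]


# Document: <r><a><b/><c/></a><d/></r>  ->  ( ( ( ) ( ) ) ( ) )
doc = ("r", [("a", [("b", []), ("c", [])]), ("d", [])])
parens, labels = encode(doc)
print(first_child(parens, 0), next_sibling(parens, 1))  # 1 7
```

The structure itself costs two bits per node here, compared with several machine words per node for a pointer-based tree.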
Solutions to problems (2) and (3) are the main subject of this
paper. To address (2), we study ways to restrict the nodes of the
document which must be visited by the run function of the automaton.
This gives rise to the notion of relevant nodes, one of our key
contributions. To address (3), we work with non-deterministic
alternating tree automata and carefully develop on-the-fly
determinization and alternation elimination algorithms. This allows
us to retain most of the benefits of deterministic automata while
increasing the expressive power to full Core XPath. Altogether, our
implementation of these solutions to (1)–(3) provides XPath execution
speed competitive with the best known engines [1]. While we restrict
ourselves for didactic reasons to a fragment of Core XPath, our
prototype “SXSI” implements Core XPath plus text predicates [
…(Full text truncated)…
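The on-the-fly determinization mentioned in the introduction can be illustrated with a generic lazy subset construction. This is a sketch under strong assumptions, not the paper's algorithm, whose details lie in the truncated part of the text. The automaton here is non-deterministic but has no alternation and no filters, so every run is accepting and a node is selected as soon as any state in its current set selects it. The names `LazyDeterminizer`, `evaluate`, and the `"*"` wildcard label are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical unranked tree node.
    label: str
    children: list = field(default_factory=list)


class LazyDeterminizer:
    """Builds deterministic transitions only for the state sets actually reached."""

    def __init__(self, delta, selecting):
        self.delta = delta          # (state, label) -> set of states sent to the children
        self.selecting = selecting  # set of (state, label) selection pairs
        self.cache = {}             # (frozenset of states, label) -> (frozenset, bool)

    def step(self, states, label):
        key = (states, label)
        if key not in self.cache:
            targets = set()
            for q in states:
                # "*" is an assumed fallback rule for labels without their own rule.
                targets |= self.delta.get((q, label), self.delta.get((q, "*"), set()))
            selected = any((q, label) in self.selecting for q in states)
            self.cache[key] = (frozenset(targets), selected)
        return self.cache[key]


def evaluate(root, initial_state, det):
    """Top-down run in document order over sets of states."""
    result = []
    stack = [(root, frozenset([initial_state]))]
    while stack:
        node, states = stack.pop()
        child_states, selected = det.step(states, node.label)
        if selected:
            result.append(node)
        if child_states:  # an empty set means no run continues: skip the subtree
            for child in reversed(node.children):
                stack.append((child, child_states))
    return result


# Non-deterministic automaton for //a/b: q0 searches everywhere,
# q1 means "the parent is an a".
delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "*"): {"q0"},
}
selecting = {("q1", "b")}

# Document: <r><a><b/><c><b/></c></a><b/></r>
b1, b2, b3 = Node("b"), Node("b"), Node("b")
doc = Node("r", [Node("a", [b1, Node("c", [b2])]), b3])

det = LazyDeterminizer(delta, selecting)
selected = evaluate(doc, "q0", det)
print(len(selected), len(det.cache))  # 1 5
```

Only the five (state set, label) combinations that occur in this document are ever materialized, instead of the full, potentially exponential subset automaton. A repeated combination, such as the last `b` node, costs a single cache look-up.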